WO2021057810A1 - Data processing method, data training method, data identifying method and device, and storage medium - Google Patents


Info

Publication number
WO2021057810A1
WO2021057810A1 PCT/CN2020/117226 CN2020117226W WO2021057810A1 WO 2021057810 A1 WO2021057810 A1 WO 2021057810A1 CN 2020117226 W CN2020117226 W CN 2020117226W WO 2021057810 A1 WO2021057810 A1 WO 2021057810A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
preset
angle
trained
Prior art date
Application number
PCT/CN2020/117226
Other languages
French (fr)
Chinese (zh)
Inventor
沈凌浩
吴新
Original Assignee
深圳数字生命研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳数字生命研究院
Publication of WO2021057810A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • the present invention relates to the application field of computer technology, in particular to a data processing, training, and identification method, device and storage medium.
  • Pose estimation technology, i.e., key point detection technology;
  • Top-down method: two-step framework;
  • Bottom-up method: part-based framework.
  • The top-down method first detects the position of the rectangular frame of every person in the picture (2D/3D), with each person completely contained in the rectangular frame, and then independently detects the skeleton key point coordinates of the person inside each rectangular frame and connects them into a human skeleton; it is characterized by high data processing accuracy.
  • However, the accuracy of the pose estimation depends heavily on the detection quality of the rectangular frame of the person's position.
  • The bottom-up method first detects the skeleton key point coordinates of all the persons in the picture, then handles the allocation of the skeleton key points, assigning each key point to a different person, and connects the human skeletons; it is characterized by fast data processing speed, but if the crowd is dense or persons occlude each other, errors easily occur at the stage of assigning key points to individuals.
  • In existing body-recognition implementations, a Kinect device is mainly used to obtain the key points of the person, but the device is expensive and not portable.
  • In the related technology, the sampling and calculation model enlarges the error of the data source itself.
  • The related technologies have low accuracy in recognizing human body postures.
  • the embodiments of the present invention provide a data processing, training, and recognition method, device, and storage medium to at least solve the technical problem of low data processing efficiency in the process of recognizing human posture due to related technologies.
  • A data processing method, including: inputting first feature data with a first number of channels into a first-type convolutional layer having a second number of filters for calculation, and outputting second feature data with the second number of channels, where the first number is greater than the second number; inputting the second feature data with the second number of channels into a second-type convolutional layer having the second number of filters, and generating, through a neural network and according to the learnable mask parameters in the second-type convolutional layer, a mask over the weights of each filter in the second-type convolutional layer; determining, according to the mask, the connection mode between each filter in the second-type convolutional layer and each channel of the second feature data, and performing convolution calculation on the second feature data according to the mapping relationship obtained from the connection mode to obtain third feature data; and inputting the third feature data with the second number of channels into a third-type convolutional layer having the first number of filters for calculation, and outputting fourth feature data with the first number of channels.
  • the data processing method is applied to deep learning in artificial intelligence.
  • the data processing method is applied to recognize the posture or action of the target in the picture/video.
  • Generating a mask over the weights of each filter in the second-type convolutional layer through a neural network includes: generating, according to the mask parameters in the second-type convolutional layer, the mask over the weights of each filter in the second-type convolutional layer through the fully connected layer.
  • A data training method, including: obtaining a weight classification model to be trained, wherein the weight classification model is a neural network model for obtaining image features of image data; and training the weight classification model to be trained to obtain a weight classification model, wherein the method used in training the weight classification model to be trained includes the above data processing method.
  • Training the weight classification model to be trained to obtain the weight classification model includes: inputting the data in the first preset data set into the weight classification model to be trained to obtain a category prediction result; obtaining, according to the category prediction result and the label category of the data in the first preset data set, the error between the category prediction result and the label category of the data in the first preset data set; and training the weight classification model to be trained with the back-propagation algorithm according to the error until the weight classification model to be trained converges, so as to obtain a converged weight classification model.
  • Training the weight classification model to be trained with the back-propagation algorithm according to the error until it converges includes repeated iterations of excitation propagation and weight update until the weight classification model to be trained converges.
  • In the case that the weight classification model to be trained includes a residual structure, a pooling structure and a fully connected structure, the repeated iterations of excitation propagation and weight update until convergence include: in the excitation propagation stage, passing the image through the convolutional layers of the weight classification model to be trained to obtain features, obtaining the category prediction result at the fully connected layer, and then computing the difference between the category prediction result and the label category of the data in the first preset data set to obtain the response errors of the hidden layers and the output layer; in the weight update stage, multiplying the error by the derivative of the current layer's response with respect to the previous layer's response to obtain the gradient of the weight matrix between the two layers, and adjusting the weight matrix along the opposite direction of this gradient with the set learning rate; the gradient matrix is then taken as the error of the previous layer to calculate the weight matrix of the previous layer, and the weight classification model to be trained is updated through iterative calculation until it converges.
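  • For illustration, the following is a minimal PyTorch-style sketch of the training loop just described (forward pass, error against the label category, back-propagation, weight update); the model, data loader, optimizer choice and convergence check are placeholders rather than elements defined in the patent.

```python
# Minimal sketch of the training procedure described above, under assumed names.
import torch
import torch.nn as nn

def train_until_convergence(model, train_loader, max_epochs=90, lr=0.1):
    criterion = nn.CrossEntropyLoss()                       # error between prediction and label category
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # placeholder optimizer; the embodiments later use Adam
    for epoch in range(max_epochs):
        for images, labels in train_loader:
            logits = model(images)             # excitation propagation: conv layers + fully connected layer
            loss = criterion(logits, labels)   # error with respect to the label category
            optimizer.zero_grad()
            loss.backward()                    # back-propagate the error layer by layer
            optimizer.step()                   # adjust weights against the gradient direction
        # a convergence check (e.g. a validation-loss plateau) would terminate training here
    return model
```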
  • A data training method, which includes: initializing the feature extraction module in a target detection model with the converged weight classification model to obtain a target detection model to be trained, wherein the converged weight classification model is trained by the above data training method; training the target detection model to be trained with the target location frame label information in a second preset data set to obtain a trained target detection model; training the network parameters of a single-person pose estimation model to be trained according to the target key point label information in a third preset data set to obtain a trained single-person pose estimation model; and obtaining a weighted attention neural network model according to the trained target detection model and the trained single-person pose estimation model.
  • Training the target detection model to be trained with the target location frame label information in the second preset data set to obtain the trained target detection model includes the following, where the target detection model comprises a feature extraction module, a suggestion frame generation module, and a target classifier and position frame regression prediction module: the feature extraction module and the suggestion frame generation module are first trained to obtain a first parameter value of the feature extraction module and a first parameter value of the suggestion frame generation module; the target classifier and position frame regression prediction module is then trained according to the first parameter value of the feature extraction module and the first parameter value of the suggestion frame generation module, to obtain a first parameter value of the target classifier and position frame regression prediction module and a second parameter value of the feature extraction module; the suggestion frame generation module is then trained according to the first parameter value of the target classifier and position frame regression prediction module and the second parameter value of the feature extraction module, to obtain a second parameter value of the suggestion frame generation module; finally, the target classifier and position frame regression prediction module is trained according to the second parameter value of the suggestion frame generation module and the second parameter value of the feature extraction module, to obtain a second parameter value of the target classifier and position frame regression prediction module.
  • The feature extraction module is used to extract the features of each piece of data in the second preset data set; the suggestion frame generation module is used to generate candidate target frames for each piece of data according to those features; the target classifier and position frame regression prediction module is used to obtain, according to the features of each piece of data in the second preset data set and its candidate target frames, the detection frame of each target in that data and the category of the corresponding detection frame.
  • When the suggestion frame generation module includes a convolutional layer with a sliding window followed by two parallel convolutional layers, namely a regression layer and a classification layer, generating the candidate target frames of each piece of data according to its features includes: obtaining, through the regression layer and according to the features of each piece of data in the second preset data set, the coordinates of the center anchor point of each candidate target frame and the width and height of the corresponding candidate target frame.
  • When the target classifier and position frame regression prediction module consists of a pooling layer, three fully connected layers and two parallel fully connected layers connected in sequence, obtaining the detection frame of each target of each piece of data in the second preset data set and the corresponding detection frame category includes: converting, through the pooling layer, the variable-length features of each piece of data output by the feature extraction module into fixed-length features; and passing the fixed-length features through the three fully connected layers and then the two parallel fully connected layers, to output the detection frame of each target of each piece of data in the second preset data set and the category of the corresponding detection frame.
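  • The two heads described above can be sketched as follows in PyTorch; the channel counts, the number of anchors per location and the number of classes are illustrative assumptions, not values taken from the patent.

```python
# Sketch of a suggestion-frame (region-proposal) head and a detection head of the
# kind described above. Dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class SuggestionFrameHead(nn.Module):
    """Sliding-window conv followed by two parallel convs: regression and classification."""
    def __init__(self, in_channels=256, num_anchors=9):
        super().__init__()
        self.sliding = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.regression = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)      # anchor cx, cy, w, h
        self.classification = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)  # object / background

    def forward(self, features):
        x = torch.relu(self.sliding(features))
        return self.regression(x), self.classification(x)

class DetectionHead(nn.Module):
    """Pooling to a fixed length, three FC layers, then two parallel FC layers."""
    def __init__(self, in_channels=256, pooled=7, num_classes=2):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d(pooled)   # stand-in for pooling variable-sized region features
        flat = in_channels * pooled * pooled
        self.fc = nn.Sequential(
            nn.Linear(flat, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
        )
        self.box_regression = nn.Linear(1024, num_classes * 4)  # detection frame coordinates per class
        self.box_classification = nn.Linear(1024, num_classes)  # detection frame category

    def forward(self, region_features):
        x = self.pool(region_features).flatten(1)
        x = self.fc(x)
        return self.box_regression(x), self.box_classification(x)
```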
  • Training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set to obtain the trained single-person pose estimation model includes: training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set, and iteratively updating those network parameters through the forward propagation and back-propagation algorithms; training the network parameters includes expanding the height or width of the input single-person image according to a preset aspect ratio and cropping the single-person image to a preset size.
  • The method used in training the network parameters of the single-person pose estimation model to be trained includes the above data processing method.
  • The method further includes: collecting the samples required for training the target detection model to be trained and the single-person pose estimation model to be trained; and preprocessing the samples, where the preprocessing includes data set division and preprocessing operations. Training the weight classification model to be trained to obtain the converged weight classification model includes: inputting the data in the first preset data set into the weight classification model to be trained to obtain a category prediction result; obtaining, according to the category prediction result and the label category of the data in the first preset data set, the error between the category prediction result and the label category of the data in the first preset data set; and training the weight classification model to be trained with the back-propagation algorithm according to the error until it converges, to obtain the converged weight classification model.
  • The first preset data set includes a first type of image data set, which has predefined training and validation sets; the second preset data set includes the data labeled with position frame information in a second type of image data set and a third type of image data set; the second type of image data set has predefined training and validation sets, while the third type of image data set is randomly divided into a training set and a validation set according to a preset ratio.
  • The training set of the second type of image data set together with the training set of the third type of image data set forms the training set of the second preset data set, and the validation set of the second type of image data set together with the validation set of the third type of image data set forms the validation set of the second preset data set.
  • The third preset data set includes the data labeled with key point information in the second type of image data set and the third type of image data set.
  • The preprocessing operation includes: the data in the first preset data set and the third preset data set are processed separately, and the data in the second preset data set is processed through a random mixing operation.
  • The random geometric transformation includes random cropping, random rotation by a preset angle, and/or random scaling by a preset zoom ratio.
  • The random mixing operation includes superimposing at least two pieces of data according to preset weights; specifically, the products of the pixel values at preset positions in the different pieces of data and the preset weights are added.
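  • A minimal sketch of this random mixing operation (two images superimposed by summing the products of corresponding pixel values and preset weights); the weight value used here is a hypothetical example, not one from the patent.

```python
# Random mixing of two same-shaped images with preset weights that sum to 1.
import numpy as np

def random_mix(image_a, image_b, weight_a=0.6):
    """Superimpose two images: weighted sum of corresponding pixel values."""
    weight_b = 1.0 - weight_a
    mixed = weight_a * image_a.astype(np.float32) + weight_b * image_b.astype(np.float32)
    return np.clip(mixed, 0, 255).astype(np.uint8)
```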
  • A data recognition method, which includes: inputting feature data to be recognized into the weighted attention neural network model, and recognizing the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized, where the weighted attention neural network model is used to estimate the pose of at least one person in a top-down manner, detect the position rectangle of at least one target in the feature data to be recognized, and detect the two-dimensional coordinates of the key points of the target within the position rectangle; calculating, from the two-dimensional coordinates of the key points of the target, the included angle between the line of a first preset key point combination and the line of a second preset key point combination, or the included angle between the line of the first preset key point combination and a first preset line; and matching the obtained included angle in a first preset database to obtain the recognition result of the target.
  • Matching the included angle in the first preset database to obtain the recognition result of the target includes: in the case that the feature data to be recognized includes image data, matching the obtained angle value of at least one included angle against the angle values of the corresponding included angle types in the first preset database to obtain the recognition result of the image data.
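  • A minimal sketch of the included-angle calculation and database matching for the image-data case; the key point names, the angle type and the threshold table stand in for the first preset database and are hypothetical examples.

```python
# Included angle between two key-point lines, and a simple range-based match.
import numpy as np

def line_angle_deg(p1, p2, q1, q2):
    """Included angle (degrees) between the line p1-p2 and the line q1-q2."""
    v1 = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    v2 = np.asarray(q2, dtype=float) - np.asarray(q1, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Hypothetical "first preset database": allowed angle range per included-angle type.
ANGLE_DATABASE = {"shoulder_line_vs_horizontal": (0.0, 3.0)}  # degrees

def match_angle(angle_type, angle_value, database=ANGLE_DATABASE):
    low, high = database[angle_type]
    return low <= angle_value <= high  # True: posture considered normal for this angle type
```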
  • Matching the included angle between the line of the first preset key point combination and the line of the second preset key point combination, or between the line of the first preset key point combination and the first preset line, in the first preset database to obtain the recognition result of the target includes: in the case that the feature data to be recognized includes video data, obtaining, for each frame or for specified frames, the two-dimensional coordinate information of the key points of at least one target in each corresponding frame of the video data; obtaining, from this coordinate information, the angle-time variation curve of at least one specific included angle of the at least one target; and comparing and analyzing this curve with the angle-time variation curve of at least one included angle of at least one standard motion to obtain the recognition result.
  • The comparison and analysis of the angle-time variation curves to obtain the recognition result includes: comparing, for similarity, the angle-time variation curve of at least one specific included angle of the at least one target with the angle-time variation curve of at least one included angle obtained in advance for at least one standard motion; in the case that it is determined that the corresponding target of each corresponding frame in the video data is performing the corresponding standard motion type, further comparing the angle-time variation curve of at least one specific included angle of the target with the angle-time variation curve of the corresponding specific included angle of the standard motion; if the difference between adjacent maxima on the angle-time variation curve of at least one specific included angle of the target and the adjacent maxima on the angle-time variation curve of the corresponding specific included angle of the standard motion falls within a second preset threshold interval, it is determined that the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is standard; otherwise the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is not standard; in addition, it is judged, for the angle-time variation curve of at least one specific included angle of the target, whether the difference between adjacent peaks on the angle-time variation curve of the
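  • A minimal sketch of the curve comparison for the video-data case: a similarity check between the target's angle-time curve and a standard motion's curve, followed by a check that the differences between corresponding maxima fall within a threshold interval; the similarity measure and the threshold values are assumptions.

```python
# Compare a target's angle-time curve against a standard-motion curve.
import numpy as np
from scipy.signal import argrelextrema

def curve_similarity(curve_a, curve_b):
    """Negative mean absolute difference as a simple similarity score (an assumption)."""
    n = min(len(curve_a), len(curve_b))
    a = np.asarray(curve_a[:n], dtype=float)
    b = np.asarray(curve_b[:n], dtype=float)
    return -float(np.mean(np.abs(a - b)))

def adjacent_maxima_within(curve_target, curve_standard, threshold_interval=(0.0, 10.0)):
    """Check whether differences between corresponding local maxima fall in the interval."""
    t = np.asarray(curve_target, dtype=float)
    s = np.asarray(curve_standard, dtype=float)
    max_t = t[argrelextrema(t, np.greater)[0]]
    max_s = s[argrelextrema(s, np.greater)[0]]
    n = min(len(max_t), len(max_s))
    if n == 0:
        return False
    diffs = np.abs(max_t[:n] - max_s[:n])
    low, high = threshold_interval
    return bool(np.all((diffs >= low) & (diffs <= high)))
```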
  • the method further includes: performing matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
  • the method further includes: matching in a third preset database according to the posture evaluation result to obtain suggestion information corresponding to the posture evaluation result.
  • A data recognition device, including: a coordinate recognition module, configured to input feature data to be recognized into the weighted attention neural network model and recognize the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized, where the weighted attention neural network model is set to estimate the pose of at least one person in a top-down manner, detect the position rectangle of at least one target in the feature data to be recognized, and detect the two-dimensional coordinates of the key points of the target within the position rectangle; a calculation module, set to calculate, from the two-dimensional coordinates of the key points of the target, the included angle between the line of the first preset key point combination and the line of the second preset key point combination, or the included angle between the line of the first preset key point combination and the first preset line; and a matching module, set to match the included angle between the line of the first preset key point combination and the line of the second preset key point combination, or between the line of the first preset key point combination and the first preset line, in the first preset database to obtain the recognition result of the target.
  • The matching module includes: a first matching unit, configured to, when the feature data to be recognized includes image data, match the obtained angle value of at least one included angle against the angle values of the corresponding included angle types in the first preset database to obtain the recognition result of the image data.
  • The matching module includes: an acquiring unit, configured to, when the feature data to be recognized includes video data, acquire, for each frame or for specified frames, the two-dimensional coordinate information of the key points of at least one target in each corresponding frame of the video data, where the specified frames are frames at fixed time intervals and/or key frames; and a second matching unit, set to obtain, according to the key point two-dimensional coordinate information of at least one target of each corresponding frame in the video data, the angle-time variation curve of at least one specific included angle of the at least one target, and to compare and analyze it with the angle-time variation curve of at least one included angle of at least one standard motion to obtain the recognition result.
  • The second matching unit includes: a first judging subunit, configured to compare, for similarity, the angle-time variation curve of at least one specific included angle of the at least one target with the angle-time variation curve of at least one included angle obtained in advance for at least one standard motion; a comparison subunit, set to, in the case of determining that the corresponding target of each corresponding frame in the video data is performing the corresponding standard motion type, further compare the angle-time variation curve of at least one specific included angle of the target with the angle-time variation curve of the corresponding specific included angle of the standard motion; a second judging subunit, set to determine that the joint motion corresponding to the specific included angle of the target of each corresponding frame in the video data is standard if the difference between adjacent maxima on the angle-time variation curve of at least one specific included angle of the target and the adjacent maxima on the angle-time variation curve of the corresponding specific included angle of the standard motion falls within the second preset threshold interval, and that it is not standard otherwise; and a third judging subunit, set to judge whether the distance between adjacent peaks on the angle-time variation curve
  • the device further includes: an evaluation module configured to perform matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
  • The device further includes: a suggestion module, configured to, after the posture evaluation result corresponding to the recognition result is obtained, perform matching in a third preset database according to the posture evaluation result to obtain the suggestion information corresponding to the posture evaluation result.
  • a non-volatile storage medium includes a stored program, wherein the device where the non-volatile storage medium is located is controlled to execute the above method when the program is running.
  • A data recognition device, including a non-volatile storage medium and a processor configured to run a program stored in the non-volatile storage medium, where the above method is executed when the program runs.
  • A weighted attention mechanism in which, by introducing a learnable mask mechanism, the grouped convolution mode of the network is not artificially fixed, so that the network itself learns the convolution groups and selects the filters useful to the network for the convolution operation, improving the performance of the network; the weight classification model to be trained is trained on data based on the weighted attention mechanism to obtain the weight classification model, and the initial parameters of the feature extraction module in the target detection model are initialized through the weight classification model, so that, in the process of obtaining the weighted attention neural network model, the weight classification model is used to improve the accuracy of the target detection model and accelerate the convergence of model training.
  • The top-down multi-person pose estimation method is adopted: the feature data to be recognized is input into the weighted attention neural network model to recognize the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized, where the weighted attention neural network model is used to estimate the pose of at least one person in a top-down manner, detect the position rectangle of at least one target in the feature data to be recognized, and detect the two-dimensional coordinates of the key points of the target within the position rectangle; the two-dimensional coordinates of the key points of the target are used to calculate the included angle between the line of the first preset key point combination and the line of the second preset key point combination, or the included angle between the line of the first preset key point combination and the first preset line; the included angle is matched in the first preset database to obtain the target recognition result, which achieves the technical effect of improving data processing efficiency in the process of recognizing human posture.
  • Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a weighted attention mechanism in a data processing method according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of a data training method according to an embodiment of the present invention.
  • Fig. 4 is a network structure diagram of a weight classification model in a data training method according to an embodiment of the present invention.
  • Fig. 5 is a schematic flowchart of a data training method according to an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of a target detection model in a data training method according to an embodiment of the present invention.
  • Fig. 7 is a schematic diagram of a single pose estimation model in a data training method according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of key point positions and skeleton connections in a data training method according to an embodiment of the present invention.
  • Fig. 9a is a schematic diagram of the effect before labeling the key point positions and the skeleton connection in the data training method according to the embodiment of the present invention.
  • Fig. 9b is a schematic diagram of the effect of labeling key point positions and skeleton connections in the data training method according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of the effect of mix-up in the data training method according to an embodiment of the present invention.
  • FIG. 11 is a schematic flowchart of a data identification method according to an embodiment of the present invention.
  • FIG. 12 is a schematic flowchart of a posture risk assessment based on deep learning in a data recognition method according to an embodiment of the present invention.
  • Fig. 13a is a schematic diagram of a front view in a method for assessing posture risk according to an embodiment of the present invention
  • FIG. 13b is a schematic diagram of a side view in a method for assessing a posture risk according to an embodiment of the present invention
  • FIG. 14 is a schematic diagram showing the evaluation result of posture risk in the data recognition method according to an embodiment of the present invention.
  • Fig. 15 is a schematic diagram of a data recognition device according to an embodiment of the present invention.
  • Posture evaluation: using certain technical methods to evaluate the posture of the persons in a picture, for example whether they have O-shaped/X-shaped legs, or postural problems such as hunchback or uneven shoulders, and further grading the severity of various posture conditions;
  • Action recognition: recognizing, through certain technical methods, the action category of the persons in a picture or video, such as walking, raising hands, applauding and other gesture or action category names;
  • Key point detection: identifying, through certain technical methods, the key point coordinates of a single target or multiple targets in a picture/video; if the target is a person, the key point coordinates are the skeleton key point coordinates.
  • FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step S102: input the first feature data with the first number of channels into the first-type convolutional layer with the second number of filters for calculation, and output the second feature data with the second number of channels, where the first number is greater than the second number;
  • Step S104: input the second feature data with the second number of channels into the second-type convolutional layer with the second number of filters, and generate, through a neural network and according to the learnable mask parameters in the second-type convolutional layer, the mask over the weights of each filter in the second-type convolutional layer;
  • Step S106: determine, according to the mask, the connection mode between each filter in the second-type convolutional layer and each channel of the second feature data;
  • Step S108: perform convolution calculation on the second feature data according to the mapping relationship obtained from the connection mode to obtain the third feature data;
  • Step S110: input the third feature data with the second number of channels into the third-type convolutional layer with the first number of filters for calculation, and output the fourth feature data with the first number of channels.
  • FIG. 2 is a schematic diagram of a weighted attention mechanism in a data processing method according to an embodiment of the present invention.
  • The first feature data with the first number of channels can be feature map data with 256 channels, and the first-type convolutional layer with the second number of filters can be a 1×1 convolutional layer with 128 filters. Therefore, based on FIG. 2, step S102 inputs the feature map data with 256 channels into the 1×1 convolutional layer with 128 filters for calculation, and outputs feature map data with 128 channels.
  • In step S104, the feature map data with 128 channels is input into the 3×3 convolutional layer with 128 filters, that is, the second-type convolutional layer with the second number of filters in the embodiment of the present application.
  • In step S106, the connection mode between each of the 128 filters in the 3×3 convolutional layer and each of the 128 channels of the feature map data is determined (see the mask diagram on the left of Figure 2). According to this connection mode, in step S108 the convolution calculation is performed on the feature map data with 128 channels according to the mapping relationship of the connection mode to obtain the third feature data, that is, feature map data with 128 channels. Finally, in step S110, the feature map data with 128 channels is input into the 1×1 convolutional layer with 256 filters for calculation, and feature map data with 256 channels is obtained.
  • The second-type convolutional layer has 128 channels in total, and each 3×3 convolution kernel contains 10 weights, so the total number of weights is 128×10 (see Figure 2); the corresponding output also has 128 channels, and the corresponding mask matrix size is 128×128. Therefore, a fully connected layer with 10 input channels, 128 output channels and a sigmoid activation function can be used to generate, from the 10 weights of each convolution kernel, a weight mask corresponding to the 128 output channels.
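  • A minimal PyTorch sketch of the 256 to 128 to 128 to 256 block described above is given below. For simplicity the 128×128 mask is produced directly from a learnable parameter matrix passed through a sigmoid (the text also describes producing it with a small fully connected layer with 10 inputs, 128 outputs and a sigmoid); the binarization threshold of 0.5 is an assumption.

```python
# Sketch of the weighted-attention bottleneck: 1x1 reduce, masked 3x3, 1x1 restore.
import torch
import torch.nn as nn

class MaskedConv3x3(nn.Module):
    def __init__(self, channels=128, threshold=0.5):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.mask_params = nn.Parameter(torch.zeros(channels, channels))  # learnable mask parameters
        self.threshold = threshold

    def forward(self, x):
        mask = torch.sigmoid(self.mask_params)        # soft mask during training
        if not self.training:
            mask = (mask > self.threshold).float()    # binarize to 0/1 at prediction time
        # mask filter/channel connections: conv weight has shape [out, in, 3, 3]
        weight = self.conv.weight * mask[:, :, None, None]
        return nn.functional.conv2d(x, weight, padding=1)

class WeightAttentionBottleneck(nn.Module):
    """Steps S102-S110 with the channel counts from the example (256 -> 128 -> 128 -> 256)."""
    def __init__(self, in_channels=256, mid_channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1)   # S102
        self.masked = MaskedConv3x3(mid_channels)                           # S104-S108
        self.restore = nn.Conv2d(mid_channels, in_channels, kernel_size=1)  # S110

    def forward(self, x):
        return self.restore(self.masked(self.reduce(x)))
```

  During back-propagation the mask parameters receive gradients together with the filter weights, so the network itself learns which input channels each filter uses, which is the behaviour of the weighted attention mechanism described here.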
  • Generating the mask over the weights of each filter in the second-type convolutional layer through a neural network includes: generating the mask over the weights of each filter in the second-type convolutional layer with the fully connected layer in the second-type convolutional layer.
  • A fully connected layer is used to generate the filter mask, as shown in the mask on the left side of Figure 2. During back-propagation, the network is allowed to learn the mask, and the positions where the mask is set to 1 correspond to the filters selected by the network. Since the filter connection mode is selected by learning a weight mask, the method of filter selection and of convolution calculation between filters and channels used in steps S102 to S110 in the embodiment of the present application is called the weighted attention mechanism (weight attention).
  • A learnable mask mechanism is introduced, so that the grouped convolution mode of the network is not artificially fixed (the grouped convolution mode corresponds to the same type of lines between the input and output shown in Figure 2, meaning that each output convolution is calculated only from the connected input channels); the network learns the convolution groups by itself and selects the filters useful to it for the convolution operation, thereby improving the performance of the network.
  • the fully connected layer is used to generate the filter mask, which can also be implemented by the following scheme:
  • In step S104, the feature map data with 128 channels is input into the 3×3 convolutional layer with 128 filters, that is, the second-type convolutional layer with the second number of filters in the embodiment of the present application; the connection mode between each filter in the 3×3 convolutional layer and each of the 128 channels of the feature map data is determined (see the mask diagram on the left of Figure 2); according to this connection mode, in step S108 the convolution calculation is performed on the feature map data with 128 channels according to the mapping relationship of the connection mode to obtain the third feature data, that is, feature map data with 128 channels; finally, in step S110, the feature map data with 128 channels is input into the 1×1 convolutional layer with 256 filters for calculation, and feature map data with 256 channels is obtained.
  • The mask is a 128×128 mask matrix generated from learnable parameters, a differentiable transformation and a sigmoid activation function; multiplying the mask with the filter weights makes the different output filters of the second-type convolutional layer selectively use different input features. At prediction time, the mask is binarized to 0 or 1 according to a preset threshold.
  • group convolution can be performed based on the specific connection mode to optimize the calculation efficiency.
  • The number of parameters used by the fully connected layer to generate the mask is 10 (inputs) × 128 + 128 = 1,408.
  • Any method of generating a 128×128 mask based on trainable parameters can be used.
  • the embodiments of the present application only take the foregoing examples as examples for description, and implementation of the data processing methods provided in the embodiments of the present application shall prevail, and the specifics are not limited.
  • The WeightNet network introduces a learnable mask mechanism, which does not artificially fix the grouped convolution mode of the network (the grouped convolution mode corresponds to the same type of lines between the input and output shown in Figure 2, meaning that each output convolution is calculated only from the connected input channels); the network learns the convolution groups by itself and selects the filters useful to it for the convolution operation, improving the performance of the network.
  • the data processing method provided in the embodiment of the present application is applied to deep learning in artificial intelligence.
  • The convolution algorithm based on step S102 to step S104 can apply this weighted attention mechanism to artificial intelligence technology, especially deep neural network learning, so that the network can learn the grouping between filters and channels by itself and then perform the convolution calculation, thereby improving the data processing ability of deep neural network learning.
  • the data processing method provided in the embodiment of the present application is applied to recognize the posture or action of the target in the picture/video.
  • The target may be a human, an animal, etc., that is, humans or animals in pictures or videos; in the extension of artificial intelligence (AI) computation this is generally applicable. Based on the convolution algorithm of step S102 to step S104, the weighted attention mechanism can specifically be applied to recognize the posture of a target in a picture/video. For example, in a security monitoring environment, based on targets such as people, cars, animals and insects in the acquired pictures/videos, the behaviour and movement trajectories of the people, vehicles, animals and insects can be predicted.
  • It may also be preferable to apply this technology to medical diagnosis, for example by recognizing the person in a picture/video, taking the recognized person as the target, obtaining the key points of the target from the target's shape, performing posture evaluation according to the key points, and further evaluating the bone health of the target according to the posture.
  • the image calculation process used to identify the person in the picture/video can be the data processing method described in step S102-step S110.
  • the applicable convolution algorithm can be shown in Figure 2.
  • the application of the convolution algorithm shown in FIG. 2 to the data model training in AI technology is detailed in the data training method in the second embodiment.
  • a WeightNet network is obtained based on the convolution algorithm shown in FIG. 2
  • FIG. 3 is a schematic flowchart of a data training method according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
  • Step S302 Obtain a weight classification model to be trained, where the weight classification model is a neural network model for acquiring image features of the image data;
  • FIG. 4 is a network structure diagram of the weight classification model in the data training method according to the embodiment of the present invention.
  • The WeightNet-50 classification model is taken as an example for description; based on this classification model, image feature extraction can be performed on the image to be processed.
  • The WeightNet-101 classification model is also applicable to the data training method provided in the embodiment of this application; the embodiment of this application only takes the WeightNet-50 classification model as an example for illustration, subject to the realization of the data training method provided in the embodiment of this application, and no specific limitation is imposed.
  • In step S304, the weight classification model to be trained is trained to obtain the weight classification model, wherein the method used in training the weight classification model to be trained includes the data processing method in Embodiment 1 above.
  • the WeightNet classification model (ie, the weight classification model provided in this embodiment of the application) is finally obtained.
  • training the weight classification model to be trained in step S304 to obtain the weight classification model includes:
  • Step S3041 Input the data in the first preset data set into the weight classification model to be trained to obtain the category prediction result;
  • The first preset data set is an image data set covering all object categories, and the object categories include natural categories such as people, dogs and horses. In step S3041, the image data set is taken as the first preset data set, that is, pictures of various categories, such as people, dogs and horses, are input into the WeightNet-50 classification model to be trained, and the category prediction result of each picture is obtained.
  • The WeightNet-50 classification model consists of a residual structure, a pooling structure and a fully connected structure. The residual structure is built from three convolutional layers: the first layer has n 1×1 convolution kernels with a stride of 1, the second layer has n 3×3 convolution kernels with a stride of 1, and the third layer has 2n 1×1 convolution kernels with a stride of 1.
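  • A minimal PyTorch sketch of this three-layer residual structure (n 1×1 kernels, n 3×3 kernels, then 2n 1×1 kernels, all with stride 1); the batch-normalization/ReLU placement and the projection on the skip path are assumptions, not details given in the patent.

```python
# Sketch of the residual structure described above.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, n):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, n, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(n), nn.ReLU(inplace=True),
            nn.Conv2d(n, n, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(n), nn.ReLU(inplace=True),
            nn.Conv2d(n, 2 * n, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(2 * n),
        )
        # 1x1 projection so the skip connection matches the 2n output channels
        self.skip = (nn.Identity() if in_channels == 2 * n
                     else nn.Conv2d(in_channels, 2 * n, kernel_size=1, bias=False))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))
```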
  • the data in the first preset data set is input into the weight classification model to be trained, and the obtained category prediction result can be the category prediction result of the picture or the category of the image in the video.
  • the preset data set used in the embodiment of the present application only uses a picture-like data set as a preferred example for description, and in addition, it may also include a video image-like data set.
  • the implementation of the data training method provided in the embodiment of the present application shall prevail, which is not specifically limited.
  • Step S3042 Obtain the error between the category prediction result and the label category of the data in the first preset data set according to the category prediction result and the label category of the data in the first preset data set;
  • the image data of the labeled category based on the first preset data set is input into the WeightNet-50 classification model to be trained, and the feature is extracted through forward propagation, and the category prediction result is obtained.
  • the category prediction result is compared with the label category of the data in the first preset data set to obtain an error between the category prediction result and the label category of the data in the first preset data set.
  • Step S3043 Perform a back propagation algorithm to train the weight classification model to be trained according to the error, until the weight classification model to be trained converges, and a converged weight classification model is obtained.
  • the error back propagation algorithm is used to train the model until the model converges, and the WeightNet-50 classification model is obtained.
  • the first preset data set in this embodiment of the application may be the ImageNet data set.
  • The WeightNet classification model is pre-trained using millions of ImageNet classification images, and the feature extraction module of the target detection model is initialized with the converged weight classification model, which improves the accuracy of the final target detection model and speeds up the convergence of model training.
  • The ImageNet data set is used because ImageNet contains 1.2 million images in 1,000 categories, and such a huge amount of training sample data can meet the needs of deep neural network learning in AI technology.
  • the first preset data set provided in the embodiment of the present application only uses the ImageNet data set as an example for description, and the data training method provided in the embodiment of the present application shall prevail, and the specifics are not limited.
  • performing a back-propagation algorithm to train the weight classification model to be trained in step S3044 according to the error until the weight classification model to be trained converges includes:
  • Step S30441: through repeated iterations of excitation propagation and weight update, until the weight classification model to be trained converges; wherein, in the case that the weight classification model to be trained includes a residual structure, a pooling structure and a fully connected structure, the repeated iterations of excitation propagation and weight update until convergence include: in the excitation propagation stage, passing the image through the convolutional layers of the weight classification model to be trained to obtain features, obtaining the category prediction result at the fully connected layer, and computing the difference between the category prediction result and the label category of the data in the first preset data set to obtain the response errors of the hidden layers and the output layer; in the weight update stage, multiplying the error by the derivative of the current layer's response with respect to the previous layer's response to obtain the gradient of the weight matrix between the two layers.
  • Taking the ImageNet data set as an example, ImageNet's labeled category data is used to train the network parameters: features are extracted through forward propagation, the error between the category prediction result (one-hot) output by the network and the true label category is computed, and the error back-propagation algorithm is used to train the model until it converges, yielding the WeightNet-50 classification model.
  • The error back-propagation algorithm is used to train the convolutional neural network model, specifically through repeated iterations of the two phases of excitation propagation and weight update until the convergence condition is reached.
  • In the excitation propagation phase, the image is passed through the convolutional layers of the WeightNet-50 classification model to obtain features, the prediction result is obtained at the last fully connected layer of the network, and the prediction result is then compared with the ground-truth result to obtain the response errors of the hidden layers and the output layer.
  • In the weight update phase, the known error is first multiplied by the derivative of the current layer's response with respect to the previous layer's response to obtain the gradient of the weight matrix between the two layers, and the weight matrix is adjusted along the opposite direction of this gradient with the set learning rate; the gradient matrix is then used as the error of the previous layer to calculate the weight matrix of the previous layer, and so on, completing the update of the entire model.
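  • In the standard notation for this update rule (the patent gives no explicit formulas, so the symbols below are an assumption), with a^(l) the response of layer l, z^(l) its pre-activation, W^(l) the weight matrix between layers l-1 and l, E the error, and eta the learning rate:

```latex
% delta^(l) is the error assigned to layer l; the last two terms are the gradient
% of the weight matrix and the update against the gradient direction.
\delta^{(l)} = \left( W^{(l+1)} \right)^{\top} \delta^{(l+1)} \odot f'\!\left( z^{(l)} \right),
\qquad
\nabla_{W^{(l)}} E = \delta^{(l)} \left( a^{(l-1)} \right)^{\top},
\qquad
W^{(l)} \leftarrow W^{(l)} - \eta \, \nabla_{W^{(l)}} E
```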
  • Adam can be used as the optimizer for training the WeightNet-50 classification model.
  • In the parameter settings, the basic learning rate can be set to 0.1 and divided by 10 at the 32,000th and 48,000th iterations, and training is terminated at the 64,000th iteration; the weight decay value is set to 0.0001 and the batch size is set to 128.
  • The training of the WeightNet-50 classification model in the embodiments of this application takes Adam as the optimizer as an example, and the above parameter settings are only a preferred example, subject to the realization of the data training method provided in the embodiments of this application, with no specific limitation.
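  • The schedule above can be expressed as the following PyTorch configuration sketch (stepping the scheduler once per iteration reproduces the divide-by-10 milestones); the model is a placeholder.

```python
# Optimizer and learning-rate schedule matching the settings described above.
import torch

def build_training(model):
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1, weight_decay=1e-4)
    # Divide the learning rate by 10 at iterations 32,000 and 48,000
    # (scheduler.step() is called once per iteration).
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[32_000, 48_000], gamma=0.1)
    return optimizer, scheduler

MAX_ITERATIONS = 64_000
BATCH_SIZE = 128
```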
  • FIG. 5 is a schematic flowchart of the data training method according to an embodiment of the present invention, as shown in FIG. 5, including:
  • In step S502, the feature extraction module in the target detection model is initialized by the converged weight classification model to obtain the target detection model to be trained, wherein the converged weight classification model is obtained by training with the method in Embodiment 2.
  • The data training method provided in the embodiments of the present application is suitable for training a weighted attention neural network model, where the weighted attention neural network model includes a target detection model (Faster-RCNN); the Faster-RCNN is used to extract the position frame information of each person in the input image for pose estimation by the single-person pose estimation model.
  • The Faster-RCNN includes: a feature extraction module (WeightNet), a suggestion frame generation module (RPN), and a target classifier and position frame regression prediction module (Fast-RCNN).
  • The feature extraction module in step S502 is the feature extraction module of the Faster-RCNN; based on the weight classification model obtained in Embodiment 2, the feature extraction module is initialized from the weight classification model, excluding the output layer parameters.
  • the weights of the feature extraction module that obtains the image features in the first preset data set can be initialized by the weight classification model as follows:
  • The WeightNet-50 classification model is pre-trained on the ImageNet data set for the classification task, and the final converged weights are used as the initial weights of the feature extraction module in the person detection model, to improve the accuracy of the final person detection model and speed up the convergence of model training.
  • Adam is used as the optimizer (Adam: Adaptive Moment Estimation, a method for stochastic optimization); the basic learning rate is set to 0.1 and divided by 10 at the 32,000th and 48,000th iterations, and training is terminated at the 64,000th iteration; the weight decay value is 0.0001 and the batch size is set to 128.
  • the preprocessing operation of the image in the first preset data set adopts a preset probability to randomly flip horizontally.
  • the preset probability can be set to 50%.
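  • A minimal sketch of this preprocessing step, assuming torchvision (an assumption, not part of this application), is as follows:

```python
from torchvision import transforms

# Randomly flip each input image horizontally with the preset probability (50% here).
preprocess = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
```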
  • Step S504 Train the target detection model to be trained by using the target location frame label information in the second preset data set to obtain the trained target detection model;
  • FIG. 6 is a schematic diagram of the target detection model in the data training method according to an embodiment of the present invention.
  • the second preset data set may be a data set containing the target location frame label information; for example, it may be composed of the target location frame label information in the COCO and Kinetics-14 data sets.
  • the target detection model is trained by using the data set composed of the target location frame label information in the COCO and Kinetics-14 data sets to improve the recognition effect of the final overall architecture on the location of characters in similar scenes.
  • the feature extraction module in the embodiment of the present application is obtained based on the weight classification model trained in Embodiment 2; the weight classification model and the feature extraction module differ in structure and function;
  • the weight classification model refers to the WeightNet-50 classification model pre-trained on the ImageNet data set for the classification task; its final convergent weights are used as the initial weights of the feature extraction module in the target detection model, so as to improve the accuracy of the final target detection model and speed up the convergence of model training; the structure of the weight classification model is: weight classification network + classifier;
  • the feature extraction module is obtained by initializing its weights from the weight classification model; structurally, the feature extraction module is the weight classification model with the classifier removed, that is, it contains only the weight classification network part.
  • Step S506 training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set to obtain the trained single-person pose estimation model;
  • the third preset data set may be a data set containing target key point label information; for example, it may be composed of the target key point label information in the COCO and Kinetics-14 data sets.
  • a single-person pose estimation model is trained by using a data set composed of target key point tag information in the COCO and Kinetics-14 data sets to improve the recognition effect of the final overall architecture on the key points of character bones in similar scenes.
  • FIG. 7 is a schematic diagram of the single-person pose estimation model in the data training method according to an embodiment of the present invention. As shown in FIG. 7, the model is based on the HRNet algorithm and the data set constructed above, and a single-person pose model suited to this scenario is retrained. The HRNet model connects high-resolution and low-resolution subnetworks in parallel, which differs from the serial connection used in related technologies, and the HRNet model maintains high resolution throughout instead of restoring it through a low-to-high process. Its fusion scheme also differs from that of related technologies, which combine low-level and high-level representations: in the embodiment of this application, the HRNet model uses repeated multi-scale fusion, exploiting low-resolution representations of the same depth and similar level to improve the high-resolution representation.
  • step S508 a weighted attention neural network model is obtained according to the trained target detection model and the trained single-person pose estimation model.
  • a weighted attention neural network model is thereby obtained; that is, the combination of the Faster-RCNN model and the HRNet model constitutes the weighted attention neural network model.
  • the first preset data set in the data training method provided by the embodiments of the present application is used to train the weight classification model, and then the convergent weight classification model is used to initialize the feature extraction module in the target detection model; the second preset data set Used to train the target detection model; the third preset data set is used to train the single-person pose estimation model.
  • step S504 training the target detection model to be trained based on the target location frame label information in the second preset data set, and obtaining the trained target detection model includes:
  • Step S5041: in the case where the target detection model includes a feature extraction module, a suggestion box generation module, and a target classifier and position box regression prediction module, respectively train the feature extraction module and the suggestion box generation module to obtain the first parameter value of the feature extraction module and the first parameter value of the suggestion box generation module;
  • as described in step S502, the target detection model includes: a feature extraction module, a suggestion box generation module (RPN), and a target classifier and location box regression prediction module (Fast-RCNN); the parameters of the feature extraction module and the RPN module are trained first;
  • RPN: the suggestion box generation module;
  • Fast-RCNN: the target classifier and location box regression prediction module.
  • the details of the training are as follows: separately train the feature extraction module and the RPN module parameters to obtain rpn1 (that is, the first parameter value of the suggestion box generation module in the embodiment of this application) and weightnet1 (that is, the first parameter value of the feature extraction module in the embodiment of this application).
  • the suggestion box generation module in the target detection model, and the target classifier and position box regression prediction module can be initialized by different data distribution methods (commonly used initialization methods are: 1. Initialize to 0, 2. Random initialization, 3. Xavier initialization, 4. He initialization; in the embodiment of the present application, 3 or 4 is preferred).
  • Step S5042: train the target classifier and position box regression prediction module according to the first parameter value of the feature extraction module and the first parameter value of the suggestion box generation module, to obtain the first parameter value of the target classifier and position box regression prediction module and the second parameter value of the feature extraction module;
  • Fast-RCNN: that is, the target classifier and position box regression prediction module in this embodiment of the application.
  • Fast-RCNN is trained according to the first parameter value of the feature extraction module and the first parameter value of the suggestion box generation module to obtain fast-rcnn1 (ie , The first parameter value of the target classifier and the position box regression prediction module in the embodiment of the present application), WeightNet2 (ie, the second parameter value of the feature extraction module in the embodiment of the present application).
  • Step S5043 training the suggestion box generation module according to the first parameter value of the target classifier and the position box regression prediction module and the second parameter value of the feature extraction module to obtain the second parameter value of the suggestion box generation module;
  • the RPN (ie, the suggestion box generation module in the embodiment of the present application) is trained in combination with fast-rcnn1 and WeightNet2 to obtain rpn2 (ie, the second parameter value of the suggestion box generation module in the embodiment of the present application).
  • Step S5044 training the target classifier and the position box regression prediction module according to the second parameter value of the suggestion box generation module and the second parameter value of the feature extraction module to obtain the second parameter value of the target classifier and the position box regression prediction module.
  • the Fast-RCNN module is trained according to the second parameter value of the feature extraction module and the second parameter value of the suggestion box generation module to obtain fast-rcnn2 (that is, the second parameter value of the target classifier and position box regression prediction module in this embodiment of the application).
  • the input image preprocessing operation can use mix-up and random horizontal flip (50%), and the process of training the target detection model can take Adam as an optimizer as an example.
  • the parameters can be set as follows: the basic learning rate is 0.001, the weight decay value is 0.0001, the batch size is set to 32, and the numbers of iteration steps of the 4 training stages are 80000, 40000, 80000 and 40000 respectively.
  • the feature extraction module is used to extract the features of each data in the second preset data set;
  • the suggestion frame generation module is used to generate candidate target frames of each data according to the features of each data in the second preset data set ;
  • the target classifier and position frame regression prediction module is used to obtain, according to the features of each data in the second preset data set and the candidate target frames of each data, the detection frame of each target of each data in the second preset data set and the category of the corresponding detection frame; when the suggestion frame generation module includes a convolutional layer with a sliding window followed by two parallel convolutional layers, namely a regression layer and a classification layer, the suggestion frame generation module being used to generate candidate target frames of each data according to the features of each data in the second preset data set includes: obtaining, through the regression layer and according to the features of each data in the second preset data set, the coordinates of the center anchor point of each candidate target frame of each data and the width and height of the corresponding candidate target frame, and determining, through the classification layer, whether each candidate target frame is foreground or background.
  • the feature extraction module in the target detection model provided by the embodiment of the present application is used to extract a feature map of the input image
  • the proposal frame generation module inputs the feature map extracted by the feature extraction module, and outputs a series of candidate target rectangular frame coordinates, which are used to generate the candidate target frame of the input image.
  • the main inputs of the target classifier and location box regression prediction module are the feature map extracted by the feature extraction module and the candidate boxes generated by the suggestion box generation module, and its outputs are the accurate location regression and category prediction results.
  • the RPN network structure includes: a convolution layer using a 3 ⁇ 3 sliding window, followed by two parallel 1 ⁇ 1 convolution layers, which are a regression layer (reg_layer) and a classification layer (cls-layer).
  • the regression layer (reg_layer) is used to predict the coordinates x, y of the center anchor point of the window and the width w and height h of the candidate box on the original image;
  • the classification layer (cls_layer) is used to determine whether the candidate box is foreground or background.
  • the target classifier and the position box regression prediction module is a pooling layer, three fully connected layers and two parallel fully connected layers connected in sequence
  • the target classifier and position frame regression prediction module being used to obtain, according to the features of each data in the second preset data set and the candidate target frames of each data, the detection frame of each target of each data in the second preset data set and the category of the corresponding detection frame includes: converting, through the pooling layer, the variable-length features of each data output by the feature extraction module into fixed-length features of each data; and, according to the fixed-length features of each data, outputting the detection frame of each target of each data in the second preset data set and the category of the corresponding detection frame through the three fully connected layers followed by the two parallel fully connected layers.
  • the target classifier and position box regression prediction module includes an ROI pooling layer, three fully connected layers and two fully connected layers in parallel.
  • the main function of the ROI pooling layer is to convert inputs of different sizes into outputs of a fixed length, and the two parallel fully connected layers are mainly used to predict the category and regress the person detection frame.
  • step S506, training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set to obtain the trained single-person pose estimation model, includes: training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set, and iteratively updating these network parameters through forward propagation and backpropagation algorithms; wherein this training and iterative updating include: expanding the height or width of the input single-person image according to a preset aspect ratio, and cropping the single-person image to a preset size.
  • the input of the HRNet single-person pose estimation network is a single-person image
  • the output is the two-dimensional coordinates of the key points of the human skeleton in the single-person image
  • the structure diagram of the HRNet single-person pose estimation network is shown in Figure 7
  • at each stage a new sub-network branch is added in parallel, whose resolution is half that of the previous branch and whose width (the number of channels C) is doubled.
  • each stage (except the first) contains several exchange blocks, and each exchange block contains a basic unit on each branch (consisting of 4 WeightNet residual units, each WeightNet residual unit being as shown in FIG. 2) and an exchange unit that spans resolutions; the function of the exchange unit is to fuse, through up-sampling and down-sampling, the current outputs of the parallel sub-networks at different resolutions and use the result as the next input of each branch, achieving multi-scale fusion in the model; specifically, the first stage includes a basic unit and a 3×3 convolutional layer, the main function of the 3×3 convolutional layer being to reduce the number of feature map channels output by the basic unit to 32 as the next high-resolution branch; the second, third and fourth stages include 1, 4 and 3 exchange blocks respectively, so HRNet contains 8 exchange blocks in total and performs 8 multi-scale fusions.
  • the number of each branch channel is 32, 64, 128, and 256, respectively.
  • the HRNet network expands the height or width of the input single-person image to a fixed aspect ratio (height to width equal to 4:3) and then crops the image to a fixed size of 384×288; data augmentation (preprocessing) includes random rotation (±45 degrees), random scaling (0.65–1.35) and/or random horizontal flipping; the Adam optimizer is used during training, the basic learning rate is set to 0.001 and drops to 0.0001 and 0.00001 at the 170th and 200th epochs respectively, the batch size is set to 16, and the total number of training epochs is set to 210 (a minimal sketch of the input normalization is given below).
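  • A minimal sketch of the input normalization mentioned above, assuming OpenCV and approximating the expansion step by zero-padding (an illustrative choice, not specified in this application), is as follows:

```python
import cv2
import numpy as np

def normalize_person_crop(image, out_h=384, out_w=288):
    """Pad the single-person crop to a 4:3 height:width ratio, then resize to 384x288."""
    h, w = image.shape[:2]
    target_ratio = out_h / out_w                      # 4:3
    if h / w < target_ratio:                          # too wide: expand the height
        pad = int(round(w * target_ratio)) - h
        image = cv2.copyMakeBorder(image, pad // 2, pad - pad // 2, 0, 0,
                                   cv2.BORDER_CONSTANT, value=0)
    else:                                             # too tall: expand the width
        pad = int(round(h / target_ratio)) - w
        image = cv2.copyMakeBorder(image, 0, 0, pad // 2, pad - pad // 2,
                                   cv2.BORDER_CONSTANT, value=0)
    return cv2.resize(image, (out_w, out_h))

person = np.zeros((500, 220, 3), dtype=np.uint8)      # placeholder person crop
print(normalize_person_crop(person).shape)            # (384, 288, 3)
```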
  • the target detection model and the single-person pose estimation model both use the forward propagation algorithm to obtain the model prediction output and compute its mean square error against the true labels, as in formula (1): MSE = (1/n)·∑_{i=1..n} (y_i − y′_i)², where y_i is the model's prediction for the i-th data, y′_i is the true label of the i-th data, and n is the batch size value.
  • the training goal is that the mean square error of the model on the training data set is minimized/converges (when, during training, the training accuracy and error no longer change as the training iteration steps increase and tend to be stable, the model is said to have converged and the error to be minimized); the optimal model is then selected through the validation set as the detection model for the test phase (during training, the model is tested on the validation set at fixed training intervals, and the model with the highest accuracy or smallest error on the validation set is finally selected).
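  • A minimal numpy sketch of formula (1), the batch mean square error between the model predictions and the true labels, is as follows:

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean square error over a batch of n predictions, as in formula (1)."""
    y_pred, y_true = np.asarray(y_pred, dtype=float), np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

print(mse([0.2, 0.8, 0.1], [0.0, 1.0, 0.0]))   # 0.03
```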
  • the method used in training the network parameters of the single-person pose estimation model to be trained includes the data processing method in Embodiment 1.
  • the data training method provided in this embodiment of the application further includes: collecting the samples required for training the target detection model to be trained and the single-person pose estimation model to be trained; and preprocessing the samples, where the preprocessing includes data set division and preprocessing operations. Training the weight classification model to be trained to obtain a convergent weight classification model includes: inputting the data in the first preset data set into the weight classification model to be trained to obtain a category prediction result; obtaining, according to the category prediction result and the label categories of the data in the first preset data set, the error between the category prediction result and the label categories; and training the weight classification model to be trained with the backpropagation algorithm according to this error until the weight classification model to be trained converges, so as to obtain the convergent weight classification model.
  • samples in the embodiments of this application may be derived from open source data sets, such as: the Microsoft COCO 2017 Keypoint Detection Dataset, Kinetics-600 and ImageNet (Large Scale Visual Recognition Challenge);
  • the preprocessing in the embodiment of the present application includes the division of data sets and preprocessing operations, where the division of data sets is a data processing step performed before the data are input into the model; the above three data sets are each divided in a preset way so that the optimal data model can be obtained by screening.
  • the preprocessing operations include mixing operations and random geometric transformations.
  • new training data are obtained by synthesizing different pictures and geometrically transforming the pictures, so that situations in which people are occluded, which are common in practice, are also reflected in the training data.
  • the preprocessing operation enriches the diversity of the training data, makes the model more robust, and can effectively reduce the impact of adversarial images.
  • the first preset data set includes a first type of image data set, which has a predefined training set and validation set; the second preset data set includes the data labeled with position box information in a second type of image data set and a third type of image data set; the second type of image data set has a predefined training set and validation set, while the third type of image data set is randomly divided into a training set and a validation set according to a preset ratio;
  • the training sets of the second and third types of image data sets together form the training set of the second preset data set, and their validation sets together form the validation set of the second preset data set;
  • the third preset data set includes the data labeled with key point information in the second type of image data set and the third type of image data set;
  • the preprocessing operation includes: the data in the first preset data set and the third preset data set are processed through random geometric transformation, and the data in the second preset data set is processed through the random mixing operation and random geometric transformation.
  • the first preset data set includes a first type of image data set.
  • the first type of image data set can be described by taking the ImageNet data set as an example;
  • the second type of image data set included in the second preset data set can be illustrated by taking the data labeled with position box information in the Microsoft COCO 2017 Keypoint Detection Dataset (hereinafter referred to as the COCO data set) as an example;
  • the third type of image data set included in the second preset data set can be illustrated by taking the data labeled with position frame information in Kinetics-14 as an example; the data labeled with key point information in the second type of image data set and the third type of image data set included in the third preset data set can be illustrated by taking the data labeled with key point information in the COCO data set and in Kinetics-14 as examples.
  • the COCO data set contains more than 200,000 images and a total of 250,000 instances labeled with two-dimensional key point information (in this data set, the people in the pictures are mostly of medium and large scale); the training set and the validation set together contain more than 150,000 people and 1.7 million labeled key points.
  • the annotation information is mainly recorded in corresponding .json files, which record the detailed information of each picture, including: the download URL of the picture, the picture name, the picture resolution, the time when the picture was collected, the index (ID) of the picture, and the number of visible bone key points of each person in the picture (a complete annotation in the COCO data set comprises 17 bone key points).
  • FIG. 8 shows the key point positions and skeleton connections in the data training method according to the embodiment of the present invention: the left picture in FIG. 8 is a schematic diagram of the key point positions and skeleton connections of the COCO data set, and the right picture in FIG. 8 is a schematic diagram of the key point positions and skeleton connections obtained based on the COCO data set in the data training method provided by the embodiment of the application.
  • FIGs. 9a and 9b are schematic diagrams of the pre- and post-labeling effect of the key point positions and the skeleton connection in the data training method according to the embodiment of the present invention.
  • the labeling process is as follows: using the annotation tool, the 17 specific visible key points are manually marked on each picture; the left side shows the original picture, and the right side shows the visualized effect after labeling.
  • because the existing human body detection models and pose estimation models are mainly trained on images of natural scenes, they perform poorly for target detection and pose estimation in sports scenes; this is because the body poses of people in sports scenes differ considerably from those in natural scenes, so the existing target detection models and pose estimation models give poor detection and pose estimation results for people in sports scenes;
  • the embodiment of the application additionally collects 14 sports categories from the Kinetics-600 open source data set, including: bench press, clean and jerk, rope climbing, deadlift, lunge, boxing, running, sit-ups, rope skipping, deep squatting and leg stretching, for a total of more than 10,000 pictures of sports scenes, and labels them using the open source software Visipedia Annotation Toolkit (an image key point annotation tool) in the same annotation format as the COCO data set; this data set is called Kinetics-14 in the embodiment of this application. Based on the target position frame label information and the target key point label information in Kinetics-14 (that is, the third type of image data set in the embodiment of this application) and the COCO data set (that is, the second type of image data set in the embodiment of this application), the target detection model and the single-person pose estimation model are trained respectively, to improve the final overall framework's recognition of the positions of persons and of the skeleton key points in similar scenes.
  • the data set composed of the target location frame label information in Kinetics-14 and the COCO data set is the second preset data set in the embodiment of the application, and the data set composed of the target key point label information in Kinetics-14 and the COCO data set is the third preset data set in the embodiment of this application.
  • the first preset data set needs to undergo random geometric transformation while being input into the above model for training;
  • the second preset data set needs to undergo the data mixing operation and random geometric transformation while being input into the above model for training;
  • the third preset data set needs to undergo random geometric transformation while being input into the above model for training.
  • the random geometric transformation includes random cropping, random rotation according to a preset angle, and/or random scaling according to a preset zoom ratio;
  • the random mixing operation includes superimposing at least two data items according to preset weights; specifically, the pixel values at corresponding positions in the different data items are each multiplied by a preset weight and then added.
  • FIG. 10 is a schematic diagram of the effect of mix-up in the data recognition method according to an embodiment of the present invention; as shown in FIG. 10, the mix-up operation process is as follows:
  • the two input images are merged into a new image according to a certain weight, and the merged image is used as the new input training data; since the target detection model is very sensitive to geometric transformation of the image, geometric alignment is used when performing the mix-up operation to avoid image distortion, that is, the images are not cropped or scaled, and the pixel values at corresponding positions are directly multiplied by the respective weights and then added; the specific expression is formula (2).
  • x_i and x_j represent two different images, x̃ represents the image synthesized by the mix-up operation, and λ represents the weight of the mix-up; for each synthesis, λ is randomly sampled from a Beta random distribution, as expressed in formula (4).
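  • A minimal numpy sketch of a mix-up operation of this kind, assuming two input images of identical size and using the standard weighted-sum form (the value alpha=1.5 is an illustrative assumption, not a parameter stated in this application), is as follows:

```python
import numpy as np

def mix_up(x_i, x_j, alpha=1.5):
    """Blend two same-sized images pixel-wise without cropping or scaling."""
    lam = np.random.beta(alpha, alpha)        # lambda sampled from a Beta distribution
    return lam * x_i + (1.0 - lam) * x_j      # weighted sum of corresponding pixel values

img_a = np.random.rand(480, 640, 3)           # placeholder input images
img_b = np.random.rand(480, 640, 3)
mixed = mix_up(img_a, img_b)
```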
  • the random geometric transformation includes random cropping to 256×256 (multiple cropping sizes are possible; the size is generally set to a power of 2, with the shortest side not less than 128 and the longest side not greater than 512), random rotation within the range of (−45°, 45°) (that is, the preset rotation angle in the embodiment of this application), random horizontal flipping with 50% probability, and random scaling within the range of (0.65, 1.35).
  • random cropping means that the size of the original picture is randomly cropped to 256x256 (the cropping size used in the embodiment of this application), and the channel size is unchanged;
  • the random rotation operation means that the image angle is randomly rotated within plus or minus 45 degrees to change the image content.
  • random flip operation means to flip the image at a random level with a probability of 50%;
  • the random zoom operation means enlarging or reducing the image within a ratio of 0.65 to 1.35; when training the classification network and the pose estimation network, random geometric transformation not only augments the data but is also a method to reduce data noise and increase model stability.
  • the random geometric transformation in the embodiments of the present application may include one or a combination of at least two of random cropping, random rotation according to a preset angle and random scaling according to a preset zoom ratio, and the execution sequence is adjusted according to the actual needs of the pictures (a minimal sketch is given below): for example, if the size of a picture already meets the data training requirement, random cropping or scaling is not required; or, if the display angle of a picture already meets the data training requirement, random rotation is not required; in the same way, random geometric transformation is performed on each picture according to the actual demand for that picture.
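  • A minimal sketch of such a random geometric transformation pipeline, assuming torchvision (the ordering and the pad_if_needed choice are illustrative assumptions), is as follows:

```python
from torchvision import transforms

geometric_augment = transforms.Compose([
    transforms.RandomAffine(degrees=45, scale=(0.65, 1.35)),   # random rotation within +/-45 degrees, random scaling in (0.65, 1.35)
    transforms.RandomCrop(256, pad_if_needed=True),            # random cropping to 256x256
    transforms.RandomHorizontalFlip(p=0.5),                    # random horizontal flip with 50% probability
])
```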
  • the preprocessing operation in the embodiment of this application preprocesses, in the above-mentioned manner and during each round of model training, the part of the data used for training in that round, and these preprocessed data are then used for training;
  • the data selected in different rounds differ, as do the actual training data obtained after preprocessing, in order to achieve the effect of gradual convergence.
  • FIG. 11 is a schematic flowchart of a data identification method according to an embodiment of the present invention, as shown in FIG. 11, including:
  • Step S1102 input the feature data to be recognized into the weighted attention neural network model, and identify the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized.
  • the weighted attention neural network model is used to estimate the posture of at least one person in a top-down manner: the position rectangle of at least one target in the feature data to be recognized is detected, and the two-dimensional coordinates of the key points of the target within the position rectangle are then detected;
  • Step S1104: calculate, from the two-dimensional coordinates of the key points of the target, the angle between the line of the first preset key point combination and the line of the second preset key point combination, or the angle between the line of the first preset key point combination and the first preset line;
  • the first preset line can be a horizontal line or a vertical line, etc.; there are two key points in the first preset key point combination; and there are two key points in the second preset key point combination.
  • the angle between the line of the first preset key point combination and the line of the second preset key point combination, or between the line of the first preset key point combination and the first preset line, can arise in the following scenarios:
  • Scenario 1: the included angle between the two specific lines determined by three specific key points;
  • Scenario 2 The angle between the connection of two specific key points and the environmental line (for example, a horizontal line or a vertical line, that is, the first preset line in the embodiment of the present application);
  • for example, when the two key points obtained are the two key points located at the shoulders of the human body target, a line segment connecting them is required; since no further connection is available, the angle is formed between this connection and the horizontal line or the vertical line.
  • Scenario 3 The angle between the connection of two specific key points and the connection of the other two key points;
  • Step S1106: the angle between the line of the first preset key point combination and the line of the second preset key point combination, or the angle between the line of the first preset key point combination and the first preset line, is matched in the first preset database to obtain the recognition result of the target.
  • FIG. 12 is a schematic diagram of the evaluation process of the posture risk based on deep learning in the data recognition method according to the embodiment of the present invention.
  • the characteristic data in the embodiment of the present application may include pictures and/or videos; that is, in the embodiment of the present application, the input form of the feature data may include: form one: picture; form two: video; form three: picture and video.
  • the data recognition method provided by the embodiment of the present application also includes data sample collection and neural network learning before inputting the characteristic data into the end-to-end model.
  • the posture risk assessment method provided by the embodiment of the present application is as follows:
  • Step 1: data collection; samples are collected according to the acquired data sets;
  • Step2 Based on the sample collection of Step1, preprocess the data in the data set to obtain the training set and the test set respectively;
  • Step3 Input feature data to the end-to-end model to obtain the two-dimensional coordinates of the key points of the target;
  • Step4 According to the data type of the characteristic data, the angle is calculated according to the two-dimensional coordinates of the key points of the target, and the assessment result of the posture risk is generated.
  • the image to be evaluated is input into the end-to-end model, and the output is the two-dimensional coordinates of the human skeleton key points recognized by the model (that is, the two-dimensional coordinates of the key points of the target in the embodiment of the application).
  • the input can also be a sports video; the continuous change curve information of each joint angle of each athlete in the video stream (frames) is acquired as described above, compared with the standard sports library, and targeted exercise improvement guidance is then provided.
  • the picture and video are processed separately in the multi-person pose estimation module.
  • when a picture is input, the two-dimensional coordinates of the human skeleton key points in the picture are obtained; when a video is input, the continuous change curve information of each joint angle of each athlete is obtained from each frame of the video, or frame images are extracted from the video at a preset time interval and the continuous change curve information of each joint angle of each athlete is obtained from the extracted frame images; extracting frame images at preset time intervals reduces the image recognition load on the computer, reduces the amount of calculation and improves recognition efficiency; the posture risk assessment results for each person in the picture and each person in the video are then obtained from the two-dimensional key point coordinates and the continuous change curve information respectively (a minimal sketch of frame extraction at a preset interval is given below).
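  • A minimal sketch of extracting frames at a preset time interval, assuming OpenCV (the one-second interval is an illustrative assumption), is as follows:

```python
import cv2

def sample_frames(video_path, interval_seconds=1.0):
    """Return frames sampled from the video at the preset time interval."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0            # fall back if the FPS metadata is missing
    step = max(1, int(round(fps * interval_seconds)))  # number of frames to skip between samples
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```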
  • the top-down multi-person pose estimation method is adopted: by inputting the feature data to be recognized into the weighted attention neural network model, the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized are recognized, where the weighted attention neural network model is used to estimate the pose of at least one person in a top-down manner, detecting the position rectangle of at least one target in the feature data to be recognized and then detecting the two-dimensional coordinates of the key points of the target within the position rectangle; from the two-dimensional coordinates of the key points of the target, the angle between the line of the first preset key point combination and the line of the second preset key point combination, or the angle between the line of the first preset key point combination and the first preset line, is calculated; this angle is matched in the first preset database to obtain the recognition result of the target.
  • step S1106, in which the angle between the line of the first preset key point combination and the line of the second preset key point combination, or the angle between the line of the first preset key point combination and the first preset line, is matched in the first preset database and the recognition result of the target is obtained, includes: in the case that the feature data to be recognized include picture data, matching the obtained angle value of at least one included angle with the angle value of the corresponding included angle type in the first preset database to obtain the recognition result of the picture data.
  • the included angle includes: the angle between the line between the eyes and the horizontal line, the angle between the shoulder line and the horizontal line, the angle between the crotch line and the horizontal line, the angle between the center line of the head and the vertical line , The angle between the midline of the torso and the vertical straight line, the joint angle between the upper arm and the lower arm, the joint angle between the thigh and the calf, the angle between the line between the ear and the shoulder and the vertical straight line, the joint angle between the midline of the trunk and the midline of the thigh , The joint angle between the upper arm and the lower arm and the joint angle between the thigh and the calf.
  • FIGS. 13a and 13b are schematic diagrams of front and side shots in the data recognition method according to an embodiment of the present invention. As shown in FIGS. 13a and 13b, they show the 13 specific joint angles calculated by the angle calculation module, including: the angle between the line between the eyes and the horizontal line (front view/1), the angle between the shoulder line and the horizontal line (front view/2), the angle between the crotch line and the horizontal line (front view/3), the angle between the midline of the head and the vertical line (front view/4), the angle between the midline of the torso and the vertical line (front view/5), the joint angle between the upper arm and the lower arm (front view/left 6, right 7), the joint angle between the thigh and the calf (front view/left 8, right 9), the angle between the line between the ear and the shoulder and the vertical line (side view/10), the joint angle between the midline of the torso and the midline of the thigh (side view/11), the joint angle between the upper arm and the lower arm (side view/12), and the joint angle between the thigh and the calf (side view/13).
  • suppose A, B and C are three points on a two-dimensional plane (that is, any three points on the two-dimensional plane where the feature data in the embodiment of the present application are located), and the angle at A relative to the straight line AB is required; the slope angles of the line AC and the line AB can be calculated first and then converted into the corresponding angles, and the difference between the angles of the two lines is the angle to be obtained; considering the direction of the angle, the clockwise direction is taken as positive.
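  • A minimal numpy sketch of this angle calculation (image coordinates with the y axis pointing downwards are assumed, so the clockwise direction is positive) is as follows:

```python
import numpy as np

def signed_angle(a, b, c):
    """Signed angle at A between line AB and line AC, in degrees, clockwise positive."""
    a, b, c = map(np.asarray, (a, b, c))
    ang_ab = np.arctan2(b[1] - a[1], b[0] - a[0])   # slope angle of line AB
    ang_ac = np.arctan2(c[1] - a[1], c[0] - a[0])   # slope angle of line AC
    diff = np.degrees(ang_ac - ang_ab)              # difference between the two angles
    return (diff + 180.0) % 360.0 - 180.0           # wrap into [-180, 180)

print(signed_angle((0, 0), (1, 0), (0, 1)))         # 90.0
```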
  • step S1106, in which the angle between the line of the first preset key point combination and the line of the second preset key point combination, or the angle between the line of the first preset key point combination and the first preset line, is matched in the first preset database and the recognition result of the target is obtained, further includes:
  • Step S11061: in the case that the feature data to be recognized include video data, obtain, for each frame or specified frame, the two-dimensional coordinate information of the key points of at least one target in each corresponding frame of the video data, where the specified frames are fixed-time-interval frames and/or key frames;
  • processing each frame or only specified frames is implemented as follows: since consecutive frames often contain repeated images, in order to improve the efficiency of data processing, the two-dimensional coordinate information of the key points of at least one target can be collected only from frames at preset time intervals (fixed time intervals) or from key frames, which reduces the data processing load compared with processing every frame.
  • the key frame can be obtained through the relevant function flags of the software.
  • a frame in which a person or animal is detected is regarded as a key frame, and/or a frame whose motion change exceeds a preset amplitude is determined as a key frame;
  • Obtaining two-dimensional coordinate information of key points of at least one target in the specified frame of the video data can be applied to the uploaded video data that has been shot.
  • acquiring the two-dimensional coordinate information of the key points of at least one target in specified frames of the video data, using fixed-time-interval frames and key frames, can also be implemented simultaneously on multiple computing devices with data processing capabilities.
  • Step S11062: obtain the angle-time variation curve of at least one specific included angle of at least one target according to the two-dimensional coordinate information of the key points of the at least one target in each corresponding frame of the video data, compare and analyze this curve with the angle-time variation curve of at least one included angle of at least one standard motion, and obtain the recognition result.
  • obtaining the angle-time variation curve of at least one specific included angle of the at least one target, comparing and analyzing it with the angle-time variation curve of at least one included angle of at least one standard motion, and obtaining the recognition result includes: comparing, for similarity, the angle-time variation curve of at least one specific included angle of the at least one target with the angle-time variation curve of at least one included angle obtained in advance for at least one standard motion.
  • if the similarity falls within the first preset threshold interval, it is determined that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type; after it is determined that the target is performing the corresponding standard motion type, the angle-time variation curve of at least one specific included angle of the target is further compared with the angle-time variation curve of the corresponding specific included angle of the standard motion; if the difference between adjacent maximum and minimum values on the angle-time variation curve of at least one specific included angle of the target and the corresponding difference on the angle-time variation curve of the corresponding specific included angle of the standard motion falls within the second preset threshold interval, it is determined that the joint motion corresponding to the specific included angle of the target in the video data is standardized; otherwise the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is not standardized; it is also judged whether the difference between the spacing of adjacent peaks on the angle-time variation curve of at least one specific included angle of the target and the spacing of adjacent peaks on the corresponding curve of the standard motion falls within the third, fourth or fifth preset threshold interval, so as to determine the exercise intensity.
  • for example, consider the change curve of the arm angle of a person exercising in a video: when the person lifts or lowers a barbell, the coordinates of the arm key points in the image change, so a curve is obtained by connecting the angle values as they change over time; this angle-time variation curve is then compared and analyzed with the angle-time variation curve of at least one included angle of the corresponding standard motion type of at least one standard motion, and the recognition result is obtained.
  • the angle-time variation curve can be obtained from the angle changes in every frame of the video, or from the angle changes in the frame images extracted at the preset time interval;
  • the difference between adjacent maximum values on the two angle-time variation curves is computed, and whether the joint motion corresponding to each specific included angle of the person is standardized is judged from this difference; further, by calculating the difference between the spacings of adjacent peaks on the two angle-time variation curves and judging whether it falls within the third, fourth or fifth preset threshold interval, it is determined whether the person's exercise intensity is too low, appropriate or too high (a minimal sketch of these curve statistics is given below).
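  • A minimal sketch of these curve statistics, assuming numpy/scipy (the placeholder sine curves and the reduction of each curve to two summary numbers are illustrative assumptions; the actual comparisons use the preset threshold intervals described above), is as follows:

```python
import numpy as np
from scipy.signal import find_peaks

def curve_statistics(curve):
    """Amplitude (mean peak minus mean trough) and mean spacing between adjacent peaks."""
    curve = np.asarray(curve, dtype=float)
    peaks, _ = find_peaks(curve)                    # adjacent maxima
    troughs, _ = find_peaks(-curve)                 # adjacent minima
    amplitude = np.mean(curve[peaks]) - np.mean(curve[troughs])
    peak_spacing = np.mean(np.diff(peaks))          # frames between adjacent peaks
    return amplitude, peak_spacing

t = np.linspace(0, 4 * np.pi, 200)
user_curve = 90 + 40 * np.sin(t)                    # placeholder angle-time curve of the tester
standard_curve = 90 + 45 * np.sin(t)                # placeholder curve of the standard motion

amp_user, spacing_user = curve_statistics(user_curve)
amp_std, spacing_std = curve_statistics(standard_curve)
amplitude_difference = amp_user - amp_std           # compared with the second preset threshold interval (standardization)
spacing_difference = spacing_user - spacing_std     # compared with the third/fourth/fifth preset threshold intervals (intensity)
```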
  • the first preset threshold interval in the embodiment of the present application is used to determine the movement type of the target in the video;
  • the second preset threshold interval is used to determine whether the movement posture of the target in the video is standardized;
  • the third preset threshold interval, the fourth preset threshold interval and the fifth preset threshold interval are used to determine the exercise intensity of the target in the video;
  • the setting of the third preset threshold interval, the fourth preset threshold interval or the fifth preset threshold interval can also be realized by setting a threshold interval, and the corresponding exercise intensity is set through the sub-intervals in each threshold interval.
  • alternatively, this embodiment may not perform action type recognition and may instead directly obtain the movement type of the feature data to be recognized (for example, the corresponding movement type is entered together with the video or image); at least one angle-time variation curve obtained from the feature data to be recognized is then directly compared with the corresponding angle-time variation curve of the standard action for the entered movement type, using the comparison method described above.
  • the main input of the motion guidance module is the motion video of a single person or multiple people.
  • the two-dimensional coordinate information of the key points of each human body in the motion video stream (frame) is obtained through the multi-person pose estimation model.
  • from the two-dimensional coordinates in the video stream (frames), the continuously changing value of each specific joint angle of each person in the video stream (frames) is obtained through the angle calculation module (each frame of the video stream can be regarded as a time point, and the line connecting the angle values at each time point is the angle variation curve of angle value y against frame x), and this is compared and analyzed with the corresponding standard motion curve; the standard motion curve is obtained by identifying the key points and the variation values of each joint angle through the model of this application, and motion correction guidance is then given.
  • the specific implementation is as follows: as the video stream (frames) is input, a continuous angle variation curve is recorded for each specific angle of each person; the first preset database stores, for each type of standard action (including different stances and orientations of the same action), the pre-computed angle-time variation curve of each specific joint angle; once the angle-time variation curve of each specific joint angle of each person in the video stream (frames) is obtained, it is matched and compared with the angle-time variation curve of the corresponding standard action; the difference between adjacent extreme values (the lowest value and the highest value) of the angle-time variation curve can be used to judge whether the motion amplitude of the specific joint under test is standardized: if the difference between the distance between adjacent maximum and minimum values of the angle-time variation curve of the tested person's joint and the distance at the corresponding position in the standard motion video is greater than a specified threshold (that is, outside the second preset threshold interval in the embodiment of the present application), the motion is judged to be non-standard; the distance between every two peaks of the angle variation curve (the distance between two adjacent maximum or minimum values) can be used to measure the intensity of the exercise at the specific angle: if the difference between the peak spacing of the tested person's joint curve and that at the corresponding position in the standard motion video is greater than a specified threshold and lies in the interval where that threshold is located (that is, the third preset threshold interval in the embodiment of the present application), it can be concluded that the joint exercise intensity is too high; if it lies in the interval of a specified threshold (that is, the fourth preset threshold interval in the embodiment of the present application), it can be concluded that the exercise intensity is moderate; if it is less than a specified threshold and lies in the interval where that threshold is located (that is, the fifth preset threshold interval in the embodiment of the present application), it can be concluded that the exercise intensity is too low; the standardization values and intensity values can then be integrated to give overall guidance.
  • the data identification method provided in the embodiment of the present application further includes:
  • step S1109 matching is performed in the second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
  • the angle value-posture knowledge base is the second preset database in the embodiment of the application.
  • the posture assessment risk of each part is divided into three levels: low risk, potential risk, and high risk.
  • the specific matching process is:
  • Head roll risk assessment (0-4 degrees: low risk; 4-9 degrees: potential risk; above 9 degrees: high risk)
  • the main matching angle is 1;
  • Pelvic roll risk assessment (0-2 degrees: low risk; 2-4 degrees: potential risk; above 4 degrees: high risk)
  • the main matching angle is 6;
  • Knee hyperextension risk assessment (179-180 degrees: low risk; 177-179 degrees: potential risk; below 177 degrees: high risk)
  • the main matching angle is 13.
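  • A minimal sketch of matching measured angle values against this angle value-posture knowledge base (the second preset database), using the example thresholds listed above, is as follows:

```python
def head_roll_risk(angle_1):
    """Head roll assessment from angle 1 (line between the eyes vs. the horizontal line)."""
    if angle_1 < 4:
        return "low risk"
    if angle_1 < 9:
        return "potential risk"
    return "high risk"

def knee_hyperextension_risk(angle_13):
    """Knee hyperextension assessment from angle 13."""
    if angle_13 >= 179:
        return "low risk"
    if angle_13 >= 177:
        return "potential risk"
    return "high risk"

print(head_roll_risk(6.5), knee_hyperextension_risk(176.0))   # potential risk / high risk
```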
  • FIG. 14 is a schematic diagram of the evaluation result of the posture risk in the data recognition method according to the embodiment of the present invention.
  • the embodiment of the present application summarizes 7 common unhealthy postures, namely head tilt, uneven (high and low) shoulders, spinal displacement, pelvic tilt, abnormal leg shape, forward head with round shoulders, and knee hyperextension.
  • the data recognition method provided in the embodiment of the present application further includes:
  • step S1110 matching is performed in the third preset database according to the posture evaluation result to obtain suggestion information corresponding to the posture evaluation result.
  • the first preset database, the second preset database and the third preset database may be three independent databases, databases located on different servers, or three storage areas on one server.
  • the advice information includes but is not limited to the potential diseases corresponding to the posture, suggestions for improvement, and so on. For example, when the assessment result includes the risk of forward head and round shoulders in the target, the suggestion information corresponding to the assessment result may include: this posture will cause cervical vertebra displacement and protrusion; such postural changes will cause dizziness, neurological headaches and head pain; it is recommended to avoid playing with mobile phones, facing the computer or TV, or reading for long periods, and to take part in more physical exercise, especially ball sports. As another example, the suggestion information corresponding to an assessment result may include: this posture will cause legs of unequal length and lumbar disc protrusion; with such postural changes, the lengths of the two legs will differ and, when standing, the body weight will load the two legs unevenly; if lumbar disc herniation occurs, it will cause uneven force on the lumbar spine and a risk of being bedridden; recommendations for unequal leg length: avoid crossing the legs, supporting the sitting posture with one leg, or bearing the weight on one leg when standing; recommendations for lumbar disc herniation: avoid sitting for long periods, take part in more physical exercise, exercise the lumbar spine appropriately, and cooperate with regular massage.
  • the data recognition method provided by the embodiments of the present application can also be applied to online shopping. Taking shopping for clothes online as an example, a user uploads a selfie photo or selfie video, identification is performed through steps S1102 to S1106 to obtain an identification result, the identification result is compared with that of the model wearing product A stored on the server, and shopping suggestions are provided based on the comparison result. For example, the sizes of product A are: S, M, L, XL and XXL; if steps S1102 to S1106 identify that the user's figure is the same as the model's, and the size of product A worn by the model is M, the user is advised to buy product A in size M; if the user is slimmer than the model, the user is advised to buy product A in size S; conversely, if the user is larger than the model, the user is advised to buy product A in size L, XL or XXL.
  • FIG. 15 is a schematic diagram of the data recognition device according to an embodiment of the present invention.
  • the weighted attention neural network model is configured to perform top-down pose estimation of at least one person, detecting the position rectangle of at least one target in the feature data to be recognized and detecting the two-dimensional coordinates of the key points of the target within the position rectangle;
  • the calculation module 1504 is configured to calculate, from the two-dimensional coordinates of the key points of the target, the angle between the line of the first preset key point combination and the line of the second preset key point combination, or the angle between the line of the first preset key point combination and the first preset line;
  • the matching module 1506 is configured to match the angle between the line of the first preset key point combination and the line of the second preset key point combination, or the angle between the line of the first preset key point combination and the first preset line, in the first preset database to obtain the recognition result of the target.
  • the matching module 1506 includes: a first matching unit configured to, when the feature data to be recognized include picture data, match the obtained angle value of at least one included angle with the angle value of the corresponding included angle type in the first preset database to obtain the recognition result of the picture data.
  • the matching module 1506 further includes: an acquiring unit configured to, in the case that the feature data to be recognized include video data, acquire, for each frame or specified frame, the two-dimensional coordinate information of the key points of at least one target in each corresponding frame of the video data, wherein the specified frames are fixed-time-interval frames and/or key frames; and a second matching unit configured to obtain the angle-time variation curve of at least one specific included angle of the at least one target according to the two-dimensional coordinate information of the key points of the at least one target in each corresponding frame of the video data, and to compare and analyze it with the angle-time variation curve of at least one included angle of at least one standard motion to obtain the recognition result.
  • the second matching unit includes: a first judging subunit configured to compare, for similarity, the angle-time variation curve of at least one specific included angle of the at least one target with the pre-obtained angle-time variation curve of at least one included angle of at least one standard motion; if the similarity falls within a first preset threshold interval, it is determined that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type;
  • a comparison subunit configured to, when it is determined that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type, further compare the angle-time variation curve of at least one specific included angle of that target with the angle-time variation curve of the corresponding specific included angle of the standard motion;
  • a second judging subunit configured to determine that the joint action corresponding to the specific included angle of the target in each corresponding frame of the video data is standard if the difference between adjacent extreme values on the target's angle-time variation curve and the difference between adjacent extreme values on the corresponding curve of the standard motion fall within a second preset threshold interval, and otherwise that the joint action corresponding to the target's specific included angle in the video data is not standard;
  • a third judging subunit configured to judge whether the distance between adjacent peaks on the target's angle-time variation curve of at least one specific included angle, compared with the distance between adjacent peaks on the corresponding curve of the standard motion, falls within a third, fourth or fifth preset threshold interval, so as to confirm that the intensity of the joint action corresponding to the specific included angle of the target in each corresponding frame of the video data is too low, appropriate or too high.
  • the data recognition device provided in the embodiment of the present application further includes: an evaluation module configured to perform matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
  • the data recognition device provided in the embodiment of the present application further includes: a suggestion module configured to, after the posture evaluation result corresponding to the recognition result is obtained, perform matching in a third preset database according to the posture evaluation result to obtain suggestion information corresponding to the posture evaluation result.
  • a non-volatile storage medium includes a stored program, wherein the device where the non-volatile storage medium is located is controlled to execute the above method when the program is running.
  • a data recognition device, including: a non-volatile storage medium and a processor configured to run a program stored in the non-volatile storage medium, wherein the above method is executed when the program runs.
  • the disclosed technical content can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units may be a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, and the computer software product is stored in a storage medium.
  • a computer device which can be a personal computer, a server, or a network device, etc.
  • the aforementioned storage media include: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.
  • the solutions provided in the embodiments of the present application can be applied to the image recognition process, for example, can be applied to the image recognition process of the human body posture.
  • the feature data to be recognized is input into the weighted attention neural network model to identify the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized; this improves the accuracy and efficiency of human posture recognition and allows evaluation results to be provided on the basis of the recognized posture, thereby solving the technical problem of low data processing efficiency in the posture recognition process of the related art.
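
As referenced in the calculation-module item above, a minimal sketch of the included-angle computation from key-point two-dimensional coordinates is given below. The key-point names and the horizontal reference line used in the example are illustrative assumptions, not definitions fixed by the embodiments.

```python
import numpy as np

def segment_angle(p1, p2, q1, q2):
    """Angle (degrees) between the line p1->p2 and the line q1->q2."""
    v1 = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    v2 = np.asarray(q2, dtype=float) - np.asarray(q1, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: angle between the shoulder line and a horizontal reference line,
# which could feed a "high-low shoulders" check against the preset database.
keypoints = {"l_shoulder": (112, 80), "r_shoulder": (188, 92)}
shoulder_angle = segment_angle(keypoints["l_shoulder"], keypoints["r_shoulder"],
                               (0, 0), (1, 0))  # first preset line: horizontal
print(round(shoulder_angle, 1))
```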

Abstract

A data processing method, a data training method, a data identifying method and device, and a storage medium. The data processing method comprises: inputting first feature data having a first number of channels into a first type of convolution layer having a second number of filters, and outputting second feature data having a second number of channels; inputting the second feature data having the second number of channels into a second type of convolution layer having the second number of filters, and generating, according to a learnable mask parameter in the second type of convolution layer, a mask of the weight of each filter in the second type of convolution layer by means of a neural network; determining a connection mode between each filter in the second type of convolution layer and each channel in the second feature data according to the mask; calculating the second feature data according to a mapping relationship obtained by the connection mode to obtain third feature data; and inputting the third feature data having the second number of channels into a third type of convolution layer having the first number of filters, and outputting fourth feature data having the first number of channels.

Description

数据处理、训练、识别方法、装置和存储介质Data processing, training, identification method, device and storage medium 技术领域Technical field
本发明涉及计算机技术应用领域,具体而言,涉及一种数据处理、训练、识别方法、装置和存储介质。The present invention relates to the application field of computer technology, in particular to a data processing, training, and identification method, device and storage medium.
背景技术Background technique
在姿态估计技术中(即,关键点检测技术)目前常用的两种解决方案包括:自顶向下的方法(Two-step framework)和自底向上的方法(Part-based framework);In pose estimation technology (ie, key point detection technology) currently two commonly used solutions include: top-down method (Two-step framework) and bottom-up method (Part-based framework);
其中,自顶向下的方法是先检测图片(2D/3D)中所有人物的位置矩形框(人物完整的被包含在矩形框内),然后分别独立地检测每一个矩形框内人物的骨骼关键点坐标,连接成人物骨架,其特点在于数据处理精度高,其中,姿态估计的准确度高度依赖于人物位置矩形框的检测质量。Among them, the top-down method is to first detect the position of the rectangular frame of all characters in the picture (2D/3D) (the characters are completely contained in the rectangular frame), and then independently detect the bones of the characters in each rectangular frame. Point coordinates, connected to human skeletons, are characterized by high data processing accuracy. Among them, the accuracy of posture estimation is highly dependent on the detection quality of the rectangular frame of the person's position.
自底向上的方法是先检测出图片中所有人物的骨骼关键点坐标,然后处理每个骨骼关键点的分配问题,将每个关键点分配给不同的人,连接成人物骨架,其特点在于数据处理速度快,但是如果出现密集人群或人物之间出现遮挡,那么在分配关键点到个人的阶段容易出现错误的情况。The bottom-up method is to first detect the coordinates of the bone key points of all the characters in the picture, and then deal with the allocation of each bone key point, assign each key point to a different person, and connect the human skeleton. Its characteristic lies in the data. The processing speed is fast, but if there are dense crowds or occlusions between characters, errors are likely to occur in the stage of assigning key points to individuals.
而相关技术中在实现体态识别上主要通过Kinect设备获取人物关键点,但是该设备价格昂贵且不便携带,此外,相关技术中由于采样和计算模型的原因会导致数据源头本身的误差变大,因此相关技术在对人体姿态的动作识别上精度低。In the related technology, the Kinect device is mainly used to obtain the key points of the character in the realization of body recognition, but the device is expensive and not portable. In addition, the related technology will cause the error of the data source itself to become larger due to the sampling and calculation model. Related technologies have low accuracy in recognizing human body gestures.
针对上述由于相关技术在对人体姿态的识别过程中,数据处理效率低的问题,目前尚未提出有效的解决方案。In view of the above-mentioned problem of low data processing efficiency in the process of recognizing human posture due to related technologies, no effective solution has been proposed at present.
发明内容Summary of the invention
本发明实施例提供了一种数据处理、训练、识别方法、装置和存储介质,以至少解决由于相关技术在对人体姿态的识别过程中,数据处理效率低的技术问题。The embodiments of the present invention provide a data processing, training, and recognition method, device, and storage medium to at least solve the technical problem of low data processing efficiency in the process of recognizing human posture due to related technologies.
根据本发明实施例的一个方面,提供了一种数据处理方法,包括:将具备第一数量通道的第一特征数据输入至具备第二数量滤波器的第一类卷积层进行计算,输出具备第二数量通道的第二特征数据,其中,第一数量大于第二数量;将具备第二数量通 道的第二特征数据输入至具备第二数量滤波器的第二类卷积层,并根据第二类卷积层中可学习的掩码参数,通过神经网络生成第二类卷积层中各个滤波器的权重的掩码;依据掩码确定第二类卷积层中的各个滤波器与第二特征数据中的各通道的连接方式;依据连接方式得到的映射关系对第二特征数据进行卷积计算,得到第三特征数据;将具备第二数量通道的第三特征数据输入至具备第一数量滤波器的第三类卷积层进行计算,输出具备第一数量通道的第四特征数据。According to one aspect of the embodiments of the present invention, a data processing method is provided, including: inputting first feature data with a first number of channels into a first type convolutional layer with a second number of filters for calculation, and outputting The second feature data of the second number of channels, where the first number is greater than the second number; the second feature data of the second number of channels is input to the second type of convolutional layer with the second number of filters, and according to the first number The mask parameters that can be learned in the second-type convolutional layer are used to generate the mask of the weight of each filter in the second-type convolutional layer through the neural network; according to the mask, each filter in the second-type convolutional layer and the first 2. The connection mode of each channel in the feature data; the second feature data is convolved to calculate the second feature data according to the mapping relationship obtained by the connection mode to obtain the third feature data; the third feature data with the second number of channels is input to the first feature data The third type convolutional layer of the quantity filter is calculated, and the fourth feature data with the first quantity channel is output.
可选的,数据处理方法应用于人工智能中的深度学习。Optionally, the data processing method is applied to deep learning in artificial intelligence.
可选的,数据处理方法应用于识别图片/视频中的目标的姿态或动作。Optionally, the data processing method is applied to recognize the posture or action of the target in the picture/video.
可选的,根据第二类卷积层中可学习的掩码参数,通过神经网络生成第二类卷积层中各个滤波器的权重的掩码包括:根据第二类卷积层中的全连接层生成第二类卷积层中各个滤波器的权重的掩码。Optionally, according to the learnable mask parameters in the second-type convolutional layer, generating a mask of the weights of each filter in the second-type convolutional layer through a neural network includes: according to all the mask parameters in the second-type convolutional layer The connection layer generates a mask of the weight of each filter in the second type of convolutional layer.
根据本发明实施例的一个方面,提供了一种数据训练方法,包括:获取待训练的权重分类模型,其中,权重分类模型为获取图像数据的图像特征的神经网络模型;对待训练的权重分类模型进行训练,得到权重分类模型;其中,对待训练的权重分类模型进行训练中使用的方法包括上述数据处理方法。According to one aspect of the embodiments of the present invention, a data training method is provided, including: obtaining a weight classification model to be trained, wherein the weight classification model is a neural network model for obtaining image features of image data; and a weight classification model to be trained Training is performed to obtain a weight classification model; wherein, the method used in training the weight classification model to be trained includes the above-mentioned data processing method.
可选的,对待训练的权重分类模型进行训练,得到权重分类模型包括:将第一预设数据集中的数据输入待训练的权重分类模型,得到类别预测结果;依据类别预测结果与第一预测数据集中的数据的标签类别,得到类别预测结果与第一预测数据集中的数据的标签类别的误差;依据误差进行反向传播算法训练待训练的权重分类模型,直至待训练的权重分类模型收敛,得到收敛的权重分类模型。Optionally, training the weight classification model to be trained to obtain the weight classification model includes: inputting the data in the first preset data set into the weight classification model to be trained to obtain the category prediction result; according to the category prediction result and the first prediction data The label category of the concentrated data is obtained, and the error between the category prediction result and the label category of the data in the first prediction data set is obtained; according to the error, the backpropagation algorithm is used to train the weight classification model to be trained until the weight classification model to be trained converges to obtain Convergent weight classification model.
可选的,依据误差进行反向传播算法训练待训练的权重分类模型,直至待训练的权重分类模型收敛包括:通过激励传播和权重更新的反复迭代,直至待训练的权重分类模型收敛。Optionally, training the weight classification model to be trained with the back propagation algorithm based on the error until the weight classification model to be trained converges includes: through repeated iterations of excitation propagation and weight update, until the weight classification model to be trained converges.
可选的,在待训练的权重分类模型包括残差结构,池化结构和全连接结构的情况下,通过激励传播和权重更新的反复迭代,直至待训练的权重分类模型收敛包括:在激励传播阶段,将图像通过待训练的权重分类模型的卷积层获取特征,在待训练的权重分类模型的全连接层获取类别预测结果,再将类别预测结果与第一预测数据集中的数据的标签类别求差,得到隐藏层和输出层的响应误差;在权重更新阶段,将误差与本层响应对前一层响应的函数的导数相乘,获得两层之间权重矩阵的梯度,沿梯度的反方向以设定的学习率调整权重矩阵;将梯度矩阵确定为前一层的误差,并计算前一层的权重矩阵,通过迭代计算对待训练的权重分类模型更新,直至待训练的权重分类模型收敛。Optionally, when the weight classification model to be trained includes a residual structure, a pooling structure, and a fully connected structure, through repeated iterations of incentive propagation and weight update, until the weight classification model to be trained converges, including: In the stage, the image is passed through the convolutional layer of the weight classification model to be trained to obtain features, the category prediction result is obtained in the fully connected layer of the weight classification model to be trained, and then the category prediction result is combined with the label category of the data in the first prediction data set Find the difference to obtain the response error of the hidden layer and the output layer; in the weight update stage, the error is multiplied by the derivative of the response of the current layer to the response of the previous layer to obtain the gradient of the weight matrix between the two layers, along the inverse of the gradient. The direction adjusts the weight matrix with the set learning rate; the gradient matrix is determined as the error of the previous layer, and the weight matrix of the previous layer is calculated, and the weight classification model to be trained is updated through iterative calculation until the weight classification model to be trained converges .
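
As a concrete illustration of this training procedure, a minimal sketch follows. It uses PyTorch-style APIs; the model, data loader, optimizer choice and hyper-parameters are placeholders of this sketch rather than values specified by the embodiment.

```python
import torch
import torch.nn as nn

def train_weight_classifier(model, loader, epochs=10, lr=0.01):
    criterion = nn.CrossEntropyLoss()            # error between prediction and label category
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                      # in practice: iterate until convergence
        for images, labels in loader:
            logits = model(images)               # excitation propagation (forward pass)
            loss = criterion(logits, labels)     # response error of the output layer
            optimizer.zero_grad()
            loss.backward()                      # gradients of each layer's weight matrix
            optimizer.step()                     # adjust weights along -gradient at the set learning rate
    return model
```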
根据本发明实施例的另一个方面,提供了一种数据训练方法,包括:通过收敛的权重分类模型初始化目标检测模型中的特征提取模块,获得待训练的目标检测模型;其中,该收敛的权重分类模型通过上述数据训练方法训练得到;通过第二预设数据集中的目标位置框标签信息对待训练的目标检测模型进行训练,得到训练后的目标检测模型;依据第三预设数据集中的目标关键点标签信息对待训练的单人姿态估计模型的网络参数进行训练,得到训练后的单人姿态估计模型;依据训练后的目标检测模型和训练后的单人姿态估计模型,得到权重注意力神经网络模型。According to another aspect of the embodiments of the present invention, a data training method is provided, which includes: initializing a feature extraction module in a target detection model through a convergent weight classification model to obtain a target detection model to be trained; wherein the convergent weight The classification model is trained by the above data training method; the target detection model to be trained is trained by the target location frame label information in the second preset data set to obtain the trained target detection model; according to the target key in the third preset data set The point label information trains the network parameters of the single-person pose estimation model to be trained to obtain the trained single-person pose estimation model; according to the trained target detection model and the trained single-person pose estimation model, the weighted attention neural network is obtained model.
可选的,通过第二预设数据集中的目标位置框标签信息对待训练的目标检测模型进行训练,得到训练后的目标检测模型包括:在目标检测模型包括特征提取模块、建议框生成模块和目标分类器与位置框回归预测模块的情况下,分别对特征提取模块和建议框生成模块进行训练,得到特征提取模块第一参数值和建议框生成模块第一参数值;依据特征提取模块第一参数值和建议框生成模块第一参数值训练目标分类器与位置框回归预测模块,得到目标分类器与位置框回归预测模块第一参数值和特征提取模块第二参数值;依据目标分类器与位置框回归预测模块第一参数值和特征提取模块第二参数值训练建议框生成模块,得到建议框生成模块第二参数值;依据建议框生成模块第二参数值和特征提取模块第二参数值训练目标分类器与位置框回归预测模块,得到目标分类器与位置框回归预测模块第二参数值。Optionally, the target detection model to be trained is trained based on the target location frame label information in the second preset data set, and the target detection model obtained after training includes: the target detection model includes a feature extraction module, a suggestion frame generation module, and a target In the case of the classifier and the position box regression prediction module, the feature extraction module and the suggestion box generation module are trained respectively to obtain the first parameter value of the feature extraction module and the first parameter value of the suggestion box generation module; according to the first parameter of the feature extraction module The first parameter value of the value and suggestion box generation module trains the target classifier and the position box regression prediction module to obtain the first parameter value of the target classifier and position box regression prediction module and the second parameter value of the feature extraction module; according to the target classifier and position Box regression prediction module first parameter value and feature extraction module second parameter value training suggestion box generation module to obtain the second parameter value of the suggestion box generation module; training based on the second parameter value of the suggestion box generation module and the second parameter value of the feature extraction module The target classifier and the position box regression prediction module obtain the second parameter value of the target classifier and the position box regression prediction module.
进一步地,可选的,特征提取模块用于提取第二预设数据集中的各个数据的特征;建议框生成模块用于依据第二预设数据集中的各个数据的特征生成各个数据的候选目标框;目标分类器与位置框回归预测模块用于依据第二预设数据集中的各个数据的特征和各个数据的候选目标框获取第二预设数据集中各个数据的目标的检测框及相应检测框的类别;在建议框生成模块包括一个滑窗的卷积层,卷积层后连接两个并行的卷积层,两个并行的卷积层分别为回归层和分类层的情况下,建议框生成模块用于依据第二预设数据集中的各个数据的特征生成各个数据的候选目标框包括:依据第二预设数据集中的各个数据的特征通过回归层,得到第二预设数据集中的各个数据的各个候选目标框的中心锚点的坐标和相应的候选目标框的宽与高;通过分类层判定各个数据的各个候选目标框是前景或背景。Further, optionally, the feature extraction module is used to extract the features of each data in the second preset data set; the suggestion frame generation module is used to generate candidate target frames of each data according to the features of each data in the second preset data set ; The target classifier and position frame regression prediction module is used to obtain the detection frame of each data target in the second preset data set and the corresponding detection frame according to the characteristics of each data in the second preset data set and the candidate target frame of each data Category; when the suggestion frame generation module includes a convolutional layer with a sliding window, two parallel convolutional layers are connected after the convolutional layer, and the two parallel convolutional layers are the regression layer and the classification layer, the suggestion frame is generated The module is used to generate candidate target frames of each data according to the characteristics of each data in the second preset data set, including: obtaining each data in the second preset data set through the regression layer according to the characteristics of each data in the second preset data set The coordinates of the center anchor point of each candidate target frame and the width and height of the corresponding candidate target frame; the classification layer determines whether each candidate target frame of each data is foreground or background.
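
The proposal-box generation module described above can be pictured with the following minimal sketch: a sliding-window 3×3 convolution followed by two parallel 1×1 convolutions, one regression branch (centre anchor x, y and box width/height) and one classification branch (foreground/background). The channel width and the number of anchors per location are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ProposalHead(nn.Module):
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.sliding = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)  # sliding-window conv
        self.reg = nn.Conv2d(512, num_anchors * 4, kernel_size=1)   # x, y, w, h per candidate box
        self.cls = nn.Conv2d(512, num_anchors * 2, kernel_size=1)   # foreground / background

    def forward(self, feats):
        x = torch.relu(self.sliding(feats))
        return self.reg(x), self.cls(x)

boxes, scores = ProposalHead()(torch.randn(1, 512, 38, 50))
```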
可选的,在目标分类器与位置框回归预测模块的结构为顺次连接的一个池化层、三个全连接层和并行的两个全连接层的情况下,目标分类器与位置框回归预测模块用于依据第二预设数据集中的各个数据的特征和各个数据的候选目标框获取第二预设数据集中各个数据的各个目标的检测框和相应的检测框的类别包括:通过池化层将特征提取模块输出的不同长度的各个数据的特征转换为固定长度的各个数据的特征;依据固定长度的各个数据的特征,分别通过三个全连接层后再通过并行的两个全连接层, 输出第二预设数据集中各个数据的各个目标的检测框及相应检测框的类别。Optionally, when the structure of the target classifier and the location box regression prediction module is a pooling layer, three fully connected layers and two parallel fully connected layers that are sequentially connected, the target classifier and the location box regression The prediction module is used to obtain the detection frame of each target of each data in the second preset data set and the corresponding detection frame category according to the characteristics of each data in the second preset data set and the candidate target frame of each data, including: through pooling The layer converts the characteristics of each data of different lengths output by the feature extraction module into the characteristics of each data of a fixed length; according to the characteristics of each data of a fixed length, it passes through three fully connected layers and then passes through two parallel fully connected layers. , Output the detection frame of each target of each data in the second preset data set and the category of the corresponding detection frame.
可选的,依据第三预设数据集中的目标关键点标签信息对待训练的单人姿态估计模型的网络参数进行训练,得到训练后的单人姿态估计模型包括:依据第三预设数据集中的目标关键点标签信息对待训练的单人姿态估计模型的网络参数进行训练,通过前向传播和后向传播算法迭代的更新待训练的单人姿态估计模型的网络参数;其中,依据第三预设数据集中的目标关键点标签信息对待训练的单人姿态估计模型的网络参数进行训练,通过前向传播和后向传播算法迭代的更新待训练的单人姿态估计模型的网络参数包括:依据预设宽高比对输入的单人图像的高度或宽度进行扩展,并将单人图像裁剪为预设尺寸。Optionally, training the network parameters of the single-person pose estimation model to be trained based on the target key point label information in the third preset data set, and the single-person pose estimation model obtained after training includes: according to the information in the third preset data set Target key point label information trains the network parameters of the single-person pose estimation model to be trained, and iteratively updates the network parameters of the single-person pose estimation model to be trained through forward propagation and backward propagation algorithms; among them, according to the third preset The target key point label information in the data set is trained on the network parameters of the single-person pose estimation model to be trained, and the network parameters of the single-person pose estimation model to be trained are iteratively updated through the forward propagation and backward propagation algorithms. The network parameters include: according to the preset The aspect ratio expands the height or width of the input single image and crops the single image to a preset size.
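
The aspect-ratio expansion and cropping step can be sketched as follows. The 3:4 ratio and the 192×256 input size are common choices for top-down pose estimators and are assumed here purely for illustration, not fixed by the embodiment.

```python
import cv2

def expand_and_resize(person_crop, target_ratio=3 / 4, out_size=(192, 256)):
    """Pad a single-person crop to the preset aspect ratio, then resize to the preset size."""
    h, w = person_crop.shape[:2]
    if w / h < target_ratio:                       # too narrow: extend the width
        pad = int(h * target_ratio) - w
        person_crop = cv2.copyMakeBorder(person_crop, 0, 0, pad // 2, pad - pad // 2,
                                         cv2.BORDER_CONSTANT, value=0)
    else:                                          # too wide: extend the height
        pad = int(w / target_ratio) - h
        person_crop = cv2.copyMakeBorder(person_crop, pad // 2, pad - pad // 2, 0, 0,
                                         cv2.BORDER_CONSTANT, value=0)
    return cv2.resize(person_crop, out_size)
```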
可选的,对待训练的单人姿态估计模型的网络参数进行训练中使用的方法包括上述的数据处理方法。Optionally, the method used in training the network parameters of the single-person pose estimation model to be trained includes the above-mentioned data processing method.
可选的,该方法还包括:收集训练待训练的目标检测模型和待训练的单人姿态估计模型所需的样本;对样本进行预处理,其中,预处理包括:数据集的划分和预处理操作;对待训练的权重分类模型进行训练,得到收敛的权重分类模型包括:将第一预设数据集中的数据输入待训练的权重分类模型,得到类别预测结果;依据类别预测结果与第一预测数据集中的数据的标签类别,得到类别预测结果与第一预测数据集中的数据的标签类别的误差;依据误差进行反向传播算法训练待训练的权重分类模型,直至待训练的权重分类模型收敛,得到收敛的权重分类模型。Optionally, the method further includes: collecting samples required for training the target detection model to be trained and the single-person pose estimation model to be trained; preprocessing the samples, where the preprocessing includes: data set division and preprocessing Operation; training the weight classification model to be trained to obtain a convergent weight classification model includes: inputting the data in the first preset data set into the weight classification model to be trained to obtain the category prediction result; according to the category prediction result and the first prediction data The label category of the concentrated data is obtained, and the error between the category prediction result and the label category of the data in the first prediction data set is obtained; according to the error, the backpropagation algorithm is used to train the weight classification model to be trained until the weight classification model to be trained converges to obtain Convergent weight classification model.
进一步地,可选的,第一预设数据集包括:第一类图像数据集,第一类图像数据集自定义了训练集和验证集;第二预设数据集包括第二类图像数据集和第三类图像数据集中有位置框信息标注的数据集合;第二类图像数据集自定义了训练集和验证集;第三类图像数据集按照预设比例随机划分为训练集和验证集;第二类图像数据集的训练集和第三类图像数据集的训练集为第二预设数据集中的训练集,第二类图像数据集的验证集和第三类图像数据集的验证集为第二预设数据集中的验证集;第三预设数据集包括第二类图像数据集和第三类图像数据集中有关键点信息标注的数据集合;预处理操作包括:通过随机几何变换对第一预设数据集和第三预设数据集中的数据分别进行处理;通过随机混合操作和/或随机几何变换对第二预设数据集中的数据进行处理。Further, optionally, the first preset data set includes: a first type of image data set, the first type of image data set defines a training set and a validation set; the second preset data set includes a second type of image data set And the third type of image data set has a data set labeled with position box information; the second type of image data set has customized training set and verification set; the third type of image data set is randomly divided into training set and verification set according to the preset ratio; The training set of the second type of image data set and the training set of the third type of image data set are the training set of the second preset data set, the validation set of the second type of image data set and the validation set of the third type of image data set are The verification set in the second preset data set; the third preset data set includes the second type image data set and the third type image data set labeled with key point information; the preprocessing operation includes: The data in one preset data set and the third preset data set are processed separately; the data in the second preset data set is processed through random mixing operation and/or random geometric transformation.
可选的,通过随机几何变换包括随机裁剪、按预设角度进行随机旋转和/或按照预设缩放比例进行随机缩放;随机混合操作包括将至少两个数据按照预设权重进行重合,具体为将不同数据中的预设位置像素值与预设权重的乘积相加。Optionally, the random geometric transformation includes random cropping, random rotation according to a preset angle, and/or random scaling according to a preset zoom ratio; the random mixing operation includes superimposing at least two data according to preset weights, specifically The product of the preset position pixel value in different data and the preset weight is added.
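
The random mixing operation described above (adding the products of pixel values from different images and preset weights) can be sketched as below; the 0.6/0.4 weights are illustrative assumptions.

```python
import numpy as np

def mix_up(img_a, img_b, weight_a=0.6):
    """Blend two equally sized images: weight_a * img_a + (1 - weight_a) * img_b."""
    a = img_a.astype(np.float32)
    b = img_b.astype(np.float32)
    mixed = weight_a * a + (1.0 - weight_a) * b
    return np.clip(mixed, 0, 255).astype(np.uint8)
```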
根据本发明实施例的又一个方面,提供了一种数据识别方法,基于上述方法,包 括:将待识别的特征数据输入权重注意力神经网络模型,识别得到待识别的特征数据中至少一个目标的关键点二维坐标,其中,权重注意力神经网络模型用于通过自顶向下的方式进行至少一人的姿态估计,检测待识别的特征数据中至少一个目标的位置矩形框,并检测位置矩形框内目标的关键点二维坐标;通过目标的关键点二维坐标进行计算,得到第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角;将第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角在第一预设数据库中进行匹配,得出目标的识别结果。According to another aspect of the embodiments of the present invention, there is provided a data recognition method. Based on the above method, the method includes: inputting feature data to be recognized into a weighted attention neural network model, and identifying at least one target in the feature data to be recognized Two-dimensional coordinates of key points, where the weighted attention neural network model is used to estimate the pose of at least one person in a top-down manner, detect the position rectangle of at least one target in the feature data to be recognized, and detect the position rectangle The two-dimensional coordinates of the key points of the inner target; through the calculation of the two-dimensional coordinates of the key points of the target, the angle between the line of the first preset key point combination and the line of the second preset key point combination or the first The angle between the line of the preset key point combination and the first preset line; the angle between the line of the first preset key point combination and the line of the second preset key point combination or the first The included angle between the line of the preset key point combination and the first preset line is matched in the first preset database to obtain the recognition result of the target.
可选的,将第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角在第一预设数据库中进行匹配,得出目标的识别结果包括:在待识别的特征数据包括图片数据的情况下,将得到的至少一个夹角的角度值与第一预设数据库中的相应的夹角类型的角度值进行匹配,得出图片数据的识别结果。Optionally, the angle between the line of the first preset key point combination and the line of the second preset key point combination or the line of the first preset key point combination and the first preset line Matching the included angle of at least one in the first preset database to obtain the recognition result of the target includes: in the case that the feature data to be recognized includes image data, the obtained angle value of at least one included angle is compared with the first preset database Match the angle values of the corresponding included angle types in to obtain the recognition result of the image data.
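
For image data, this matching step can be pictured as a simple lookup of the measured angle against the angle intervals stored per angle type in the first preset database. The table contents and thresholds below are invented purely for illustration.

```python
# Toy stand-in for the first preset database; values are not from the patent.
PRESET_ANGLE_DB = {
    "shoulder_line_vs_horizontal": [
        ((0.0, 3.0), "shoulders level"),
        ((3.0, 90.0), "high-low shoulders"),
    ],
}

def match_angle(angle_type, angle_value, db=PRESET_ANGLE_DB):
    for (low, high), label in db.get(angle_type, []):
        if low <= angle_value < high:
            return label
    return "unknown"

print(match_angle("shoulder_line_vs_horizontal", 5.2))  # -> "high-low shoulders"
```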
可选的,将第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角在第一预设数据库中进行匹配,得出目标的识别结果包括:在待识别的特征数据包括视频数据的情况下,针对每一帧或指定帧,获取视频数据各相应帧的中至少一个目标的关键点二维坐标信息,其中,指定帧为固定时间间隔帧和/或关键帧;依据视频数据中各相应帧的至少一个目标的关键点二维坐标信息得到至少一个目标的至少一个特定夹角的角度时间变化曲线,并通过与至少一种标准运动的至少一个夹角的角度时间变化曲线做比较分析,得到识别结果。Optionally, the angle between the line of the first preset key point combination and the line of the second preset key point combination or the line of the first preset key point combination and the first preset line The included angle of is matched in the first preset database, and the recognition result of the target is obtained including: in the case that the feature data to be recognized includes video data, for each frame or specified frame, obtain the center of each corresponding frame of the video data The key point two-dimensional coordinate information of at least one target, wherein the designated frame is a fixed time interval frame and/or key frame; according to the key point two-dimensional coordinate information of at least one target of each corresponding frame in the video data, the at least one target’s two-dimensional coordinate information An angle-time variation curve of a specific included angle is compared and analyzed with at least one angle-time variation curve of at least one standard motion to obtain an identification result.
进一步地,可选的,依据视频数据中各相应帧的至少一个目标的关键点二维坐标信息得到至少一个目标的至少一个特定夹角的角度时间变化曲线,并通过与至少一种标准运动的至少一个夹角的角度时间变化曲线做比较分析,得到识别结果包括:将至少一个目标的至少一个特定夹角的角度时间变化曲线,与预先获得的至少一种标准运动的至少一个夹角的角度时间变化曲线进行相似度比较,若相似度落入第一预设阈值区间,则判定视频数据中各相应帧的相应目标正在进行所对应的标准运动类型;在判定视频数据中各相应帧的相应目标正在进行所对应的标准运动类型的情况下,进一步比较该目标的至少一个特定夹角的角度时间变化曲线与标准运动的相应特定夹角的角度时间变化曲线;若目标的至少一个特定夹角的角度时间变化曲线上相邻最值的差,和标准运动的相应特定夹角的角度时间变化曲线上相邻最值的差落入第二预设阈值区间,则判断视频数据中目标的特定夹角所对应的关节动作规范,否则视频数据中各相应帧的该目标的特定夹角所对应的关节动作不规范;判断目标的至少一个特定夹角的 角度时间变化曲线上相邻峰值之间的距离,和标准运动的相应特定夹角的角度时间变化曲线上相邻峰值的差是否落入第三预设阈值区间、第四预设阈值区间或第五预设阈值区间,进而确认视频数据中各相应帧的目标的特定夹角所对应的关节动作运动强度过低、适当或过高。Further, optionally, according to the two-dimensional coordinate information of the key point of at least one target in each corresponding frame of the video data, the angle-time variation curve of at least one specific included angle of the at least one target is obtained, and the angle-time variation curve of at least one specific included angle is obtained by comparing with at least one standard motion. The comparison and analysis of the angle-time variation curve of at least one included angle to obtain the recognition result includes: comparing the angle-time variation curve of at least one specific included angle of the at least one target with at least one angle of at least one included angle obtained in advance for at least one standard motion The time variation curve is compared for similarity. If the similarity falls within the first preset threshold interval, it is determined that the corresponding target of each corresponding frame in the video data is performing the corresponding standard motion type; the corresponding standard motion type of each corresponding frame in the video data is determined. When the target is performing the corresponding standard exercise type, further compare the angle time change curve of at least one specific included angle of the target with the angle time change curve of the corresponding specific included angle of the standard motion; if the target has at least one specific included angle The difference between the adjacent maximum value on the angle-time variation curve of the standard motion and the adjacent maximum value on the angle-time variation curve of the corresponding specific included angle of the standard motion falls within the second preset threshold interval, and then the specific target in the video data is determined The joint motion specification corresponding to the included angle, otherwise the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is not standardized; the angle time variation curve of at least one specific included angle of the target is judged between adjacent peaks Whether the difference between adjacent peaks on the angle-time variation curve of the corresponding specific included angle of the standard motion falls within the third preset threshold interval, the fourth preset threshold interval or the fifth preset threshold interval, and then confirm the video data The motion intensity of the joint action corresponding to the specific included angle of the target in each corresponding frame is too low, appropriate, or too high.
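
A simplified sketch of this curve comparison is given below. It treats similarity as the correlation between equal-length angle-time curves, approximates the extrema difference by the curve range, and uses placeholder threshold values; none of these details are fixed by the embodiment, and periodic curves with at least two peaks are assumed.

```python
import numpy as np

def _peak_indices(curve):
    """Indices of local maxima of a 1-D angle-time curve."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]

def compare_motion(target, standard, sim_thresh=0.8, amp_tol=10.0, period_bounds=(0.8, 1.2)):
    t, s = np.asarray(target, float), np.asarray(standard, float)
    if np.corrcoef(t, s)[0, 1] < sim_thresh:                 # first preset threshold interval
        return {"motion_matched": False}
    # second interval: difference between adjacent extreme values (range as a rough proxy)
    amp_ok = abs((t.max() - t.min()) - (s.max() - s.min())) <= amp_tol
    # remaining intervals: distance between adjacent peaks as a rough period / intensity estimate
    t_period = np.mean(np.diff(_peak_indices(t)))
    s_period = np.mean(np.diff(_peak_indices(s)))
    ratio = s_period / t_period
    intensity = ("too low" if ratio < period_bounds[0]
                 else "too high" if ratio > period_bounds[1] else "appropriate")
    return {"motion_matched": True, "joint_action_standard": amp_ok, "intensity": intensity}
```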
可选的,该方法还包括:依据识别结果在第二预设数据库中进行匹配,得到识别结果对应的体态评估结果。Optionally, the method further includes: performing matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
进一步地,可选的,在得到识别结果对应的体态评估结果之后,该方法还包括:依据体态评估结果在第三预设数据库中进行匹配,得到体态评估结果对应的建议信息。Further, optionally, after obtaining the posture evaluation result corresponding to the recognition result, the method further includes: matching in a third preset database according to the posture evaluation result to obtain suggestion information corresponding to the posture evaluation result.
根据本发明实施例的再一个方面,提供了一种数据识别装置,包括:坐标识别模块,设置为将待识别的特征数据输入权重注意力神经网络模型,识别得到待识别的特征数据中至少一个目标的关键点二维坐标,其中,权重注意力神经网络模型设置为通过自顶向下的方式进行至少一人的姿态估计,检测待识别的特征数据中至少一个目标的位置矩形框,并检测位置矩形框内目标的关键点二维坐标;计算模块,设置为通过目标的关键点二维坐标进行计算,得到第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角;匹配模块,设置为将第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角在第一预设数据库中进行匹配,得出目标的识别结果。According to another aspect of the embodiments of the present invention, there is provided a data recognition device, including: a coordinate recognition module, configured to input feature data to be recognized into a weighted attention neural network model, and identify at least one of the feature data to be recognized Two-dimensional coordinates of key points of the target, where the weighted attention neural network model is set to estimate the pose of at least one person in a top-down manner, detect the position rectangle of at least one target in the feature data to be recognized, and detect the position The two-dimensional coordinates of the key points of the target in the rectangular frame; the calculation module is set to calculate through the two-dimensional coordinates of the key points of the target to obtain the connection line of the first preset key point combination and the line of the second preset key point combination The included angle between the first preset key point combination or the included angle between the first preset line and the first preset line; the matching module is set to combine the first preset key point combination line with the second preset key point The angle between the combined lines or the angle between the line of the first preset key point combination and the first preset line is matched in the first preset database to obtain the recognition result of the target.
可选的,匹配模块包括:第一匹配单元,设置为在待识别的特征数据包括图片数据的情况下,将得到的至少一个夹角的角度值与第一预设数据库中的相应的夹角类型的角度值进行匹配,得出图片数据的识别结果。Optionally, the matching module includes: a first matching unit configured to compare the obtained angle value of at least one included angle with a corresponding included angle in the first preset database when the feature data to be recognized includes image data The angle value of the type is matched, and the recognition result of the image data is obtained.
可选的,匹配模块包括:获取单元,设置为在待识别的特征数据包括视频数据的情况下,针对每一帧或指定帧,获取视频数据中各相应帧的至少一个目标的关键点二维坐标信息,其中,指定帧为固定时间间隔帧和/或关键帧;第二匹配单元,设置为依据视频数据中各相应帧的至少一个目标的关键点二维坐标信息得到至少一个目标的至少一个特定夹角的角度时间变化曲线,并通过与至少一种标准运动的至少一个夹角的角度时间变化曲线做比较分析,得到识别结果。Optionally, the matching module includes: an acquiring unit configured to acquire, for each frame or specified frame, a key point of at least one target of each corresponding frame in the video data when the feature data to be identified includes video data. Coordinate information, wherein the designated frame is a fixed time interval frame and/or a key frame; the second matching unit is set to obtain at least one of the at least one target according to the key point two-dimensional coordinate information of at least one target of each corresponding frame in the video data The angle-time variation curve of a specific included angle is compared and analyzed with the angle-time variation curve of at least one included angle of at least one standard motion to obtain an identification result.
进一步地,可选的,第二匹配单元包括:第一判断子单元,设置为将至少一个目标的至少一个特定夹角的角度时间变化曲线,与预先获得的至少一种标准运动的至少一个夹角的角度时间变化曲线进行相似度比较,若相似度落入第一预设阈值区间,则判定视频数据中各相应帧的相应目标正在进行所对应的标准运动类型;比较子单元, 设置为在判定视频数据中各相应帧的相应目标正在进行所对应的标准运动类型的情况下,进一步比较该目标的至少一个特定夹角的角度时间变化曲线与标准运动的相应特定夹角的角度时间变化曲线;第二判断子单元,设置为若目标的至少一个特定夹角的角度时间变化曲线上相邻最值的差,和标准运动的相应特定夹角的角度时间变化曲线上相邻最值的差落入第二预设阈值区间,则判断视频数据中各相应帧的目标的特定夹角所对应的关节动作规范,否则视频数据中该目标的特定夹角所对应的关节动作不规范;第三判断子单元,设置为判断目标的至少一个特定夹角的角度时间变化曲线上相邻峰值之间的距离,和标准运动的相应特定夹角的角度时间变化曲线上相邻峰值的差是否落入第三预设阈值区间、第四预设阈值区间或第五预设阈值区间,进而确认视频数据中各相应帧的目标的特定夹角所对应的关节动作运动强度过低、适当或过高。Further, optionally, the second matching unit includes: a first judging subunit, configured to clip the angle-time variation curve of at least one specific included angle of the at least one target with at least one pre-obtained at least one standard motion curve. The angle-time variation curve of the angle is compared for similarity. If the similarity falls within the first preset threshold interval, it is determined that the corresponding target of each corresponding frame in the video data is performing the corresponding standard motion type; the comparison subunit is set to be in In the case of determining that the corresponding target of each corresponding frame in the video data is performing the corresponding standard motion type, further compare the angle time change curve of at least one specific angle of the target with the angle time change curve of the corresponding specific angle of the standard motion ; The second judging subunit is set to determine the difference between the adjacent maximum value on the angle-time variation curve of at least one specific included angle of the target and the difference between the adjacent maximum value on the angle-time variation curve of the corresponding specific included angle of the standard motion If it falls within the second preset threshold interval, determine the joint motion specification corresponding to the specific included angle of the target of each corresponding frame in the video data, otherwise the joint motion corresponding to the specific included angle of the target in the video data is not standardized; third; The judging subunit is set to judge whether the distance between adjacent peaks on the angle-time variation curve of at least one specific included angle of the target and the adjacent peaks on the angle-time variation curve of the corresponding specific included angle of the standard motion falls within The third preset threshold interval, the fourth preset threshold interval or the fifth preset threshold interval further confirms that the joint motion intensity corresponding to the specific included angle of the target of each corresponding frame in the video data is too low, appropriate or too high.
可选的,该装置还包括:评估模块,设置为依据识别结果在第二预设数据库中进行匹配,得到识别结果对应的体态评估结果。Optionally, the device further includes: an evaluation module configured to perform matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
进一步地,可选的,该装置还包括:建议模块,设置为在得到识别结果对应的体态评估结果之后,依据体态评估结果在第三预设数据库中进行匹配,得到体态评估结果对应的建议信息。Further, optionally, the device further includes: a suggestion module configured to, after obtaining the posture evaluation result corresponding to the recognition result, perform matching in a third preset database according to the posture evaluation result to obtain the suggestion information corresponding to the posture evaluation result .
根据本发明实施例的一个方面,提供了一种非易失性存储介质,非易失性存储介质包括存储的程序,其中,在程序运行时控制非易失性存储介质所在设备执行上述方法。According to one aspect of the embodiments of the present invention, there is provided a non-volatile storage medium, the non-volatile storage medium includes a stored program, wherein the device where the non-volatile storage medium is located is controlled to execute the above method when the program is running.
根据本发明实施例的一个方面,提供了一种数据识别装置,包括:非易失性存储介质和设置为运行存储于非易失性存储介质中的程序的处理器,程序运行时执行上述方法。According to one aspect of the embodiments of the present invention, there is provided a data recognition device, including: a non-volatile storage medium and a processor configured to run a program stored in the non-volatile storage medium, and the above method is executed when the program is running .
在本发明实施例中,提出了权重注意力机制,其中,通过引入可学习的mask机制,不人为固定网络的分组卷积模式,让网络自身学习卷积分组,并选择对网络有用的滤波器进行卷积运算,提升网络的性能;基于该权重注意力机制对待训练的权重分类模型进行数据训练,得到权重分类模型,通过该权重分类模型对目标检测模型中的特征提取模块进行初始参数的初始化,从而在得到权重注意力神经网络模型的过程中,通过该权重分类模型提高目标检测模型的准确率及加快模型训练的收敛速度;In the embodiment of the present invention, a weighted attention mechanism is proposed, in which, by introducing a learnable mask mechanism, the grouping convolution mode of the network is not artificially fixed, so that the network itself learns the convolution group and selects the filter useful for the network Perform convolution operation to improve the performance of the network; perform data training on the weight classification model to be trained based on the weight attention mechanism to obtain the weight classification model, and initialize the initial parameters of the feature extraction module in the target detection model through the weight classification model , So that in the process of obtaining the weighted attention neural network model, the weight classification model is used to improve the accuracy of the target detection model and accelerate the convergence speed of the model training;
基于上述数据训练方法,还通过采用自顶向下的多人姿态估计的方式,通过将待识别的特征数据输入权重注意力神经网络模型,识别得到待识别的特征数据中至少一个目标的关键点二维坐标,其中,权重注意力神经网络模型用于通过自顶向下的方式进行至少一人的姿态估计,检测待识别的特征数据中至少一个目标的位置矩形框,并 检测位置矩形框内目标的关键点二维坐标;通过目标的关键点二维坐标进行计算,得到第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角;将第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角在第一预设数据库中进行匹配,得出目标的识别结果,达到了提升对人体姿态的识别精度和效率的目的,从而实现了根据提升精度和效率后的人体姿态提供评估结果的技术效果,进而解决了由于相关技术在对人体姿态的识别过程中,数据处理效率低的技术问题。Based on the above data training method, the top-down multi-person pose estimation method is adopted, and the key points of at least one target in the feature data to be recognized are recognized by inputting the feature data to be recognized into the weighted attention neural network model. Two-dimensional coordinates, where the weighted attention neural network model is used to estimate the pose of at least one person in a top-down manner, detect the position rectangle of at least one target in the feature data to be recognized, and detect the target in the position rectangle The two-dimensional coordinates of the key points; the two-dimensional coordinates of the key points of the target are calculated to obtain the angle between the line of the first preset key point combination and the line of the second preset key point combination or the first preset The angle between the line of the key point combination and the first preset line; the angle between the line of the first preset key point combination and the line of the second preset key point combination or the first preset The angle between the line of the key point combination and the first preset line is matched in the first preset database to obtain the target recognition result, which achieves the purpose of improving the recognition accuracy and efficiency of the human body posture, thereby achieving The technical effect of providing the evaluation result according to the human body posture after the accuracy and efficiency is improved is solved, and the technical problem of low data processing efficiency due to the related technology in the process of recognizing the human body posture is solved.
附图说明Description of the drawings
此处所说明的附图用来提供对本发明的进一步理解,构成本申请的一部分,本发明的示意性实施例及其说明用于解释本发明,并不构成对本发明的不当限定。在附图中:The drawings described here are used to provide a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention, and do not constitute an improper limitation of the present invention. In the attached picture:
图1是根据本发明实施例的数据处理方法的流程示意图;Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
图2是根据本发明实施例的数据处理方法中权重注意力机制的示意图;2 is a schematic diagram of a weighted attention mechanism in a data processing method according to an embodiment of the present invention;
图3是根据本发明实施例的数据训练方法的流程示意图;FIG. 3 is a schematic flowchart of a data training method according to an embodiment of the present invention;
图4是根据本发明实施例的数据训练方法中权重分类模型的网络结构图;Fig. 4 is a network structure diagram of a weight classification model in a data training method according to an embodiment of the present invention;
图5是根据本发明实施例的数据训练方法的流程示意图;Fig. 5 is a schematic flowchart of a data training method according to an embodiment of the present invention;
图6是根据本发明实施例的数据训练方法中目标检测模型的示意图;Fig. 6 is a schematic diagram of a target detection model in a data training method according to an embodiment of the present invention;
图7是根据本发明实施例的数据训练方法中单人姿态估计模型的示意图;Fig. 7 is a schematic diagram of a single pose estimation model in a data training method according to an embodiment of the present invention;
图8是根据本发明实施例的数据训练方法中关键点位置和骨架连线的示意图;8 is a schematic diagram of key point positions and skeleton connections in a data training method according to an embodiment of the present invention;
图9a是根据本发明实施例的数据训练方法中关键点位置和骨架连线的标注前的效果示意图;Fig. 9a is a schematic diagram of the effect before labeling the key point positions and the skeleton connection in the data training method according to the embodiment of the present invention;
图9b是根据本发明实施例的数据训练方法中关键点位置和骨架连线的标注后的效果示意图;Fig. 9b is a schematic diagram of the effect of labeling key point positions and skeleton connections in the data training method according to an embodiment of the present invention;
图10是根据本发明实施例的数据训练方法中mix-up的效果示意图;10 is a schematic diagram of the effect of mix-up in the data training method according to an embodiment of the present invention;
图11是根据本发明实施例的数据识别方法的流程示意图;FIG. 11 is a schematic flowchart of a data identification method according to an embodiment of the present invention;
图12是根据本发明实施例的数据识别方法中基于深度学习得到的体态风险的评估的流程示意图;FIG. 12 is a schematic flowchart of a posture risk assessment based on deep learning in a data recognition method according to an embodiment of the present invention;
图13a是根据本发明实施例的体态风险的评估方法中正面照的示意图;Fig. 13a is a schematic diagram of a front view in a method for assessing posture risk according to an embodiment of the present invention;
图13b是根据本发明实施例的体态风险的评估方法中侧面照的示意图;FIG. 13b is a schematic diagram of a side view in a method for assessing a posture risk according to an embodiment of the present invention;
图14是根据本发明实施例的数据识别方法中体态风险的评估结果的展示示意图;FIG. 14 is a schematic diagram showing the evaluation result of posture risk in the data recognition method according to an embodiment of the present invention;
图15是根据本发明实施例的数据识别装置的示意图。Fig. 15 is a schematic diagram of a data recognition device according to an embodiment of the present invention.
具体实施方式detailed description
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分的实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本发明保护的范围。In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only It is a part of the embodiments of the present invention, but not all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
需要说明的是,本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms “first” and “second” in the description and claims of the present invention and the above-mentioned drawings are used to distinguish similar objects, and not necessarily used to describe a specific sequence or sequence. It should be understood that the data used in this way can be interchanged under appropriate circumstances so that the embodiments of the present invention described herein can be implemented in a sequence other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusions. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those clearly listed. Those steps or units may include other steps or units that are not clearly listed or are inherent to these processes, methods, products, or equipment.
本申请涉及的技术名词:Technical terms involved in this application:
体态评估:通过一定的技术方法对图片中人物的体态状况进行评估,比如是否具有O/X型腿,是否驼背或高低肩等体态上的疾病问题,还可进一步对各种体态状况严重情况进行等级打分;Posture evaluation: Use certain technical methods to evaluate the posture of the characters in the picture, such as whether they have O/X legs, whether they have postural diseases such as hunchback or high and low shoulders, and can further conduct various serious posture conditions Grade scoring
动作识别:通过一定的技术方法识别图片或视频中人物的动作类别,比如行走,举手,鼓掌等姿势名称或动作类别名称;Action recognition: Recognize the action category of the characters in the picture or video through certain technical methods, such as walking, raising hands, applauding and other gesture names or action category names;
关键点检测:通过一定的技术方法识别图片/视频中单个目标或多个目标的关键点坐标,如果目标为人,该关键点坐标为骨骼关键点坐标。Key point detection: Identify the key point coordinates of a single target or multiple targets in the picture/video through a certain technical method. If the target is a person, the key point coordinates are the bone key point coordinates.
实施例一Example one
根据本发明实施例的一个方面,提供了一种数据处理方法,图1是根据本发明实施例的数据处理方法的流程示意图,如图1所示,该方法包括如下步骤:According to one aspect of the embodiment of the present invention, a data processing method is provided. FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
步骤S102,将具备第一数量通道的第一特征数据输入至具备第二数量滤波器的第一类卷积层进行计算,输出具备第二数量通道的第二特征数据,其中,第一数量大于第二数量;Step S102: Input the first feature data with the first number of channels into the first type convolutional layer with the second number of filters for calculation, and output the second feature data with the second number of channels, where the first number is greater than Second quantity
步骤S104,将具备第二数量通道的第二特征数据输入至具备第二数量滤波器的第二类卷积层,并根据第二类卷积层中可学习的掩码参数,通过神经网络生成第二类卷积层中各个滤波器的权重的掩码;Step S104, input the second feature data with the second number of channels to the second type convolutional layer with the second number of filters, and generate the mask parameters through the neural network according to the learnable mask parameters in the second type convolutional layer The mask of the weight of each filter in the second type of convolutional layer;
步骤S106,依据掩码确定第二类卷积层中的各个滤波器与第二特征数据中的各通道的连接方式;Step S106: Determine the connection mode between each filter in the second type convolutional layer and each channel in the second feature data according to the mask;
步骤S108,依据连接方式得到的映射关系对第二特征数据进行卷积计算,得到第三特征数据;Step S108: Perform convolution calculation on the second feature data according to the mapping relationship obtained by the connection mode to obtain the third feature data;
步骤S110,将具备第二数量通道的第三特征数据输入至具备第一数量滤波器的第三类卷积层进行计算,输出具备第一数量通道的第四特征数据。Step S110: Input the third feature data with the second number of channels into the third type convolutional layer with the first number of filters for calculation, and output the fourth feature data with the first number of channels.
Specifically, with reference to steps S102 to S110, FIG. 2 is a schematic diagram of the weight attention mechanism in the data processing method according to an embodiment of the present invention, and the weight attention mechanism shown in FIG. 2 is taken as an example for description. In the embodiment of the present application, the first feature data with the first number of channels may be feature map data with 256 channels, and the first-type convolutional layer with the second number of filters may be a 1×1 convolutional layer with 128 filters. Therefore, based on FIG. 2, step S102 inputs the feature map data with 256 channels into the 1×1 convolutional layer with 128 filters for calculation and outputs feature map data with 128 channels;
As shown in FIG. 2, after the calculation by the 1×1 convolutional layer with 128 filters, in step S104 the 128-channel feature map data is taken as input and fed into a 3×3 convolutional layer with 128 filters (that is, the second-type convolutional layer with the second number of filters in the embodiment of the present application). In the convolution calculation of the second-type (3×3) convolutional layer, a mask of the weights of each filter is generated by the fully connected layer in the 3×3 convolutional layer; according to this mask, step S106 determines the connection mode between each filter in the 3×3 convolutional layer and each channel of the 128-channel feature map data (see the mask diagram on the left of FIG. 2); according to this connection mode, step S108 performs a convolution calculation on the 128-channel feature map data based on the mapping relationship of the connection mode to obtain the third feature data, that is, feature map data with 128 channels; finally, in step S110, the 128-channel feature map data is taken as input and fed into a 1×1 convolutional layer with 256 filters for calculation, yielding feature map data with 256 channels.
Specifically, the second-type convolutional layer has 128 channels in total, and each 3×3 convolution kernel contains 10 weights, so the total number of weights is 128×10 (see the Convolution Weights diagram at the lower left of FIG. 2); its output also has 128 channels, so the corresponding mask matrix has size 128×128. Therefore, a fully connected layer with 10 input channels, 128 output channels and a sigmoid activation function can be used to generate, from the 10 weights of each convolution kernel, a weight mask corresponding to the 128 output channels.
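As an illustration of the mask-generation step just described, the following is a minimal sketch in PyTorch-style Python; the class name and the use of nn.Linear plus a sigmoid are illustrative assumptions rather than the patented implementation. It maps the 10 weights of each of the 128 convolution kernels to one 128-entry row of the 128×128 weight mask.

```python
import torch
import torch.nn as nn

class WeightMaskGenerator(nn.Module):
    """Sketch: map the 10 weights of each 3x3 kernel to one 128-entry mask row."""

    def __init__(self, weights_per_kernel=10, num_filters=128):
        super().__init__()
        # 10 input channels -> 128 output channels, followed by a sigmoid,
        # as described in the text above.
        self.fc = nn.Linear(weights_per_kernel, num_filters)

    def forward(self, kernel_weights):
        # kernel_weights: (128, 10) -- one row of 10 weights per filter
        # returns a (128, 128) soft mask with entries in (0, 1)
        return torch.sigmoid(self.fc(kernel_weights))
```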
In order to compute with the weight mask efficiently using the matrix multiplication of existing deep learning frameworks, the following calculation may be chosen: repeat the 128 input channels 128 times to obtain a 128×128 matrix, take its Hadamard product with the mask matrix, reshape the result into 128² feature channels, and then obtain a 128-channel output through a grouped convolution with 128 groups and one output channel per group.
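Under the same assumptions, the repeat / Hadamard-product / grouped-convolution computation described above could be expressed with standard framework operations roughly as in the following sketch; the shapes and function name are illustrative only.

```python
import torch
import torch.nn.functional as F

def mask_weighted_conv(x, weight, mask):
    """Illustrative sketch of the computation described above, not the patented code.

    x:      (N, 128, H, W)   -- 128-channel input feature map
    weight: (128, 128, 3, 3) -- 3x3 kernels of the second-type convolutional layer
    mask:   (128, 128)       -- mask[o, i] gates input channel i for output filter o
    """
    n, c_in, h, w = x.shape
    c_out = mask.shape[0]
    # Repeat the 128 input channels 128 times: one copy per output filter.
    x_rep = x.unsqueeze(1).expand(n, c_out, c_in, h, w)
    # Hadamard product with the mask matrix (broadcast over the spatial dims).
    x_masked = x_rep * mask.view(1, c_out, c_in, 1, 1)
    # Reshape into 128*128 feature channels.
    x_flat = x_masked.reshape(n, c_out * c_in, h, w)
    # Grouped convolution: 128 groups, one output channel per group.
    return F.conv2d(x_flat, weight, padding=1, groups=c_out)   # (N, 128, H, W)
```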
The foregoing example is given in the embodiments of the present application only by way of illustration; the implementation of the data processing method provided in the embodiments of the present application shall prevail, and no specific limitation is imposed.
In an implementable example, generating, through a neural network and according to the learnable mask parameters in the second-type convolutional layer, a mask for the weights of each filter in the second-type convolutional layer includes: generating the mask of the weights of each filter in the second-type convolutional layer from the fully connected layer in the second-type convolutional layer.
In the embodiment of the present application, a fully connected layer is used to generate the filter mask, as shown in the mask on the left of FIG. 2. During backward propagation the network learns the mask, and the entries set to 1 in the mask correspond to the filters selected by the network. Since the connection mode of the filters is selected by learning a weight mask, the filter selection and the filter-to-channel convolution calculation used in steps S102 to S110 of the embodiment of the present application constitute a weight attention mechanism. The embodiment of the present application introduces a learnable mask mechanism instead of artificially fixing the grouped convolution pattern of the network (the grouped convolution pattern refers to the lines of the same type between "in" and "out" shown in FIG. 2, indicating that an output convolution is computed only from the connected input channels), allowing the network itself to learn the convolution groups and to select the filters useful to the network for the convolution operation, thereby improving the performance of the network.
It should be noted that generating the filter mask with the fully connected layer in the above embodiment of the present application may also be implemented by the following scheme:
As shown in FIG. 2, after the calculation by the 1×1 convolutional layer with 128 filters, in step S104 the 128-channel feature map data is taken as input and fed into the 3×3 convolutional layer with 128 filters (that is, the second-type convolutional layer with the second number of filters in the embodiment of the present application). In the convolution calculation of the second-type (3×3) convolutional layer, a learnable mask of the weights of each filter in the 3×3 convolutional layer is used; according to this mask, step S106 determines the connection mode between each filter in the 3×3 convolutional layer and each channel of the 128-channel feature map data (see the mask diagram on the left of FIG. 2); according to this connection mode, step S108 performs a convolution calculation on the 128-channel feature map data based on the mapping relationship of the connection mode to obtain the third feature data, that is, feature map data with 128 channels; finally, in step S110, the 128-channel feature map data is fed into the 1×1 convolutional layer with 256 filters for calculation, yielding feature map data with 256 channels. The mask is produced as a 128×128 mask matrix from learnable parameters, a differentiable transformation and a sigmoid activation function, and the mask is multiplied with the filter weights, so that different output filters of the second-type convolutional layer selectively use different input features; at prediction time the mask is binarized to 0 or 1 according to a preset threshold, and optionally a grouped convolution may be performed based on the specific connection mode to optimize computational efficiency.
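A minimal sketch of how the learnable mask might be multiplied with the filter weights and binarized at prediction time is given below; the threshold value of 0.5 and the function name are assumptions, not taken from the original text.

```python
import torch

def apply_weight_mask(weight, soft_mask, training=True, threshold=0.5):
    """Sketch of the mask use described above (threshold value is an assumption).

    weight:    (C_out, C_in, k, k) filter weights of the second-type convolutional layer
    soft_mask: (C_out, C_in)       mask produced from learnable parameters + sigmoid
    """
    if not training:
        # At prediction time the mask is binarized to 0 or 1 by a preset threshold.
        soft_mask = (soft_mask > threshold).float()
    # Multiply the mask with the filter weights so that each output filter
    # selectively uses different input channels.
    return weight * soft_mask[:, :, None, None]
```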
The number of parameters used by the fully connected layer to generate the mask is 10 (inputs) × 128 + 128. In fact, any way of generating a 128×128 mask from trainable parameters can be used. The embodiment of the present application only takes the above example for description; the implementation of the data processing method provided in the embodiments of the present application shall prevail, and no specific limitation is imposed.
In the above scheme, back propagation is used to learn the filter mask. As shown in the mask on the left of FIG. 2, the entries set to 1 in each row of the mask correspond to the input channel features selected by that output filter. Since the connection mode of the filters is selected by learning a weight mask, the filter selection and the filter-to-channel convolution calculation used in steps S102 to S110 of the embodiment of the present application constitute a weight attention mechanism. In the embodiment of the present application, the WeightNet network introduces a learnable mask mechanism instead of artificially fixing the grouped convolution pattern of the network (the grouped convolution pattern refers to the lines of the same type between "in" and "out" shown in FIG. 2, indicating that an output convolution is computed only from the connected input channels), allowing the network itself to learn the convolution groups and to select the filters useful to the network for the convolution operation, thereby improving the performance of the network.
Optionally, the data processing method provided in the embodiment of the present application is applied to deep learning in artificial intelligence.
Specifically, the convolution algorithm based on steps S102 to S110 can apply this weight attention mechanism to artificial intelligence technology, in particular to deep neural network learning, so that the network can, based on its own learning, group the filters and channels by itself and then perform the convolution calculation, thereby improving the data processing capability of deep neural network learning.
Optionally, the data processing method provided in the embodiment of the present application is applied to recognizing the posture or action of a target in a picture/video.
Specifically, in the embodiment of the present application the target may be a person, an animal, etc., that is, a person or animal in a picture or video. As an extension of artificial intelligence (AI) computation and in generally applicable cases, the convolution algorithm based on steps S102 to S110 can apply the weight attention mechanism specifically to recognizing the posture of a target in a picture/video. It can be applied to a security monitoring environment, where, based on targets such as persons, vehicles, animals and insects in the acquired pictures/videos, the behaviors and movement trajectories of those persons, vehicles, animals and insects are predicted;
In the embodiment of the present application, this technique may preferably be applied to medical diagnosis. For example, a person in a picture/video is recognized and taken as the target; the key points constituting the target are obtained from the outline of the target; posture evaluation is performed according to the key points; and the bone health of the target is further evaluated according to the posture. The image calculation process used to recognize the person in the picture/video may be the data processing method described in steps S102 to S110, and the applicable convolution algorithm may be as shown in FIG. 2.
The application of the convolution algorithm shown in FIG. 2 to data model training in AI technology is detailed in the data training method of Embodiment 2, in which a WeightNet network is obtained based on the convolution algorithm shown in FIG. 2; data training based on the WeightNet network is detailed in Embodiment 2.
Embodiment 2
According to one aspect of the embodiments of the present invention, a data training method is provided. FIG. 3 is a schematic flowchart of the data training method according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
Step S302: obtain a weight classification model to be trained, where the weight classification model is a neural network model for acquiring image features of image data;
Specifically, FIG. 4 is a network structure diagram of the weight classification model in the data training method according to an embodiment of the present invention. In the embodiment of the present application, the WeightNet-50 classification model is taken as an example; based on this classification model, image feature extraction can be performed on the image to be processed. Likewise, a WeightNet-101 classification model is also applicable to the data training method provided in the embodiment of the present application. The embodiment of the present application only takes the WeightNet-50 classification model as an example; the implementation of the data training method provided in the embodiment of the present application shall prevail, and no specific limitation is imposed.
Step S304: train the weight classification model to be trained to obtain the weight classification model, where the method used in training the weight classification model to be trained includes the data processing method in Embodiment 1 above.
Specifically, based on the WeightNet-50 classification model obtained in step S302, training pictures are fed into the WeightNet-50 classification model for convolution training, and the WeightNet classification model (that is, the weight classification model provided in the embodiment of the present application) is finally obtained.
Optionally, training the weight classification model to be trained in step S304 to obtain the weight classification model includes:
Step S3041: input the data in a first preset data set into the weight classification model to be trained to obtain a category prediction result;
Specifically, in the embodiment of the present application, the preset data set is an image data set covering all object categories, where the object categories include natural categories such as person, dog and horse. In step S3041, the picture data of each type in the image data set is input as the first data into the weight classification model to be trained, that is, pictures of each type such as person, dog and horse are input into the WeightNet-50 classification model to be trained, and the category prediction result of each piece of picture data is obtained.
As shown in FIG. 4, taking the WeightNet-50 classification model to be trained as an example, the WeightNet-50 classification model consists of residual structures, pooling structures and fully connected structures:
The residual structure is completed by three convolutional layers: the first convolutional layer has n 1×1 convolution kernels with a stride of 1; the second layer has n 3×3 convolution kernels with a stride of 1; the third layer has 2n 1×1 convolution kernels with a stride of 1;
The specific network parameter configuration is: the first convolution block consists of n=64 7×7 convolution kernels with a stride of 2, followed by 3×3 pooling with a stride of 2; the second convolution block consists of 3 residual structures with n=64; the third convolution block consists of 4 residual structures with n=128; the fourth convolution block consists of 6 residual structures with n=256; the fifth convolution block consists of 3 residual structures with n=512; finally there is a global average pooling layer and a softmax fully connected layer with 1000 outputs.
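For illustration only, the three-layer residual structure described above could be sketched as follows. Batch normalization, the ReLU activations and the 1×1 projection on the skip connection are common ResNet-style assumptions that the text does not spell out, and the weight attention mask of FIG. 2 is omitted for brevity.

```python
import torch.nn as nn

class ResidualStructure(nn.Module):
    """Sketch of the three-layer residual structure: 1x1 (n) -> 3x3 (n) -> 1x1 (2n)."""

    def __init__(self, in_channels, n):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, n, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(n), nn.ReLU(inplace=True),
            nn.Conv2d(n, n, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(n), nn.ReLU(inplace=True),
            nn.Conv2d(n, 2 * n, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(2 * n),
        )
        # 1x1 projection so the skip connection matches the 2n output channels (assumption)
        self.project = nn.Conv2d(in_channels, 2 * n, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.block(x) + self.project(x))
```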
It should be noted that in the embodiment of the present application, when the data in the first preset data set is input into the weight classification model to be trained, the obtained category prediction result may be the category prediction result of a picture or of an image in a video. The preset data set used in the embodiment of the present application takes a picture data set only as a preferred example; in addition, a video image data set may also be included. The implementation of the data training method provided in the embodiment of the present application shall prevail, and no specific limitation is imposed.
Step S3042: obtain, according to the category prediction result and the label categories of the data in the first preset data set, the error between the category prediction result and the label categories of the data in the first preset data set;
Specifically, the picture data with annotated label categories in the first preset data set is input into the WeightNet-50 classification model to be trained, features are extracted through forward propagation, and the category prediction result is obtained.
The category prediction result is compared with the label categories of the data in the first preset data set to obtain the error between the category prediction result and the label categories of the data in the first preset data set.
Step S3043: train the weight classification model to be trained with a back-propagation algorithm according to the error until the weight classification model to be trained converges, obtaining a converged weight classification model.
Specifically, based on the error obtained in step S3042, the error back-propagation algorithm is used to train the model until the model converges, and the WeightNet-50 classification model is obtained.
It should be noted here that the first preset data set in the embodiment of the present application may be the ImageNet data set. The WeightNet classification model is pre-trained with millions of ImageNet classification samples, and the feature extraction module in the target detection model is initialized with the converged weight classification model, thereby improving the accuracy of the final target detection model and accelerating the convergence of model training. The ImageNet data set is used because ImageNet contains 1.2 million images in 1000 categories, and training with such a large number of samples can meet the requirements of AI technology for deep neural network learning.
Therefore, it should be noted that the first preset data set provided in the embodiment of the present application only takes the ImageNet data set as an example; the implementation of the data training method provided in the embodiment of the present application shall prevail, and no specific limitation is imposed.
Further, optionally, training the weight classification model to be trained with the back-propagation algorithm according to the error in step S3043 until the weight classification model to be trained converges includes:
Step S30441: iterate repeatedly through excitation propagation and weight update until the weight classification model to be trained converges. In the case where the weight classification model to be trained includes residual structures, pooling structures and fully connected structures, iterating repeatedly through excitation propagation and weight update until the weight classification model to be trained converges includes: in the excitation propagation phase, passing the image through the convolutional layers of the weight classification model to be trained to obtain features, obtaining the category prediction result at the fully connected layer of the weight classification model to be trained, and then taking the difference between the category prediction result and the label categories of the data in the first preset data set to obtain the response errors of the hidden layers and the output layer; in the weight update phase, multiplying the error by the derivative of the function of the current layer's response with respect to the previous layer's response to obtain the gradient of the weight matrix between the two layers, and adjusting the weight matrix at the set learning rate along the opposite direction of the gradient; determining the gradient matrix as the error of the previous layer and calculating the weight matrix of the previous layer; and updating the weight classification model to be trained through iterative calculation until the weight classification model to be trained converges.
Specifically, still taking the ImageNet data set as an example, the labeled category data of ImageNet is used to train the network parameters: features are extracted through forward propagation, the error between the category prediction result (one-hot) output by the network and the true label categories is computed, and the error back-propagation algorithm is used to train the model until the model converges, obtaining the WeightNet-50 classification model.
The convolutional neural network model is trained through the error back-propagation algorithm, specifically the repeated iteration of the two phases of excitation propagation and weight update, until the convergence condition is reached;
In the excitation propagation phase, the image is passed through the convolutional layers of the WeightNet-50 classification model to obtain features, the prediction result is obtained at the last fully connected layer of the network, and the difference between the prediction result and the true result is taken, thereby obtaining the response errors of the hidden layers and the output layer;
In the weight update phase, the known error is first multiplied by the derivative of the function of the current layer's response with respect to the previous layer's response to obtain the gradient of the weight matrix between the two layers, and the weight matrix is then adjusted at the set learning rate along the opposite direction of this gradient; subsequently, this gradient matrix is treated as the error of the previous layer to calculate the weight matrix of the previous layer, and so on, completing the update of the entire model;
Here, in the embodiment of the present application, Adam may be used as the optimizer for training the WeightNet-50 classification model. For the parameter settings, the base learning rate may be set to 0.1, divided by 10 at the 32000th and 48000th iterations, with training terminated at the 64000th iteration; the weight decay value is set to 0.0001 and the batch size is set to 128.
It should be noted that training the WeightNet-50 classification model with Adam as the optimizer in the embodiment of the present application is only an example, and the above parameter settings are only preferred examples; the implementation of the data training method provided in the embodiment of the present application shall prevail, and no specific limitation is imposed.
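As an illustration of the training configuration just described (Adam, base learning rate 0.1 divided by 10 at the 32000th and 48000th iterations, termination at 64000 iterations, weight decay 0.0001, batch size 128), a minimal sketch of the loop is shown below; the placeholder model and data loader are stand-ins for WeightNet-50 and the ImageNet data.

```python
import torch
import torch.nn as nn

# Placeholder model and data; in practice these would be WeightNet-50 and ImageNet batches.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))
loader = [(torch.randn(128, 3, 224, 224), torch.randint(0, 1000, (128,)))]

optimizer = torch.optim.Adam(model.parameters(), lr=0.1, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

for step, (images, labels) in enumerate(loader, start=1):
    logits = model(images)              # excitation propagation: forward pass
    loss = criterion(logits, labels)    # error against the label categories
    optimizer.zero_grad()
    loss.backward()                     # back-propagate the response errors
    optimizer.step()                    # weight update along the negative gradient
    if step in (32000, 48000):          # divide the base learning rate of 0.1 by 10
        for group in optimizer.param_groups:
            group["lr"] /= 10
    if step == 64000:                   # terminate training at 64000 iterations
        break
```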
Embodiment 3
According to one aspect of the embodiments of the present invention, a data training method is provided. FIG. 5 is a schematic flowchart of the data training method according to an embodiment of the present invention. As shown in FIG. 5, the method includes:
Step S502: initialize the feature extraction module in a target detection model with the converged weight classification model to obtain the target detection model to be trained, where the converged weight classification model is obtained through training by the method in Embodiment 2;
Specifically, the data training method provided in the embodiment of the present application is suitable for training a weight attention neural network model, where the weight attention neural network model includes a target detection model (Faster-RCNN). The Faster-RCNN is used to extract the position box information of each person in the input image and feed it to a single-person pose estimation model for pose estimation, where the Faster-RCNN includes: a feature extraction module (WeightNet), a proposal box generation module (RPN), and a target classifier and position box regression prediction module (Fast-RCNN);
The feature extraction module in step S502 is the feature extraction module in the Faster-RCNN. Based on the weight classification model obtained in Embodiment 2, the feature extraction module is initialized from this weight classification model, excluding the output layer parameters. Here, initializing, with the weight classification model, the weights of the feature extraction module that acquires the image features of the first preset data set may be as follows:
Still taking the WeightNet-50 classification model as an example, the WeightNet-50 classification model is pre-trained on the classification task with the ImageNet data set, and the finally converged weights are used as the initialization weights of the feature extraction module in the person detection model, thereby improving the accuracy of the final person detection model and accelerating the convergence of model training;
Training and validation here follow the above data preprocessing operations; Adam (Adaptive Moment Estimation, a stochastic optimization method) is used as the optimizer; the base learning rate is set to 0.1, divided by 10 at the 32000th and 48000th iteration steps, and training is terminated at the 64000th iteration step; the weight decay value is 0.0001; the batch size is set to 128.
The preprocessing operation of the images in the first preset data set adopts random horizontal flipping with a preset probability, where the preset probability may be set to 50%; when an image that does not need to be flipped is obtained, no random flipping is required, subject to the actual requirements of the image processing.
Step S504: train the target detection model to be trained with the target position box label information in a second preset data set to obtain the trained target detection model;
Specifically, based on the target detection model to be trained obtained in step S502, FIG. 6 is a schematic diagram of the target detection model in the data training method according to an embodiment of the present invention. As shown in FIG. 6, in combination with the second preset data set: in the embodiment of the present application, the second preset data set may be a data set containing target position box label information, for example a data set composed of the target position box label information in the COCO and Kinetics-14 data sets. Here the target detection model is trained with the data set composed of the target position box label information in the COCO and Kinetics-14 data sets to improve the recognition effect of the final overall architecture in locating persons in similar scenes. In addition, it should be noted that the feature extraction module in the embodiment of the present application is obtained based on the weight classification model trained in Embodiment 2; the difference lies in the structure and function of the weight classification model and the feature extraction module;
The weight classification model is used to pre-train the WeightNet-50 classification model on the classification task with the ImageNet data set, and the finally converged weights are used as the initialization weights of the feature extraction module in the target detection model, thereby improving the accuracy of the final target detection model and accelerating the convergence of model training; the structure of the weight classification model is: weight classification network + classifier;
The feature extraction module is obtained by initializing its weights through the weight classification model; structurally, the feature extraction module is the weight classification model with the classifier removed, that is, the part containing the weight classification network.
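One possible way to perform this initialization, copying the converged classification weights into the feature extraction module while skipping the classifier/output-layer parameters, is sketched below; the parameter-name prefix "fc" for the classifier head is an assumption about the layout.

```python
import torch

def init_backbone_from_classifier(backbone, classifier_state, skip_prefix="fc"):
    """Sketch: load the converged weight classification model's parameters into the
    detector's feature extraction module, excluding the output-layer parameters.
    The 'fc' prefix is an assumed name for the classifier head."""
    filtered = {name: tensor for name, tensor in classifier_state.items()
                if not name.startswith(skip_prefix)}
    # strict=False because the backbone has no classifier head to receive those keys
    backbone.load_state_dict(filtered, strict=False)
    return backbone
```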
Step S506: train the network parameters of the single-person pose estimation model to be trained according to the target key point label information in a third preset data set to obtain the trained single-person pose estimation model;
Specifically, in the embodiment of the present application, the third preset data set may be a data set containing target key point label information, for example a data set composed of the target key point label information in the COCO and Kinetics-14 data sets. Here the single-person pose estimation model is trained with the data set composed of the target key point label information in the COCO and Kinetics-14 data sets to improve the recognition effect of the final overall architecture on the skeletal key points of persons in similar scenes.
In the embodiment of the present application, the single-person pose estimation model may take the HRNet model as an example. FIG. 7 is a schematic diagram of the single-person pose estimation model in the data training method according to an embodiment of the present invention. As shown in FIG. 7, based on the HRNet algorithm and the data set constructed above, a single-person pose model suited to this scenario is retrained. The HRNet model connects high-resolution to low-resolution sub-networks in parallel, which differs from the serial connection in the related art: the HRNet model maintains high resolution instead of recovering the resolution through a low-to-high process. Also, unlike the fusion schemes in the related art, which aggregate low-level and high-level representations, the HRNet model in the embodiment of the present application uses repeated multi-scale fusion, using low-resolution representations of the same depth and similar level to improve the high-resolution representation.
Step S508: obtain a weight attention neural network model according to the trained target detection model and the trained single-person pose estimation model.
Specifically, the trained target detection model obtained in step S504 and the trained single-person pose estimation model obtained in step S506 are combined to obtain the weight attention neural network model, that is, the combination of the Faster-RCNN model and the HRNet model constitutes the weight attention neural network model.
In summary, in the data training method provided in the embodiment of the present application, the first preset data set is used to train the weight classification model, and the converged weight classification model is then used to initialize the feature extraction module in the target detection model; the second preset data set is used to train the target detection model; and the third preset data set is used to train the single-person pose estimation model.
Optionally, training the target detection model to be trained with the target position box label information in the second preset data set in step S504 to obtain the trained target detection model includes:
Step S5041: in the case where the target detection model includes the feature extraction module, the proposal box generation module, and the target classifier and position box regression prediction module, train the feature extraction module and the proposal box generation module respectively to obtain a first parameter value of the feature extraction module and a first parameter value of the proposal box generation module;
Specifically, based on step S502, the Faster-RCNN includes: the feature extraction module, the proposal box generation module (RPN), and the target classifier and position box regression prediction module (Fast-RCNN). The parameters of the feature extraction module and the RPN module are trained as follows: the feature extraction module and the RPN module parameters are trained separately to obtain rpn1 (that is, the first parameter value of the proposal box generation module in the embodiment of the present application) and weightnet1 (that is, the first parameter value of the feature extraction module in the embodiment of the present application).
The proposal box generation module and the target classifier and position box regression prediction module in the target detection model may each be initialized with different data distribution methods (common initialization methods are: 1. initialization to zero; 2. random initialization; 3. Xavier initialization; 4. He initialization; in the embodiment of the present application, 3 or 4 is preferred).
Step S5042: train the target classifier and position box regression prediction module according to the first parameter value of the feature extraction module and the first parameter value of the proposal box generation module to obtain a first parameter value of the target classifier and position box regression prediction module and a second parameter value of the feature extraction module;
Specifically, Fast-RCNN (that is, the target classifier and position box regression prediction module in the embodiment of the present application) is trained according to the first parameter value of the feature extraction module and the first parameter value of the proposal box generation module to obtain fast-rcnn1 (that is, the first parameter value of the target classifier and position box regression prediction module in the embodiment of the present application) and WeightNet2 (that is, the second parameter value of the feature extraction module in the embodiment of the present application).
Step S5043: train the proposal box generation module according to the first parameter value of the target classifier and position box regression prediction module and the second parameter value of the feature extraction module to obtain a second parameter value of the proposal box generation module;
Specifically, the RPN (that is, the proposal box generation module in the embodiment of the present application) is trained in combination with fast-rcnn1 and WeightNet2 to obtain rpn2 (that is, the second parameter value of the proposal box generation module in the embodiment of the present application).
Step S5044: train the target classifier and position box regression prediction module according to the second parameter value of the proposal box generation module and the second parameter value of the feature extraction module to obtain a second parameter value of the target classifier and position box regression prediction module.
Specifically, the Fast-RCNN module is trained according to the second parameter value of the feature extraction module and the second parameter value of the proposal box generation module to obtain fast-rcnn2 (that is, the second parameter value of the target classifier and position box regression prediction module in the embodiment of the present application).
In the process of training the target detection model, the input image preprocessing operations may adopt mix-up and random horizontal flipping (50%), and Adam may be used as the optimizer, for example; the parameters may be set as: base learning rate 0.001, weight decay value 0.0001, batch size 32, and the numbers of iteration steps of the four training stages are 80000, 40000, 80000 and 40000, respectively.
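The four-stage alternating procedure of steps S5041 to S5044 can be outlined as follows; train_stage is an assumed helper standing in for one complete training stage (returning the updated trainable modules), so this is a schematic outline under those assumptions rather than a definitive training implementation.

```python
def train_faster_rcnn_alternating(weightnet, rpn, fast_rcnn, train_stage):
    """Outline of the four-stage alternating training described above.

    train_stage(trainable, frozen, steps) is an assumed helper that runs one
    training stage and returns a tuple with the updated trainable modules.
    """
    # Stage 1 (80000 steps): train the feature extractor and the RPN -> weightnet1, rpn1
    weightnet1, rpn1 = train_stage(trainable=[weightnet, rpn], frozen=[], steps=80000)
    # Stage 2 (40000 steps): train Fast-RCNN (and the backbone) using rpn1's proposals
    fast_rcnn1, weightnet2 = train_stage(trainable=[fast_rcnn, weightnet1],
                                         frozen=[rpn1], steps=40000)
    # Stage 3 (80000 steps): retrain the RPN with fast_rcnn1 and weightnet2 -> rpn2
    (rpn2,) = train_stage(trainable=[rpn1], frozen=[weightnet2, fast_rcnn1], steps=80000)
    # Stage 4 (40000 steps): retrain Fast-RCNN with weightnet2 and rpn2 -> fast_rcnn2
    (fast_rcnn2,) = train_stage(trainable=[fast_rcnn1], frozen=[weightnet2, rpn2], steps=40000)
    return weightnet2, rpn2, fast_rcnn2
```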
Further, optionally, the feature extraction module is used to extract the features of each piece of data in the second preset data set; the proposal box generation module is used to generate candidate target boxes of each piece of data according to the features of each piece of data in the second preset data set; the target classifier and position box regression prediction module is used to obtain, according to the features of each piece of data in the second preset data set and the candidate target boxes of each piece of data, the detection boxes of the targets in each piece of data in the second preset data set and the categories of the corresponding detection boxes. In the case where the proposal box generation module includes a convolutional layer with a sliding window, followed by two parallel convolutional layers that are a regression layer and a classification layer respectively, using the proposal box generation module to generate candidate target boxes of each piece of data according to the features of each piece of data in the second preset data set includes: obtaining, through the regression layer according to the features of each piece of data in the second preset data set, the coordinates of the center anchor point of each candidate target box of each piece of data and the width and height of the corresponding candidate target box; and determining, through the classification layer, whether each candidate target box of each piece of data is foreground or background.
Specifically, based on the foregoing, as shown in FIG. 6, the feature extraction module in the target detection model provided in the embodiment of the present application is used to extract the feature map of the input image;
The input of the proposal box generation module (RPN) is the feature map extracted by the feature extraction module, and its output is a series of coordinates of candidate human target rectangular boxes, used to generate the candidate target boxes of the input image.
The main inputs of the target classifier and position box regression prediction module (Fast-RCNN) are the feature map extracted by the feature extraction module and the candidate boxes generated by the proposal box generation module, used for accurate position regression and category prediction results.
The RPN network structure includes: a convolutional layer using a 3×3 sliding window, followed by two parallel 1×1 convolutional layers, namely a regression layer (reg_layer) and a classification layer (cls-layer). The regression layer (reg_layer) is used to predict, for the center anchor point of the window, the coordinates x, y and the width and height w, h of the candidate box on the original image; the classification layer (cls-layer) is used to determine whether the candidate is foreground or background.
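A sketch of such an RPN head is given below; the input channel count and the number of anchors per location are illustrative assumptions not taken from the text.

```python
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of the RPN structure described above: a 3x3 sliding-window conv
    followed by two parallel 1x1 convs (regression and classification)."""

    def __init__(self, in_channels=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # reg_layer: x, y, w, h of the candidate box for each anchor
        self.reg_layer = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)
        # cls-layer: foreground / background score for each anchor
        self.cls_layer = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)

    def forward(self, feature_map):
        x = self.relu(self.conv(feature_map))
        return self.reg_layer(x), self.cls_layer(x)
```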
Further, optionally, in the case where the structure of the target classifier and position box regression prediction module is one pooling layer, three fully connected layers and two parallel fully connected layers connected in sequence, using the target classifier and position box regression prediction module to obtain, according to the features of each piece of data in the second preset data set and the candidate target boxes of each piece of data, the detection boxes of the targets in each piece of data in the second preset data set and the categories of the corresponding detection boxes includes: converting, through the pooling layer, the features of each piece of data of different lengths output by the feature extraction module into features of each piece of data of fixed length; and, according to the fixed-length features of each piece of data, passing through the three fully connected layers and then through the two parallel fully connected layers, outputting the detection boxes of the targets in each piece of data in the second preset data set and the categories of the corresponding detection boxes.
Specifically, taking person detection as an example, the target classifier and position box regression prediction module includes one ROI pooling layer, three fully connected layers and two parallel fully connected layers. The main function of the ROI pooling layer is to convert inputs of different sizes into outputs of fixed length, and the two parallel fully connected layers are mainly used to predict the category and regress the person detection box.
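A sketch of this head is shown below; the ROI output size, the hidden dimensions, the two-class setting for person detection, and the use of adaptive pooling as a stand-in for ROI pooling are assumptions for illustration.

```python
import torch.nn as nn

class FastRCNNHead(nn.Module):
    """Sketch: one ROI pooling layer, three fully connected layers, then two
    parallel fully connected layers for the category and the detection box."""

    def __init__(self, in_channels=256, roi_size=7, hidden=1024, num_classes=2):
        super().__init__()
        # Stand-in for ROI pooling: converts variable-sized ROI features to a fixed size.
        self.roi_pool = nn.AdaptiveMaxPool2d((roi_size, roi_size))
        self.fcs = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * roi_size * roi_size, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        )
        self.cls_fc = nn.Linear(hidden, num_classes)      # category prediction
        self.box_fc = nn.Linear(hidden, num_classes * 4)  # detection-box regression

    def forward(self, roi_features):
        # roi_features: (num_rois, C, h, w) cropped from the backbone feature map
        x = self.fcs(self.roi_pool(roi_features))
        return self.cls_fc(x), self.box_fc(x)
```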
Optionally, training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set in step S506 to obtain the trained single-person pose estimation model includes: training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set, and iteratively updating the network parameters of the single-person pose estimation model to be trained through forward propagation and backward propagation algorithms; where training the network parameters of the single-person pose estimation model to be trained according to the target key point label information in the third preset data set and iteratively updating the network parameters of the single-person pose estimation model to be trained through forward propagation and backward propagation algorithms includes: expanding the height or width of the input single-person image according to a preset aspect ratio, and cropping the single-person image to a preset size.
Specifically, still taking the HRNet model as an example, the input of the HRNet single-person pose estimation network is a single-person image, and the output is the two-dimensional coordinates of the key points of the human skeleton in the single-person image. The structure of the HRNet single-person pose estimation network is shown in FIG. 7 and has four stages: starting from the second stage, each stage branches off a parallel sub-network whose resolution is halved compared with the previous network and whose width (number of channels C) is doubled; therefore, by the final fourth stage there are four parallel sub-networks. At the same time, each stage (except the first) contains several exchange blocks; each exchange block contains, on each branch, a basic unit (composed of 4 WeightNet residual units, each as shown in FIG. 2) and an exchange unit that spans resolutions. The function of the exchange unit is to fuse, through up-sampling, down-sampling or identity mapping operations, the outputs of the current parallel sub-networks of different resolutions as the next input of each branch, so as to achieve the effect of multi-scale fusion in the model. Specifically, the first stage contains a basic unit and a 3×3 convolutional layer whose main function is to reduce the channels of the feature map output by the basic unit to 32, serving as the subsequent high-resolution branch; the second, third and fourth stages contain 1, 4 and 3 exchange blocks, respectively. It follows that HRNet has 8 exchange blocks in total and performs 8 multi-scale fusions; in the final stage, the numbers of channels of the branches are 32, 64, 128 and 256, respectively.
The target key point label information in the COCO and Kinetics-14 data sets is used to train the HRNet network parameters, and the network parameters are updated iteratively through forward propagation and backward propagation algorithms. The HRNet network expands the height or width of the input single-person image to a fixed aspect ratio (height to width equal to 4:3) and then crops the image to a fixed size of 384×288; data augmentation (preprocessing) includes random rotation (±45 degrees), random scaling (0.65 to 1.35) and/or random horizontal flipping; the Adam optimizer is used during training, the base learning rate is set to 0.001 and the batch size to 16, and the learning rate drops to 0.0001 and 0.00001 at the 170th and 200th epochs, respectively. The total number of training epochs is set to 210.
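The input handling and augmentation just described might look like the following sketch; padding-based expansion to the 4:3 ratio and the torchvision helpers are assumptions, not the patented preprocessing code.

```python
import random
import torchvision.transforms.functional as TF

def preprocess_single_person(image, target_h=384, target_w=288, train=True):
    """Sketch: expand the single-person crop to a 4:3 height:width ratio,
    apply the random augmentations from the text, and resize to 384x288.
    image: (C, H, W) tensor."""
    _, h, w = image.shape
    if h * 3 < w * 4:                               # too wide -> expand the height
        pad_h = (w * 4 + 2) // 3 - h
        image = TF.pad(image, [0, 0, 0, pad_h])     # pad bottom
    else:                                           # too tall -> expand the width
        pad_w = (h * 3 + 3) // 4 - w
        image = TF.pad(image, [0, 0, pad_w, 0])     # pad right
    if train:
        image = TF.rotate(image, random.uniform(-45, 45))       # random rotation ±45°
        scale = random.uniform(0.65, 1.35)                       # random scaling 0.65–1.35
        _, h, w = image.shape
        image = TF.resize(image, [max(1, int(h * scale)), max(1, int(w * scale))])
        if random.random() < 0.5:                                # random horizontal flip
            image = TF.hflip(image)
    return TF.resize(image, [target_h, target_w])                # fixed 384x288 input
```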
In summary, both the target detection model and the single-person pose estimation model provided in the embodiments of the present application use the forward propagation algorithm to obtain the mean square error between the model's predicted output and the true labels, as in formula (1):

$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$  (1)

where $y_i$ is the model's prediction for the i-th data item, $y'_i$ is the true label of the i-th data item, and n is the batch size value;
The model parameters are updated through the back-propagation algorithm, and through a finite number of iterations the mean square error of the training model on the training data set is minimized/converges (when training the model, the model is said to have converged and the error to be minimized when the training accuracy and error no longer change with the training iteration steps and tend to be stable); the optimal model is then selected through the validation set as the detection model for the test phase (during training, the model is tested with the validation set at certain training intervals, and the model with the highest accuracy or the smallest error on the validation set is finally selected).
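For reference, formula (1) corresponds to a loss of the following form; this is a straightforward sketch, and the tensor layout (one prediction per data item along the first dimension) is an assumption.

```python
import torch

def mse_loss(pred, target):
    """Formula (1): mean square error over a batch of n data items.

    pred, target: tensors of shape (n, ...) with one prediction/label per item.
    """
    n = pred.shape[0]                       # batch size value n
    return ((pred - target) ** 2).sum() / n
```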
可选的,对待训练的单人姿态估计模型的网络参数进行训练中使用的方法包括实施例1中的数据处理方法。Optionally, the method used in training the network parameters of the single-person pose estimation model to be trained includes the data processing method in Embodiment 1.
可选的,本申请实施例提供的数据训练方法还包括:收集训练待训练的目标检测模型和待训练的单人姿态估计模型所需的样本;对样本进行预处理,其中,预处理包括:数据集的划分和预处理操作;对待训练的权重分类模型进行训练,得到收敛的权重分类模型包括:将第一预设数据集中的数据输入待训练的权重分类模型,得到类别预测结果;依据类别预测结果与第一预测数据集中的数据的标签类别,得到类别预测结果与第一预测数据集中的数据的标签类别的误差;依据误差进行反向传播算法训练待训练的权重分类模型,直至待训练的权重分类模型收敛,得到收敛的权重分类模型。Optionally, the data training method provided in this embodiment of the application further includes: collecting samples required for training the target detection model to be trained and the single-person pose estimation model to be trained; preprocessing the samples, where the preprocessing includes: Data set division and preprocessing operations; training the weight classification model to be trained to obtain a convergent weight classification model includes: inputting the data in the first preset data set into the weight classification model to be trained to obtain the category prediction result; according to the category The prediction result and the label category of the data in the first prediction data set are obtained, and the error between the category prediction result and the label category of the data in the first prediction data set is obtained; according to the error, the backpropagation algorithm is used to train the weight classification model to be trained until it is to be trained The weight classification model converges to obtain a convergent weight classification model.
具体的,本申请实施例中的样本可以源于开源数据集,例如:Microsoft COCO 2017 Keypoint Detection Dataset(微软COCO 2017关键点检测数据集)、Kinetics-600和ImageNet(Large Scale Visual Recognition Challenge);Specifically, the samples in the embodiments of this application may be derived from open source data sets, such as: Microsoft COCO 2017 Keypoint Detection Dataset (Microsoft COCO 2017 Keypoint Detection Dataset), Kinetics-600 and ImageNet (Large Scale Visual Recognition Challenge);
其中,本申请实施例中的预处理包括的数据集的划分和预处理操作,其中,数据集的划分是数据输入到模型前,对数据进行处理的步骤,其中,对上述三个数据集依据预设方式进行数据划分,以便筛选得到最优的数据模型。Among them, the preprocessing in the embodiment of the present application includes the division of data sets and preprocessing operations, where the division of data sets is a step of processing data before inputting the data into the model, wherein the above three data sets are based on The data is divided in a preset way, so that the optimal data model can be obtained by screening.
预处理操作包括混合操作和随机几何变换,在输入为图片的情况下,通过对不同图片的合成获得新的训练数据,依据该训练数据对图片作几何变换,以使得由于在多人运动的场景中,出现人物遮挡是常见的,通过预处理操作丰富了训练数据的多样性, 使模型更加鲁棒,能够有效降低对抗图像的影响。The preprocessing operations include mixing operations and random geometric transformations. When the input is a picture, new training data is obtained by synthesizing different pictures, and the picture is geometrically transformed according to the training data, so that the Among them, it is common for people to be occluded. The preprocessing operation enriches the diversity of training data, makes the model more robust, and can effectively reduce the impact of confronting images.
Further, optionally, the first preset dataset includes a first-type image dataset, and the first-type image dataset defines its own training set and validation set; the second preset dataset includes the data, from the second-type image dataset and the third-type image dataset, that is annotated with position-box information; the second-type image dataset defines its own training set and validation set, while the third-type image dataset is randomly divided into a training set and a validation set according to a preset ratio; the training sets of the second-type and third-type image datasets form the training set of the second preset dataset, and the validation sets of the second-type and third-type image datasets form the validation set of the second preset dataset; the third preset dataset includes the data, from the second-type and third-type image datasets, that is annotated with keypoint information. The preprocessing operations include: processing the data in the first preset dataset and the third preset dataset separately through random geometric transformations, and processing the data in the second preset dataset through a random mixing operation and/or random geometric transformations.
Specifically, the first preset dataset includes the first-type image dataset; in the embodiments of the present application the first-type image dataset may be illustrated by the ImageNet dataset. The second-type image dataset included in the second preset dataset may be illustrated by the data annotated with position-box information in the Microsoft COCO 2017 Keypoint Detection Dataset (hereinafter referred to as the COCO dataset), and the third-type image dataset included in the second preset dataset may be illustrated by the data annotated with position-box information in Kinetics-14. The data annotated with keypoint information in the second-type and third-type image datasets, which constitute the third preset dataset, may be illustrated by the data annotated with keypoint information in the COCO dataset and the data annotated with keypoint information in Kinetics-14.
The COCO dataset contains more than 200,000 images and a total of 250,000 instances annotated with two-dimensional keypoint information (in this dataset the persons in the pictures are mostly of medium and large scale); the publicly downloadable training and validation sets contain annotations for more than 150,000 persons and 1.7 million annotated keypoints in total. The annotation information is mainly recorded in corresponding .json files, which record the detailed information of each picture, including: the download URL of the picture, the picture name, the picture resolution, the acquisition time of the picture, the picture index (ID), the number of visible skeleton keypoints of each person in the picture (the complete COCO annotation has 17 skeleton keypoints; FIG. 8 is a schematic diagram of keypoint positions and skeleton connections in the data training method according to an embodiment of the present invention; as shown in FIG. 8, with subscripts starting from 0, they are: 0: nose, 1: left eye, 2: right eye, 3: left ear, 4: right ear, 5: left shoulder, 6: right shoulder, 7: left elbow, 8: right elbow, 9: left wrist, 10: right wrist, 11: left hip, 12: right hip, 13: left knee, 14: right knee, 15: left ankle, 16: right ankle, 17: midpoint of the line connecting the left and right shoulders; because some persons in a picture stand sideways or have body parts occluded, this field records only the number of visible skeleton keypoints), the coordinates of the skeleton keypoints (arranged in order; if a skeleton position has no visible keypoint, the corresponding position (x, y) is set to (0, 0)), the rectangular position-box coordinates of each person (upper-left corner and lower-right corner coordinates), the category name (the COCO dataset has about 80 categories, but only persons carry skeleton keypoint annotation information), image segmentation information, and so on.
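A minimal sketch of reading such annotations (assuming the standard COCO keypoint .json layout, in which each annotation stores the keypoints as a flat x, y, visibility list and the box as x, y, width, height; the file name is illustrative):

import json

with open("person_keypoints_val2017.json") as f:
    coco = json.load(f)

for ann in coco["annotations"][:5]:
    kpts = ann["keypoints"]                 # flat list: x1, y1, v1, x2, y2, v2, ...
    visible = sum(1 for v in kpts[2::3] if v > 0)   # number of visible skeleton keypoints
    x, y, w, h = ann["bbox"]                # convert to upper-left / lower-right corners
    print(ann["image_id"], visible, (x, y, x + w, y + h))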
It should be noted that the left part of FIG. 8 is a schematic diagram of the keypoint positions and skeleton connections of the COCO dataset, and the right part of FIG. 8 is a schematic diagram of the keypoint positions and skeleton connections obtained based on the COCO dataset in the data training method provided by the embodiments of the present application.
FIG. 9a and FIG. 9b are schematic diagrams of the effect before and after annotation of the keypoint positions and skeleton connections in the data training method according to an embodiment of the present invention. As shown in FIG. 9a and FIG. 9b, the annotation process is as follows: using an annotation tool, the 17 specific visible points are manually annotated on each picture; the left side is the original picture and the right side is the visualized effect after annotation.
Since existing human detection models and pose estimation models are mainly trained on images of natural scenes, their target detection and pose estimation results in sports scenes are poor. This is because the body postures of persons in sports scenes differ considerably from those in natural scenes, while in most open-source datasets there is relatively little person position annotation and pose estimation annotation data for various sports scenes, so existing target detection models and pose estimation models perform poorly on person detection and pose estimation in sports scenes.
To address this problem, the embodiments of the present application additionally collected 14 sports categories from the Kinetics-600 open-source dataset, including bench press, clean and jerk, rope climbing, deadlift, lunge, boxing, running, sit-ups, rope skipping, squats and leg stretching, for a total of more than 10,000 pictures of sports scenes, and annotated them with the open-source software Visipedia Annotation Toolkit (an image keypoint annotation tool) in the same annotation format as the COCO dataset; in the embodiments of the present application this collection is called Kinetics-14. Based on the target position-box label information and the target keypoint label information in Kinetics-14 (i.e., the third-type image dataset in the embodiments of the present application) and the COCO dataset (i.e., the second-type image dataset in the embodiments of the present application), the target detection model and the single-person pose estimation model are trained respectively, so as to improve the final overall architecture's localization of person positions and recognition of skeleton keypoints in similar scenes. The dataset composed of the target position-box label information in Kinetics-14 and the COCO dataset is the second preset dataset in the embodiments of the present application, and the dataset composed of the target keypoint label information in Kinetics-14 and the COCO dataset is the third preset dataset in the embodiments of the present application.
WeightNet is pre-trained with millions of ImageNet classification images, and the finally converged weights are used as the initialization weights of the feature extraction module in the person detection model, so as to improve the accuracy of the final person detection model and accelerate the convergence of model training.
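A minimal sketch of this initialization step (a PyTorch-style illustration; the class PersonDetector, the layer shapes and the checkpoint file name are assumptions, not the patented architecture):

import torch
import torch.nn as nn

class PersonDetector(nn.Module):           # skeletal stand-in for the person detection model
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(     # feature extraction module to be initialized
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(128, 4, 1)   # illustrative box-regression head

detector = PersonDetector()
state = torch.load("weightnet_imagenet_converged.pth")        # converged WeightNet weights (illustrative file name)
result = detector.backbone.load_state_dict(state, strict=False)
print("missing:", result.missing_keys, "unexpected:", result.unexpected_keys)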
Here, the first preset dataset requires random geometric transformation of the data when it is input into the above data model for training; the second preset dataset requires the data mixing operation and random geometric transformation when it is input into the above data model for training; and the third preset dataset requires random geometric transformation of the data when it is input into the above data model for training.
Optionally, the random geometric transformations include random cropping, random rotation by a preset angle and/or random scaling by a preset scaling ratio; the random mixing operation includes superimposing at least two pieces of data according to preset weights, specifically adding the products of the pixel values at preset positions in different pieces of data and the preset weights.
Specifically, the random mixing operation in the embodiments of the present application is denoted as the mix-up operation. FIG. 10 is a schematic diagram of the effect of mix-up in the data recognition method according to an embodiment of the present invention; as shown in FIG. 10, the mix-up operation flow is specifically as follows:
Two input images are merged into one new image according to certain weights, and the merged image is used as new input training data. Since the target detection model is very sensitive to geometric transformations of images, when the resolutions of the two images in the mix-up operation are inconsistent, geometry-preserving alignment is adopted to avoid image distortion; that is, the images are neither cropped nor scaled, and the pixel values at corresponding positions are directly multiplied by certain weights and then added, as expressed by formula (2). Because occlusion between persons is common in multi-person motion scenes, the mix-up operation, as a data augmentation method, enriches the diversity of the training data, makes the model more robust, and effectively reduces the influence of adversarial images. It should be noted that, after this scheme is applied, the image obtained by the mix-up operation further needs to be normalized so that the pixel values of each channel of the final image remain within the range of 0 to 255.
x̂ = α·x_i + β·x_j    (2)
where x_i and x_j denote two different images, x̂ denotes the image synthesized by the mix-up operation, and α and β denote the mix-up weights. The value ranges of α and β are not limited in the embodiments of the present application (for example, in a classification task preferably 0 < α + β < 1 and in a target detection task preferably α and β > 1; as another example, in a classification task 0.2 < α:β < 0.4 and in a target detection task 0.8 < α:β < 1.2); preferably, α = β = 1.5 is set in the embodiments of the present application.
It should be noted that the aforementioned mix-up operation flow may also be replaced by the following method:
Two input images are merged into one new image according to certain weights, and the merged image is used as new input training data. Since the target detection model is very sensitive to geometric transformations of images, when the resolutions of the two images in the mix-up operation are inconsistent, geometry-preserving alignment is adopted to avoid image distortion; that is, the images are neither cropped nor scaled, and the pixel values at corresponding positions are directly multiplied by certain weights and then added, as expressed by formula (3). Because occlusion between persons is common in multi-person motion scenes, the mix-up operation, as a data augmentation method, enriches the diversity of the training data, makes the model more robust, and effectively reduces the influence of adversarial images.
x̂ = λ·x_i + (1 − λ)·x_j    (3)
where x_i and x_j denote two different images, x̂ denotes the image synthesized by the mix-up operation, and λ denotes the mix-up weight; for each x̂, λ is obtained by random sampling from a Beta distribution, as expressed by formula (4). The value range of λ is not limited in the embodiments of the present application (in a classification task, 0 < λ < 1; in a target detection task, λ > 1); preferably, λ = 1.5 is set in the embodiments of the present application.
λ ~ Beta(α, α)    (4)
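The following is a minimal sketch of the mix-up operation described above (a sketch under stated assumptions, not the patented implementation; the Beta parameter value and the zero-padding used for geometry-preserving alignment are illustrative):

import numpy as np

def mix_up(img_i: np.ndarray, img_j: np.ndarray, alpha: float = 1.5) -> np.ndarray:
    lam = np.random.beta(alpha, alpha)                      # formula (4): lambda ~ Beta(alpha, alpha)
    h = max(img_i.shape[0], img_j.shape[0])                 # common canvas: no cropping or scaling
    w = max(img_i.shape[1], img_j.shape[1])
    canvas_i = np.zeros((h, w, 3), dtype=np.float32)
    canvas_j = np.zeros((h, w, 3), dtype=np.float32)
    canvas_i[:img_i.shape[0], :img_i.shape[1]] = img_i      # keep original geometry of each image
    canvas_j[:img_j.shape[0], :img_j.shape[1]] = img_j
    mixed = lam * canvas_i + (1.0 - lam) * canvas_j         # weighted pixel-wise sum, formula (3)
    return np.clip(mixed, 0, 255).astype(np.uint8)          # normalize back into the 0-255 range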
In addition, in the embodiments of the present application the random geometric transformations include random cropping (to 256×256; multiple crop sizes are possible, and considering the training hardware conditions the size is generally set to a power of 2, with the shortest side not smaller than 128 and the longest side not larger than 512), random rotation within the range of (-45°, 45°) (i.e., the preset rotation angle in the embodiments of the present application), random horizontal flipping with a probability of 50%, and random scaling within the range of (0.65, 1.35). Random cropping means randomly cropping the original picture to 256×256 (the crop size adopted in the embodiments of the present application) while keeping the number of channels unchanged; the random rotation operation means randomly rotating the image angle within plus or minus 45 degrees to change the orientation of the image content; the random flipping operation means horizontally flipping the image with a probability of 50%; the random scaling operation means enlarging or shrinking the image within a ratio of 0.65 to 1.35. When training the classification network and the pose estimation network, the random geometric transformations not only increase the amount of data but also serve as a method of weakening data noise and increasing model stability.
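A minimal sketch of such a random geometric transformation pipeline follows (the parameter defaults mirror the ranges above; the use of OpenCV, the padding strategy and the operation order are otherwise assumptions):

import cv2
import numpy as np

def random_geometric_transform(img: np.ndarray, crop: int = 256) -> np.ndarray:
    s = np.random.uniform(0.65, 1.35)                        # random scaling within (0.65, 1.35)
    img = cv2.resize(img, None, fx=s, fy=s)
    h, w = img.shape[:2]
    angle = np.random.uniform(-45, 45)                       # random rotation within (-45, 45) degrees
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, m, (w, h))
    if np.random.rand() < 0.5:                               # horizontal flip with probability 0.5
        img = img[:, ::-1]
    h, w = img.shape[:2]
    if h < crop or w < crop:                                 # pad first if the image is smaller than the crop
        img = cv2.copyMakeBorder(img, 0, max(0, crop - h), 0, max(0, crop - w),
                                 cv2.BORDER_CONSTANT, value=0)
        h, w = img.shape[:2]
    y = np.random.randint(0, h - crop + 1)                   # random 256 x 256 crop
    x = np.random.randint(0, w - crop + 1)
    return img[y:y + crop, x:x + crop]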
It should be noted that, in the embodiments of the present application, the random geometric transformations may include one of, or a combination of at least two of, random cropping, random rotation by a preset angle and/or random scaling by a preset scaling ratio, and the execution order is adjusted according to the actual needs of the pictures. For example, if the size of a picture already meets the data training requirement, random cropping or scaling is not needed; or, if the display angle of a picture already meets the data training requirement, random rotation is not needed. Similarly, the random geometric transformations are applied to a picture according to the actual requirements on that picture.
The preprocessing operations in the embodiments of the present application are performed during model training (in each round): part of the data originally used for training is preprocessed in the above-described manner, and the preprocessed data is then used for training together; the data selected in different rounds, and the data actually used for training after preprocessing, are different, so as to achieve the effect of gradual convergence.
Embodiment 4
According to one aspect of the embodiments of the present invention, a data recognition method is provided based on the method in the above Embodiment 3. FIG. 11 is a schematic flowchart of the data recognition method according to an embodiment of the present invention; as shown in FIG. 11, the method includes:
Step S1102: inputting feature data to be recognized into a weighted attention neural network model, and recognizing two-dimensional coordinates of keypoints of at least one target in the feature data to be recognized, where the weighted attention neural network model is used to perform pose estimation of at least one person in a top-down manner, detect a position rectangle of the at least one target in the feature data to be recognized, and detect the two-dimensional coordinates of the keypoints of the target within the position rectangle;
Step S1104: performing calculation using the two-dimensional coordinates of the keypoints of the target to obtain the included angle between the line of a first preset keypoint combination and the line of a second preset keypoint combination, or the included angle between the line of the first preset keypoint combination and a first preset line;
The first preset line may be a horizontal line, a vertical line or the like; the first preset keypoint combination contains two keypoints, and the second preset keypoint combination contains two keypoints.
Specifically, in the embodiments of the present application, the included angle between the line of the first preset keypoint combination and the line of the second preset keypoint combination, or the included angle between the line of the first preset keypoint combination and the first preset line, is specifically one of the following cases:
Case 1: the included angle between two specific lines among three specific keypoints;
Assuming there are three keypoints in the plane that are not on the same straight line, pairwise combinations are formed, namely the line segment connecting keypoint 1 and keypoint 2 and the line segment connecting keypoint 1 and keypoint 3, and the included angle is formed where they meet at keypoint 1.
Case 2: the included angle between the line connecting two specific keypoints and an environment line (for example a horizontal line or a vertical line, i.e., the first preset line in the embodiments of the present application);
Assuming the two obtained keypoints are the two keypoints located at the shoulders of the human target, line segments need to be drawn in order to form skeleton connections with the other keypoints of the human body; therefore, while ensuring that there are no redundant connections, an included angle is formed by connecting with the horizontal line or the vertical line.
Case 3: the included angle between the line connecting two specific keypoints and the line connecting two other keypoints;
Similar to Case 1, based on the obtained two-dimensional coordinates of the keypoints, two lines each connecting two keypoints are obtained respectively, and the included angle between the two lines is obtained.
Step S1106: matching, in a first preset database, the included angle between the line of the first preset keypoint combination and the line of the second preset keypoint combination, or the included angle between the line of the first preset keypoint combination and the first preset line, to obtain a recognition result of the target.
Specifically, with reference to steps S1102 to S1106, FIG. 12 is a schematic flowchart of the deep-learning-based posture risk assessment in the data recognition method according to an embodiment of the present invention. As shown in FIG. 12, in the embodiments of the present application the feature data may include pictures and/or videos; that is, the input form of the feature data may include: form 1: a picture; form 2: a video; form 3: a picture and a video.
The data recognition method provided by the embodiments of the present application further includes data sample collection and neural network learning before the feature data is input into the end-to-end model. As shown in FIG. 12, the posture risk assessment method provided by the embodiments of the present application is specifically as follows:
Step 1: data collection, performing sample collection according to the acquired datasets;
Step 2: based on the sample collection of Step 1, preprocessing the data in the datasets to obtain a training set and a test set respectively;
Step 3: inputting the feature data into the end-to-end model to obtain the two-dimensional coordinates of the keypoints of the target;
Step 4: according to the data type of the feature data, calculating angles from the two-dimensional coordinates of the keypoints of the target and generating a posture risk assessment result.
In the data recognition method provided by the embodiments of the present application, the image to be assessed is input into the end-to-end model, and the output is the two-dimensional coordinates of the human skeleton keypoints recognized by the model (i.e., the two-dimensional coordinates of the keypoints of the target in the embodiments of the present application). From these skeleton keypoint coordinates, the angle values of a specific number of joints are calculated, and each included angle is matched, through its angle value, with the included angles in the first preset database to obtain the position corresponding to each included angle, so as to generate keypoint combination connections like those in the right part of FIG. 8 and achieve the purpose of recognizing the target in the image. Further, a posture assessment result is obtained by matching the recognition result in a second preset database. In addition, the input may also be a sports video: the continuous variation curve information of each joint angle of each athlete in the video stream (frames) is obtained as described above and compared with a standard motion library, and targeted motion improvement guidance is then given.
In addition, when the input includes both pictures and videos, as shown in FIG. 12, the pictures and videos are processed separately in the multi-person pose estimation module. When a picture is the input, the two-dimensional coordinates of the human skeleton keypoints in the picture are obtained. When a video is the input, the continuous variation curve information of each joint angle of each athlete in every frame of the video is obtained, or frame images are extracted from the video at preset time intervals and the continuous variation curve information of each joint angle of each athlete is obtained from the extracted frame images; extracting frame images at preset time intervals reduces the image recognition load on the computer, reduces the amount of computation, and improves recognition efficiency. The posture risk assessment results for each person in the picture and each person in the video are then obtained from the two-dimensional coordinates of the human skeleton keypoints and from the continuous variation curve information, respectively.
In the embodiments of the present invention, a top-down multi-person pose estimation approach is adopted. The feature data to be recognized is input into the weighted attention neural network model to recognize the two-dimensional coordinates of the keypoints of at least one target in the feature data to be recognized, where the weighted attention neural network model is used to perform pose estimation of at least one person in a top-down manner, detect the position rectangle of at least one target in the feature data to be recognized, and detect the two-dimensional coordinates of the keypoints of the target within the position rectangle; calculation is performed using the two-dimensional coordinates of the keypoints of the target to obtain the included angle between the line of the first preset keypoint combination and the line of the second preset keypoint combination, or the included angle between the line of the first preset keypoint combination and the first preset line; and that included angle is matched in the first preset database to obtain the recognition result of the target. This achieves the purpose of improving the accuracy and efficiency of human posture recognition, thereby achieving the technical effect of providing assessment results based on the more accurately and efficiently recognized human posture, and solving the technical problem of low data processing efficiency in the human posture recognition process of the related art.
Optionally, in step S1106, matching, in the first preset database, the included angle between the line of the first preset keypoint combination and the line of the second preset keypoint combination, or the included angle between the line of the first preset keypoint combination and the first preset line, to obtain the recognition result of the target includes:
in the case where the feature data to be recognized includes picture data, matching the obtained angle value of at least one included angle with the angle value of the corresponding included-angle type in the first preset database to obtain a recognition result of the picture data.
Optionally, the included angles include: the angle between the line connecting the two eyes and a horizontal straight line, the angle between the shoulder line and a horizontal straight line, the angle between the hip line and a horizontal straight line, the angle between the head midline and a vertical straight line, the angle between the trunk midline and a vertical straight line, the joint angle between the upper arm and the forearm, the joint angle between the thigh and the calf, the angle between the ear-shoulder line and a vertical straight line, the joint angle between the trunk midline and the thigh midline, the joint angle between the upper arm and the forearm, and the joint angle between the thigh and the calf.
Specifically, FIG. 13a and FIG. 13b are schematic diagrams of a front view and a side view in the data recognition method according to an embodiment of the present invention. As shown in FIG. 13a and FIG. 13b, the 13 specific joint angles calculated by the angle calculation module are: the angle between the line connecting the two eyes and a horizontal straight line (front view / 1), the angle between the shoulder line and a horizontal straight line (front view / 2), the angle between the hip line and a horizontal straight line (front view / 3), the angle between the head midline and a vertical straight line (front view / 4), the angle between the trunk midline and a vertical straight line (front view / 5), the joint angles between the upper arm and the forearm (front view / left 6, right 7), the joint angles between the thigh and the calf (front view / left 8, right 9), the angle between the ear-shoulder line and a vertical straight line (side view / 10), the joint angle between the trunk midline and the thigh midline (side view / 11), the joint angle between the upper arm and the forearm (side view / 12) and the joint angle between the thigh and the calf (side view / 13). The specific calculation flow is: let A, B and C be three points on the two-dimensional plane (i.e., any three points on the two-dimensional plane where the feature data in the embodiments of the present application is located); to obtain the included angle between line AB and line AC, the slopes of lines AB and AC are first obtained and then converted into the corresponding angles, and the difference between the angles of the two lines is the required included angle; taking the direction of the angle into account, the clockwise angle is defined as positive.
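A minimal sketch of this angle calculation (assuming image coordinates with the y-axis pointing downward; the function name and the example coordinates are illustrative):

import math

def included_angle(a, b, c):
    """a, b, c are (x, y) points; returns the signed angle from line AB to line AC in degrees."""
    ang_ab = math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))   # direction angle of line AB
    ang_ac = math.degrees(math.atan2(c[1] - a[1], c[0] - a[0]))   # direction angle of line AC
    diff = ang_ab - ang_ac                                        # difference between the two line angles
    while diff <= -180:                                           # wrap into (-180, 180]
        diff += 360
    while diff > 180:
        diff -= 360
    return diff                                                   # positive = clockwise in image coordinates

# e.g. angle at the elbow: A = elbow, B = shoulder, C = wrist keypoint coordinates
print(included_angle((0, 0), (0, -100), (80, 0)))                 # -> -90.0 under these assumptions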
Optionally, in step S1106, matching, in the first preset database, the included angle between the line of the first preset keypoint combination and the line of the second preset keypoint combination, or the included angle between the line of the first preset keypoint combination and the first preset line, to obtain the recognition result of the target includes:
Step S11061: in the case where the feature data to be recognized includes video data, acquiring, for each frame or for specified frames, two-dimensional coordinate information of the keypoints of at least one target in each corresponding frame of the video data, where the specified frames are fixed-time-interval frames and/or key frames;
Each frame or specified frame is implemented as follows:
acquiring the two-dimensional coordinate information of the keypoints of at least one target in every frame of the video data; taking a 10-second real-time video as an example, within the 10 seconds the two-dimensional coordinate information of the keypoints of at least one target in the consecutive frames (every frame) of the video data is acquired;
acquiring the two-dimensional coordinate information of the keypoints of at least one target in specified frames of the video data; since consecutive frames often contain repeated pictures, in order to improve data processing efficiency, the two-dimensional coordinate information of the keypoints of at least one target is collected from frames at preset time intervals (fixed time intervals) or from key frames, which relieves the pressure of performing data processing on every frame.
The key frames can be obtained through the relevant function flags of the software; for example, a frame in which a person or an animal is detected is regarded as a key frame, and/or a frame in which a motion change of a preset amplitude occurs is determined as a key frame. Acquiring the two-dimensional coordinate information of the keypoints of at least one target in specified frames of the video data can be applied to uploaded video data whose shooting has been completed.
In addition, in a distributed system, acquiring the two-dimensional coordinate information of the keypoints of at least one target in specified frames of the video data can be implemented simultaneously on multiple computing devices with data processing capabilities, acquiring the keypoint two-dimensional coordinate information of at least one target from fixed-time-interval frames and key frames.
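A minimal sketch of fixed-time-interval frame extraction (OpenCV is assumed; the interval, the video file name and the estimate_keypoints call are illustrative placeholders for the multi-person pose estimation model):

import cv2

def sample_frames(video_path: str, interval_sec: float = 0.5):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0        # fall back if the FPS metadata is missing
    step = max(1, int(round(fps * interval_sec)))  # number of frames between sampled frames
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                        # keep only fixed-time-interval frames
            frames.append(frame)
        idx += 1
    cap.release()
    return frames

# keypoints_per_frame = [estimate_keypoints(f) for f in sample_frames("workout.mp4")]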
Step S11062: obtaining, from the two-dimensional coordinate information of the keypoints of the at least one target in each corresponding frame of the video data, an angle-time variation curve of at least one specific included angle of the at least one target, and performing comparative analysis with the angle-time variation curve of at least one included angle of at least one standard motion to obtain the recognition result.
Further, optionally, obtaining the angle-time variation curve of at least one specific included angle of the at least one target from the two-dimensional coordinate information of the keypoints of the at least one target in each corresponding frame of the video data, and performing comparative analysis with the angle-time variation curve of at least one included angle of at least one standard motion to obtain the recognition result includes: comparing the similarity between the angle-time variation curve of at least one specific included angle of the at least one target and the angle-time variation curve of at least one included angle of at least one standard motion obtained in advance, and if the similarity falls within a first preset threshold interval, determining that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type; in the case where it is determined that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type, further comparing the angle-time variation curve of at least one specific included angle of the target with the angle-time variation curve of the corresponding specific included angle of the standard motion; if the difference between adjacent extreme values on the angle-time variation curve of at least one specific included angle of the target and the difference between adjacent extreme values on the angle-time variation curve of the corresponding specific included angle of the standard motion differ by an amount falling within a second preset threshold interval, determining that the joint action corresponding to the specific included angle of the target in the video data is standard, and otherwise that the joint action corresponding to the specific included angle of the target in each corresponding frame of the video data is not standard; and determining whether the difference between the distance between adjacent peaks on the angle-time variation curve of at least one specific included angle of the target and the distance between adjacent peaks on the angle-time variation curve of the corresponding specific included angle of the standard motion falls within a third preset threshold interval, a fourth preset threshold interval or a fifth preset threshold interval, thereby confirming that the motion intensity of the joint action corresponding to the specific included angle of the target in each corresponding frame of the video data is too low, appropriate or too high.
Specifically, for example, the variation curve of the arm of a person exercising in a video is acquired. Since the coordinates of the arm keypoints in the image change when the person lifts a barbell, the connection of each angle value as it changes over time gives an angle-time variation curve, and this curve is compared and analyzed with the angle-time variation curve of at least one included angle of at least one standard motion of the corresponding standard motion type to obtain the recognition result.
Taking the acquisition of a video of a person exercising as an example, suppose the person in the video is lifting a barbell; the angle-time variation curves of the included angles formed by the keypoints of the person's joints and the related connections are acquired as they change over time. The angle-time variation curve may be the curve obtained from the variation of each included angle in every frame of the video, or the curve obtained from the variation of each included angle in frame images extracted at preset time intervals;
the angle-time variation curves of at least one included angle of the various standard motions in the database are acquired;
by comparing the similarity between the angle-time variation curves, if the obtained similarity falls within the first preset threshold interval, it is determined that the motion performed by the person in the video data is barbell lifting;
further, the person's angle-time variation curve is compared with the standard angle-time variation curve of barbell lifting: the differences between adjacent extreme values are obtained on the two angle-time curves respectively, and whether the joint action corresponding to each specific angle of the person is standard is judged from those differences; further, the differences between adjacent peaks are obtained on the two angle-time curves respectively, and it is judged whether they fall within the third, fourth or fifth preset threshold interval, so as to determine whether the person's exercise intensity is too low, appropriate or too high.
In the embodiments of the present application, the first preset threshold interval is used to determine the motion type of the target in the video; the second preset threshold interval is used to determine whether the motion posture of the target in the video is standard; and the third, fourth or fifth preset threshold interval is used to determine the exercise intensity of the target in the video;
It should be noted that the third, fourth or fifth preset threshold interval may also be implemented by setting a single threshold interval, with the corresponding exercise intensity set through sub-intervals of that threshold interval.
It should be added that, in an alternative to this embodiment, action type recognition may be omitted and the motion type of the feature to be recognized (for example an input video or image) may be obtained directly (for example, the corresponding motion type is stated when the video or image is input); then the at least one angle-time variation curve obtained by recognizing the feature to be recognized is directly compared with the corresponding angle-time variation curve of the standard action of the input motion type, and the comparison method may be as described above.
In FIG. 10, the main input of the motion guidance module (dynamic assessment) is a motion video of one or more persons. The two-dimensional coordinate information of the keypoints of each human body in the motion video stream (frames) is obtained through the multi-person pose estimation model, and the two-dimensional coordinates of the video stream (frames) are passed through the angle calculation module to obtain the continuous variation curve values of each specific joint angle of each person in the video stream (frames) (each frame of the video (stream) can be regarded as one time point, and the connection of each angle value at each time point is the angle variation curve of (angle value y / frame x)). Comparison and analysis is then performed with the corresponding standard motion curve, where the standard motion curve is obtained by recognizing the keypoints and the variation values of each joint angle with the model of the present application, and motion correction guidance is given.
The specific implementation is as follows: for each person, each specific angle is recorded as a continuous angle variation curve as the video stream (frames) is input. In the first preset database, the angle-time variation curve of each specific joint angle of each type of standard action (including different stances and orientations of the same action) has already been calculated and stored. After the angle-time variation curve of each specific joint angle of each person in the video stream (frames) is obtained as described above, it is matched and compared with the angle-time variation curve of the corresponding standard action. The difference between adjacent extreme values of the angle-time variation curve (the lowest value and the highest value) can be used to judge whether the motion amplitude of the specific joint to be tested is standard: if the difference between the distance between adjacent maximum and minimum values of the angle-time variation curve of the joint of the person to be tested and the corresponding distance value at the relative position in the standard motion video is greater than the specified threshold (i.e., the second preset threshold interval in the embodiments of the present application), it can be concluded that the motion of this part is not standard. On the other hand, the distance between every two peaks of the angle variation curve (the distance between two adjacent maximum or minimum values) can be used to measure the intensity of the motion at the specific angle: if the difference between the distance between adjacent maximum values of the angle-time variation curve of the specified joint of the person to be tested and the corresponding distance value at the relative position in the standard motion video is greater than a specified threshold and falls within the interval where that threshold lies (i.e., the third preset threshold interval in the embodiments of the present application), it can be concluded that the motion intensity of this joint is too high; if it falls within the interval where another specified threshold lies, it can be concluded that the motion intensity is moderate (i.e., the fourth preset threshold interval in the embodiments of the present application); and if it is less than a specified threshold and the difference falls within the interval where that threshold lies, it can be concluded that the motion intensity is too low (i.e., the fifth preset threshold interval in the embodiments of the present application). A final assessment is obtained by combining the standardness values and intensity values of all joints.
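A minimal sketch of this comparison under simplifying assumptions (the tolerance values, the peak detection and the mapping from peak-to-peak distance to intensity are illustrative, not the patent's preset threshold intervals):

import numpy as np
from scipy.signal import find_peaks

def assess_joint(curve: np.ndarray, standard: np.ndarray,
                 amp_tol: float = 10.0, low: float = -5.0, high: float = 5.0):
    def amplitude(c):                         # difference between adjacent extreme values (max - min)
        return float(c.max() - c.min())
    def peak_gap(c):                          # mean distance between adjacent peaks, in frames
        peaks, _ = find_peaks(c)
        return float(np.mean(np.diff(peaks))) if len(peaks) > 1 else float("inf")

    standard_form = abs(amplitude(curve) - amplitude(standard)) <= amp_tol   # amplitude check: is the action standard?
    gap_diff = peak_gap(curve) - peak_gap(standard)                          # peak spacing check: intensity
    if gap_diff > high:                       # repetitions slower than the standard motion
        intensity = "too low"
    elif gap_diff < low:                      # repetitions faster than the standard motion
        intensity = "too high"
    else:
        intensity = "appropriate"
    return standard_form, intensity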
Optionally, the data recognition method provided by the embodiments of the present application further includes:
Step S1109: matching the recognition result in the second preset database to obtain a posture assessment result corresponding to the recognition result.
Specifically, the angle-value-to-posture knowledge base is the second preset database in the embodiments of the present application. In a specific embodiment, the posture assessment risk of each part is divided into three levels: low risk, potential risk and high risk. The specific matching process is as follows (a minimal code sketch of this matching is given after the list):
(1) head tilt risk assessment (0-4 degrees: low risk; 4-9 degrees: potential risk; above 9 degrees: high risk), mainly matched with angle 1;
(2) uneven shoulders risk assessment (0-2 degrees: low risk; 2-4 degrees: potential risk; above 4 degrees: high risk), mainly matched with angle 2;
(3) spinal misalignment risk assessment (0-2 degrees: low risk; 2-4 degrees: potential risk; above 4 degrees: high risk), mainly matched with angle 5;
(4) pelvic tilt risk assessment (0-2 degrees: low risk; 2-4 degrees: potential risk; above 4 degrees: high risk), mainly matched with angle 6;
(5) abnormal leg shape risk assessment (176-180 degrees: low risk; 173-176 degrees: potential risk; below 173 degrees: high risk), mainly matched with angles 8 and 9;
(6) forward head posture and rounded shoulders risk assessment (0-9 degrees: low risk; 9-14 degrees: potential risk; above 14 degrees: high risk), mainly matched with angle 10;
(7) knee hyperextension risk assessment (179-180 degrees: low risk; 177-179 degrees: potential risk; below 177 degrees: high risk), mainly matched with angle 13.
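A minimal sketch of this matching step (the table mirrors the deviation-type items (1), (2), (3), (4) and (6) above; items (5) and (7) would need the reversed, higher-is-better ranges; the function name and data structure are illustrative):

RISK_TABLE = {
    "head tilt":                      {"angle": 1,  "low": (0, 4), "potential": (4, 9)},
    "uneven shoulders":               {"angle": 2,  "low": (0, 2), "potential": (2, 4)},
    "spinal misalignment":            {"angle": 5,  "low": (0, 2), "potential": (2, 4)},
    "pelvic tilt":                    {"angle": 6,  "low": (0, 2), "potential": (2, 4)},
    "forward head / round shoulders": {"angle": 10, "low": (0, 9), "potential": (9, 14)},
}

def assess(posture: str, value: float) -> str:
    entry = RISK_TABLE[posture]
    lo, hi = entry["low"]
    if lo <= abs(value) < hi:
        return "low risk"
    lo, hi = entry["potential"]
    if lo <= abs(value) < hi:
        return "potential risk"
    return "high risk"                       # anything beyond the potential-risk range

print(assess("head tilt", 5.2))              # -> potential risk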
Based on the above matching process, FIG. 14 is a schematic diagram of the posture risk assessment results in the data recognition method according to an embodiment of the present invention. As shown in FIG. 14, the embodiments of the present application summarize 7 common unhealthy postures, namely head tilt, uneven shoulders, spinal misalignment, pelvic tilt, abnormal leg shape, forward head posture with rounded shoulders, and knee hyperextension.
Further, optionally, after the posture assessment result corresponding to the recognition result is obtained, the data recognition method provided by the embodiments of the present application further includes:
Step S1110: matching the posture assessment result in a third preset database to obtain suggestion information corresponding to the posture assessment result.
Specifically, in the embodiments of the present application the first preset database, the second preset database and the third preset database may be three independent databases, databases located on different servers, databases in three storage spaces on one server, or data modules in one database for storing different types of mapping relationships. Based on the assessment results obtained in step S1109, corresponding suggestion information is provided for each assessment result. The suggestion information includes, but is not limited to, possible hidden disease risks indicated by the corresponding posture, improvement suggestions and the like. For example, in the case where the assessment result includes that the target has a risk of forward head posture and rounded shoulders, the suggestion information corresponding to the assessment result may include: posturally, this will cause cervical vertebra displacement and protrusion; such postural changes will lead to dizziness, nervous headaches and head distension pain; it is suggested to avoid looking down at a mobile phone for long periods or facing a computer, television or books for long periods, and to take part in more physical exercise, especially ball sports;
or, in the case where the assessment result includes pelvic tilt, the suggestion information corresponding to the assessment result may include: posturally, this will cause legs of unequal length and lumbar disc herniation; such postural changes will cause the two legs to differ in length, and the weight borne by the two legs when standing will be unequal; if lumbar disc herniation occurs, it will cause uneven stress on the lumbar vertebrae, with a risk of being bedridden with paralysis; suggestions for unequal leg length: avoid crossing the legs, sitting supported on one leg, and bearing weight on one leg when standing; suggestions for lumbar disc herniation: avoid sitting for long periods, take part in more physical exercise, move the lumbar spine appropriately, and combine this with regular massage and bone-setting manipulation.
In addition, the data recognition method provided by the embodiments of the present application may also be applied to online shopping. Taking buying clothes online as an example, the user uploads a selfie photo or selfie video, recognition is performed through steps S1102 to S1106 to obtain a recognition result, the recognition result is compared with the model wearing product A stored on the server, and shopping suggestions are provided according to the comparison result. For example, the sizes of product A are: S, M, L, XL and XXL. If the recognition through steps S1102 to S1106 shows that the user's body shape is the same as the model's and the size of product A worn by the model is M, the user is advised to buy product A in size M; if the user is slimmer than the model, the user is advised to buy product A in size S; conversely, depending on how much larger the user is than the model, the user is advised to buy product A in size L, XL or XXL.
实施例五Example five
根据本发明实施例的一个方面,提供了一种数据识别装置,图15是根据本发明实施例的数据识别装置的示意图,如图15所示,包括:坐标识别模块1502,设置为将待识别的特征数据输入权重注意力神经网络模型,识别得到待识别的特征数据中至少一个目标的关键点二维坐标,其中,权重注意力神经网络模型设置为通过自顶向下的方式进行至少一人的姿态估计,检测待识别的特征数据中至少一个目标的位置矩形框,并检测位置矩形框内目标的关键点二维坐标;计算模块1504,设置为通过目标的关键点二维坐标进行计算,得到第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角;匹配模块1506,设置为将第一预设关键点组合的连线与第二预设关键点组合的连线之间的夹角或第一预设关键点组合的连线与第一预设线之间的夹角在第一预设数据库中进行匹配,得出目标的识别结果。According to one aspect of the embodiment of the present invention, a data recognition device is provided. FIG. 15 is a schematic diagram of the data recognition device according to an embodiment of the present invention. As shown in FIG. Input the weighted attention neural network model of the feature data to identify the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized. The weighted attention neural network model is set to perform top-down processing of at least one person Posture estimation, detecting the position rectangle of at least one target in the feature data to be recognized, and detecting the two-dimensional coordinates of the key points of the target in the position rectangle; the calculation module 1504 is set to calculate through the two-dimensional coordinates of the key points of the target to obtain The angle between the line of the first preset key point combination and the line of the second preset key point combination or the angle between the line of the first preset key point combination and the first preset line; The module 1506 is configured to set the angle between the line of the first preset key point combination and the line of the second preset key point combination or the angle between the line of the first preset key point combination and the first preset line The angle between the two is matched in the first preset database to obtain the recognition result of the target.
Optionally, the matching module 1506 includes: a first matching unit, configured to, in the case where the feature data to be recognized includes picture data, match the obtained angle value of at least one included angle against the angle values of the corresponding angle types in the first preset database to obtain a recognition result of the picture data.
Optionally, the matching module 1506 includes: an acquisition unit, configured to, in the case where the feature data to be recognized includes video data, acquire, for each frame or for specified frames, two-dimensional key-point coordinate information of at least one target in each corresponding frame of the video data, where the specified frames are fixed-time-interval frames and/or key frames; and a second matching unit, configured to obtain an angle-time curve of at least one specific included angle of at least one target according to the two-dimensional key-point coordinate information of the at least one target in each corresponding frame of the video data, and to perform a comparative analysis against the angle-time curve of at least one included angle of at least one standard motion to obtain the recognition result.
Further, optionally, the second matching unit includes: a first judging sub-unit, configured to compare the similarity between the angle-time curve of at least one specific included angle of the at least one target and a pre-obtained angle-time curve of at least one included angle of at least one standard motion, and, if the similarity falls within a first preset threshold interval, determine that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type; a comparison sub-unit, configured to, when it is determined that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type, further compare the angle-time curve of at least one specific included angle of that target with the angle-time curve of the corresponding specific included angle of the standard motion; a second judging sub-unit, configured to determine that the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is standard if the difference between adjacent extrema on the angle-time curve of the at least one specific included angle of the target and the difference between adjacent extrema on the angle-time curve of the corresponding specific included angle of the standard motion fall within a second preset threshold interval, and otherwise determine that the joint motion corresponding to the specific included angle of the target in the video data is not standard; and a third judging sub-unit, configured to judge whether the difference between the distance between adjacent peaks on the angle-time curve of at least one specific included angle of the target and the distance between adjacent peaks on the angle-time curve of the corresponding specific included angle of the standard motion falls within a third preset threshold interval, a fourth preset threshold interval or a fifth preset threshold interval, thereby confirming that the intensity of the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is too low, appropriate, or too high.
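A rough sketch of how such a comparison could look in code is given below. The use of normalized correlation as the similarity measure, the range-of-motion and peak-spacing proxies, and all threshold values are assumptions made here for illustration; they are not taken from the application:

```python
import numpy as np

def similarity(curve_a, curve_b):
    """Normalized correlation between two equal-length angle-time curves."""
    a = (curve_a - curve_a.mean()) / (curve_a.std() + 1e-8)
    b = (curve_b - curve_b.mean()) / (curve_b.std() + 1e-8)
    return float(np.mean(a * b))

def extrema_span(curve):
    """Difference between the largest and smallest values, a proxy for range of motion."""
    return float(np.max(curve) - np.min(curve))

def peak_spacing(curve):
    """Mean distance (in frames) between adjacent local maxima, a proxy for tempo."""
    peaks = [i for i in range(1, len(curve) - 1)
             if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]
    return float(np.mean(np.diff(peaks))) if len(peaks) >= 2 else float("inf")

def assess(target_curve, standard_curve,
           sim_low=0.8, span_tol=10.0, tempo_fast=0.8, tempo_slow=1.2):
    if similarity(target_curve, standard_curve) < sim_low:
        return "not the corresponding standard motion"
    verdict = []
    if abs(extrema_span(target_curve) - extrema_span(standard_curve)) <= span_tol:
        verdict.append("joint action within the standard range")
    else:
        verdict.append("joint action outside the standard range")
    ratio = peak_spacing(target_curve) / peak_spacing(standard_curve)
    if ratio < tempo_fast:
        verdict.append("intensity too high")
    elif ratio > tempo_slow:
        verdict.append("intensity too low")
    else:
        verdict.append("intensity appropriate")
    return "; ".join(verdict)

t = np.linspace(0, 4 * np.pi, 200)
target = 90 + 40 * np.sin(t)      # angle-time curve of the target
standard = 90 + 45 * np.sin(t)    # angle-time curve of the standard motion
print(assess(target, standard))
```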
Optionally, the data recognition device provided in the embodiments of the present application further includes: an evaluation module, configured to perform matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
Further, optionally, the data recognition device provided in the embodiments of the present application further includes: a suggestion module, configured to, after the posture evaluation result corresponding to the recognition result is obtained, perform matching in a third preset database according to the posture evaluation result to obtain suggestion information corresponding to the posture evaluation result.
Embodiment 6
According to one aspect of the embodiments of the present invention, a non-volatile storage medium is provided. The non-volatile storage medium includes a stored program, where, when the program runs, the device on which the non-volatile storage medium is located is controlled to execute the above method.
Embodiment 7
According to one aspect of the embodiments of the present invention, a data recognition device is provided, including: a non-volatile storage medium and a processor configured to run a program stored in the non-volatile storage medium, where the above method is executed when the program runs.
The sequence numbers of the above embodiments of the present invention are for description only and do not represent the superiority or inferiority of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a division of logical functions; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred implementations of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Industrial Applicability
The solutions provided in the embodiments of the present application can be applied to image recognition, for example to image recognition of human postures. Based on the solution provided in the embodiments of the present application, a top-down multi-person pose estimation approach is adopted, in which the feature data to be recognized is input into a weighted attention neural network model to recognize the two-dimensional coordinates of the key points of at least one target in the feature data to be recognized. Evaluation results can therefore be provided on the basis of a human posture obtained with improved accuracy and efficiency, which achieves the purpose of improving the accuracy and efficiency of human posture recognition and thereby solves the technical problem of low data processing efficiency in human posture recognition in the related art.

Claims (30)

  1. A data processing method, comprising:
    inputting first feature data having a first number of channels into a first-type convolutional layer having a second number of filters for computation, and outputting second feature data having the second number of channels, wherein the first number is greater than the second number;
    inputting the second feature data having the second number of channels into a second-type convolutional layer having the second number of filters, and generating, through a neural network, a mask for the weights of each filter in the second-type convolutional layer according to learnable mask parameters in the second-type convolutional layer;
    determining, according to the mask, a connection pattern between each filter in the second-type convolutional layer and each channel of the second feature data;
    performing convolution computation on the second feature data according to the mapping relationship obtained from the connection pattern to obtain third feature data;
    inputting the third feature data having the second number of channels into a third-type convolutional layer having the first number of filters for computation, and outputting fourth feature data having the first number of channels.
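For illustration only, the following PyTorch sketch shows one possible reading of this three-layer structure. The channel counts, the pooled-response input to the mask generator, and the sigmoid-plus-threshold conversion of the learnable mask parameters into a hard filter-to-channel connection pattern are all assumptions made here; this is not the claimed implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedBottleneck(nn.Module):
    def __init__(self, first_num=256, second_num=64, kernel_size=3):
        super().__init__()
        # first-type layer: first_num input channels, second_num filters
        self.reduce = nn.Conv2d(first_num, second_num, kernel_size=1)
        # second-type layer: second_num filters over second_num channels, with a learnable mask
        self.conv = nn.Conv2d(second_num, second_num, kernel_size, padding=kernel_size // 2)
        self.mask_fc = nn.Linear(second_num, second_num * second_num)  # mask generator
        # third-type layer: back to first_num channels
        self.expand = nn.Conv2d(second_num, first_num, kernel_size=1)

    def forward(self, x):
        second = self.reduce(x)                                  # second feature data
        # generate a (filters x channels) mask from learnable parameters via a small network
        pooled = second.mean(dim=(0, 2, 3))                      # summary of channel responses
        mask = torch.sigmoid(self.mask_fc(pooled)).view(
            self.conv.out_channels, self.conv.in_channels, 1, 1)
        masked_weight = self.conv.weight * (mask > 0.5).float()  # filter-to-channel connections
        third = F.conv2d(second, masked_weight, self.conv.bias,
                         padding=self.conv.padding)              # third feature data
        return self.expand(third)                                 # fourth feature data

x = torch.randn(1, 256, 32, 32)
print(MaskedBottleneck()(x).shape)   # torch.Size([1, 256, 32, 32])
```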
  2. The method according to claim 1, wherein the data processing method is applied to deep learning in artificial intelligence.
  3. The method according to claim 1, wherein the data processing method is applied to recognizing a posture or action of a target in a picture/video.
  4. The method according to claim 1, wherein generating, through a neural network, the mask for the weights of each filter in the second-type convolutional layer according to the learnable mask parameters in the second-type convolutional layer comprises: generating the mask for the weights of each filter in the second-type convolutional layer according to a fully connected layer in the second-type convolutional layer.
  5. A data training method, comprising:
    acquiring a weight classification model to be trained, wherein the weight classification model is a neural network model for acquiring image features of image data;
    training the weight classification model to be trained to obtain a converged weight classification model;
    wherein the method used in training the weight classification model to be trained comprises the data processing method according to claim 1.
  6. The method according to claim 5, wherein training the weight classification model to be trained to obtain the converged weight classification model comprises:
    inputting data in a first preset data set into the weight classification model to be trained to obtain a category prediction result;
    obtaining an error between the category prediction result and the label categories of the data in the first preset data set according to the category prediction result and the label categories of the data in the first preset data set;
    training the weight classification model to be trained with a back-propagation algorithm according to the error until the weight classification model to be trained converges, to obtain the converged weight classification model.
  7. The method according to claim 6, wherein training the weight classification model to be trained with the back-propagation algorithm according to the error until the weight classification model to be trained converges comprises:
    repeatedly iterating excitation propagation and weight updating until the weight classification model to be trained converges;
    wherein, in the case where the weight classification model to be trained includes a residual structure, a pooling structure and a fully connected structure, repeatedly iterating excitation propagation and weight updating until the weight classification model to be trained converges comprises:
    in the excitation propagation stage, passing an image through the convolutional layers of the weight classification model to be trained to acquire features, acquiring a category prediction result at the fully connected layer of the weight classification model to be trained, and then taking the difference between the category prediction result and the label categories of the data in the first preset data set to obtain the response errors of the hidden layers and the output layer;
    in the weight updating stage, multiplying the error by the derivative of the function of the current layer's response with respect to the previous layer's response to obtain the gradient of the weight matrix between the two layers, and adjusting the weight matrix along the opposite direction of the gradient at a set learning rate; determining the gradient matrix as the error of the previous layer, calculating the weight matrix of the previous layer, and updating the weight classification model to be trained through iterative computation until the weight classification model to be trained converges.
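As a purely numerical illustration of the excitation-propagation / weight-update cycle described above, the sketch below uses a toy two-layer network with sigmoid activations and squared error; these choices are assumptions made only to keep the example short and are not the claimed model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, target = rng.random((4, 1)), rng.random((2, 1))     # toy input and label
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((2, 3))
lr = 0.1                                               # set learning rate

for _ in range(100):
    # excitation propagation: responses of the hidden layer and the output layer
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    err_out = y - target                               # response error of the output layer
    # weight update: error times derivative of this layer's response w.r.t. the previous response
    delta_out = err_out * y * (1 - y)
    grad_W2 = delta_out @ h.T                          # gradient of the weight matrix
    err_hidden = W2.T @ delta_out                      # error handed to the previous layer
    delta_hidden = err_hidden * h * (1 - h)
    grad_W1 = delta_hidden @ x.T
    W2 -= lr * grad_W2                                 # step against the gradient
    W1 -= lr * grad_W1

print(float(np.mean((sigmoid(W2 @ sigmoid(W1 @ x)) - target) ** 2)))
```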
  8. A data training method, comprising:
    initializing a feature extraction module in a target detection model with a converged weight classification model to obtain a target detection model to be trained, wherein the converged weight classification model is obtained by training with the method according to claim 5;
    training the target detection model to be trained with target position-box label information in a second preset data set to obtain a trained target detection model;
    training network parameters of a single-person pose estimation model to be trained according to target key-point label information in a third preset data set to obtain a trained single-person pose estimation model;
    obtaining a weighted attention neural network model according to the trained target detection model and the trained single-person pose estimation model.
  9. The method according to claim 8, wherein training the target detection model to be trained with the target position-box label information in the second preset data set to obtain the trained target detection model comprises:
    in the case where the target detection model includes a feature extraction module, a proposal-box generation module, and a target classifier and position-box regression prediction module,
    training the feature extraction module and the proposal-box generation module respectively to obtain first parameter values of the feature extraction module and first parameter values of the proposal-box generation module;
    training the target classifier and position-box regression prediction module according to the first parameter values of the feature extraction module and the first parameter values of the proposal-box generation module to obtain first parameter values of the target classifier and position-box regression prediction module and second parameter values of the feature extraction module;
    training the proposal-box generation module according to the first parameter values of the target classifier and position-box regression prediction module and the second parameter values of the feature extraction module to obtain second parameter values of the proposal-box generation module;
    training the target classifier and position-box regression prediction module according to the second parameter values of the proposal-box generation module and the second parameter values of the feature extraction module to obtain second parameter values of the target classifier and position-box regression prediction module.
  10. The method according to claim 9, wherein the feature extraction module is configured to extract features of each piece of data in the second preset data set; the proposal-box generation module is configured to generate candidate target boxes for each piece of data according to the features of each piece of data in the second preset data set; and the target classifier and position-box regression prediction module is configured to obtain, according to the features of each piece of data in the second preset data set and the candidate target boxes of each piece of data, the detection boxes of the targets of each piece of data in the second preset data set and the categories of the corresponding detection boxes;
    in the case where the proposal-box generation module includes a sliding-window convolutional layer followed by two parallel convolutional layers, the two parallel convolutional layers being a regression layer and a classification layer respectively, the proposal-box generation module being configured to generate the candidate target boxes for each piece of data according to the features of each piece of data in the second preset data set comprises:
    obtaining, through the regression layer and according to the features of each piece of data in the second preset data set, the coordinates of the central anchor point of each candidate target box of each piece of data in the second preset data set and the width and height of the corresponding candidate target box;
    determining, through the classification layer, whether each candidate target box of each piece of data is foreground or background.
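A minimal sketch of such a proposal-box head, with one sliding-window convolution followed by parallel regression and classification layers; the channel width and the number of anchors per location are assumptions made here for illustration, not values from the claims:

```python
import torch
import torch.nn as nn

class ProposalHead(nn.Module):
    """Sliding-window conv followed by two parallel convs: box regression and fg/bg scores."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.sliding = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.regression = nn.Conv2d(512, num_anchors * 4, kernel_size=1)      # centre x, y and w, h
        self.classification = nn.Conv2d(512, num_anchors * 2, kernel_size=1)  # foreground / background

    def forward(self, features):
        shared = torch.relu(self.sliding(features))
        return self.regression(shared), self.classification(shared)

boxes, scores = ProposalHead()(torch.randn(1, 512, 38, 50))
print(boxes.shape, scores.shape)   # torch.Size([1, 36, 38, 50]) torch.Size([1, 18, 38, 50])
```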
  11. The method according to claim 10, wherein, in the case where the structure of the target classifier and position-box regression prediction module is one pooling layer, three fully connected layers and two parallel fully connected layers connected in sequence, the target classifier and position-box regression prediction module being configured to obtain, according to the features of each piece of data in the second preset data set and the candidate target boxes of each piece of data, the detection boxes of each target of each piece of data in the second preset data set and the categories of the corresponding detection boxes comprises:
    converting, through the pooling layer, the features of each piece of data of different lengths output by the feature extraction module into features of each piece of data of a fixed length;
    outputting, according to the fixed-length features of each piece of data, after passing through the three fully connected layers and then through the two parallel fully connected layers, the detection boxes of each target of each piece of data in the second preset data set and the categories of the corresponding detection boxes.
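A simplified stand-in for the pooling step that converts variable-sized candidate regions into fixed-length features: adaptive max pooling is used here in place of a full ROI pooling operator, and the box coordinates are hypothetical.

```python
import torch
import torch.nn.functional as F

def roi_to_fixed_length(feature_map, box, output_size=(7, 7)):
    """Pool the region inside `box` (x1, y1, x2, y2 in feature-map coordinates) to a
    fixed-size grid, so the following fully connected layers always see the same length."""
    x1, y1, x2, y2 = [int(v) for v in box]
    region = feature_map[:, :, y1:y2 + 1, x1:x2 + 1]
    return F.adaptive_max_pool2d(region, output_size)

fm = torch.randn(1, 256, 38, 50)
for box in [(4, 3, 20, 30), (10, 5, 45, 36)]:      # candidate target boxes of different sizes
    pooled = roi_to_fixed_length(fm, box)
    print(pooled.flatten(1).shape)                  # always torch.Size([1, 12544])
```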
  12. The method according to claim 8, wherein training the network parameters of the single-person pose estimation model to be trained according to the target key-point label information in the third preset data set to obtain the trained single-person pose estimation model comprises:
    training the network parameters of the single-person pose estimation model to be trained according to the target key-point label information in the third preset data set, and iteratively updating the network parameters of the single-person pose estimation model to be trained through forward-propagation and back-propagation algorithms;
    wherein training the network parameters of the single-person pose estimation model to be trained according to the target key-point label information in the third preset data set, and iteratively updating the network parameters of the single-person pose estimation model to be trained through forward-propagation and back-propagation algorithms comprises:
    expanding the height or width of an input single-person image according to a preset aspect ratio, and cropping the single-person image to a preset size.
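A small sketch of the aspect-ratio expansion followed by bringing the single-person crop to a preset input size. Zero padding, the 3:4 width-to-height ratio, and a nearest-neighbour resize standing in for the cropping step are assumptions made only for illustration:

```python
import numpy as np

def expand_and_resize(image, target_ratio=0.75, preset_size=(256, 192)):
    """Pad the shorter side so the image reaches a preset width/height ratio,
    then bring it to the preset (height, width) input size."""
    h, w = image.shape[:2]
    if w / h < target_ratio:                       # too narrow: expand the width
        pad = int(round(h * target_ratio)) - w
        image = np.pad(image, ((0, 0), (pad // 2, pad - pad // 2), (0, 0)))
    else:                                          # too flat: expand the height
        pad = int(round(w / target_ratio)) - h
        image = np.pad(image, ((pad // 2, pad - pad // 2), (0, 0), (0, 0)))
    # nearest-neighbour resize, kept dependency-free
    ys = np.linspace(0, image.shape[0] - 1, preset_size[0]).astype(int)
    xs = np.linspace(0, image.shape[1] - 1, preset_size[1]).astype(int)
    return image[ys][:, xs]

print(expand_and_resize(np.zeros((300, 100, 3))).shape)   # (256, 192, 3)
```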
  13. The method according to claim 8, wherein the method used in training the network parameters of the single-person pose estimation model to be trained comprises the data processing method according to claim 1.
  14. The method according to claim 8, wherein the method further comprises:
    collecting samples required for training the target detection model to be trained and the single-person pose estimation model to be trained;
    preprocessing the samples, wherein the preprocessing includes division of data sets and preprocessing operations;
    wherein training the weight classification model to be trained to obtain the converged weight classification model comprises:
    inputting data in a first preset data set into the weight classification model to be trained to obtain a category prediction result;
    obtaining an error between the category prediction result and the label categories of the data in the first preset data set according to the category prediction result and the label categories of the data in the first preset data set;
    training the weight classification model to be trained with a back-propagation algorithm according to the error until the weight classification model to be trained converges, to obtain the converged weight classification model.
  15. The method according to claim 14, wherein the first preset data set includes a first type of image data set, the first type of image data set having a self-defined training set and validation set; the second preset data set includes the data sets annotated with position-box information in a second type of image data set and a third type of image data set; the second type of image data set has a self-defined training set and validation set; the third type of image data set is randomly divided into a training set and a validation set according to a preset ratio; the training set of the second type of image data set and the training set of the third type of image data set form the training set of the second preset data set, and the validation set of the second type of image data set and the validation set of the third type of image data set form the validation set of the second preset data set; and the third preset data set includes the data sets annotated with key-point information in the second type of image data set and the third type of image data set;
    the preprocessing operations include: processing the data in the first preset data set and the third preset data set respectively through random geometric transformation, and processing the data in the second preset data set through a random mixing operation and/or random geometric transformation.
  16. The method according to claim 15, wherein the random geometric transformation includes random cropping, random rotation by a preset angle and/or random scaling by a preset scaling ratio; the random mixing operation includes superimposing at least two pieces of data according to preset weights, specifically, adding the products of the pixel values at preset positions in different pieces of data and the preset weights.
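A sketch of the two preprocessing operations named in this claim. The mixing-weight range and the restriction of rotation to multiples of 90 degrees are choices made here only to keep the example dependency-free; they are not values from the application:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_mix(img_a, img_b, weight=None):
    """Random mixing: overlay two samples by adding pixel values times the preset weights."""
    w = rng.uniform(0.3, 0.7) if weight is None else weight
    return w * img_a + (1.0 - w) * img_b

def random_geometric(img, scale_range=(0.8, 1.2)):
    """Random geometric transformation: rotation (multiples of 90 degrees here)
    followed by random scaling via nearest-neighbour index sampling."""
    img = np.rot90(img, k=int(rng.integers(0, 4)))
    s = rng.uniform(*scale_range)
    h, w = img.shape[:2]
    ys = (np.arange(int(h * s)) / s).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * s)) / s).astype(int).clip(0, w - 1)
    return img[ys][:, xs]

a, b = rng.random((64, 64, 3)), rng.random((64, 64, 3))
print(random_mix(a, b).shape, random_geometric(a).shape)
```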
  17. A data recognition method based on the method according to any one of claims 8 to 16, comprising:
    inputting feature data to be recognized into a weighted attention neural network model and recognizing two-dimensional coordinates of key points of at least one target in the feature data to be recognized, wherein the weighted attention neural network model is configured to perform pose estimation of at least one person in a top-down manner, detect a position rectangle of at least one target in the feature data to be recognized, and detect the two-dimensional coordinates of the key points of the target within the position rectangle;
    performing computation with the two-dimensional coordinates of the key points of the target to obtain an angle between the line of a first preset key-point combination and the line of a second preset key-point combination, or an angle between the line of the first preset key-point combination and a first preset line;
    matching, in a first preset database, the angle between the line of the first preset key-point combination and the line of the second preset key-point combination, or the angle between the line of the first preset key-point combination and the first preset line, to obtain a recognition result of the target.
  18. The method according to claim 17, wherein matching, in the first preset database, the angle between the line of the first preset key-point combination and the line of the second preset key-point combination, or the angle between the line of the first preset key-point combination and the first preset line, to obtain the recognition result of the target comprises:
    in the case where the feature data to be recognized includes picture data, matching the obtained angle value of at least one included angle against the angle values of the corresponding angle types in the first preset database to obtain a recognition result of the picture data.
  19. The method according to claim 17, wherein matching, in the first preset database, the angle between the line of the first preset key-point combination and the line of the second preset key-point combination, or the angle between the line of the first preset key-point combination and the first preset line, to obtain the recognition result of the target comprises:
    in the case where the feature data to be recognized includes video data, acquiring, for each frame or for specified frames, two-dimensional key-point coordinate information of at least one target in each corresponding frame of the video data, wherein the specified frames are fixed-time-interval frames and/or key frames;
    obtaining an angle-time curve of at least one specific included angle of the at least one target according to the two-dimensional key-point coordinate information of the at least one target in each corresponding frame of the video data, and performing a comparative analysis against the angle-time curve of at least one included angle of at least one standard motion to obtain the recognition result.
  20. The method according to claim 19, wherein obtaining the angle-time curve of the at least one specific included angle of the at least one target according to the two-dimensional key-point coordinate information of the at least one target in each corresponding frame of the video data, and performing the comparative analysis against the angle-time curve of the at least one included angle of the at least one standard motion to obtain the recognition result comprises:
    comparing the similarity between the angle-time curve of the at least one specific included angle of the at least one target and a pre-obtained angle-time curve of at least one included angle of at least one standard motion, and, if the similarity falls within a first preset threshold interval, determining that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type;
    in the case where it is determined that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type, further comparing the angle-time curve of at least one specific included angle of that target with the angle-time curve of the corresponding specific included angle of the standard motion;
    if the difference between adjacent extrema on the angle-time curve of the at least one specific included angle of the target and the difference between adjacent extrema on the angle-time curve of the corresponding specific included angle of the standard motion fall within a second preset threshold interval, determining that the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is standard; otherwise, the joint motion corresponding to the specific included angle of the target in the video data is not standard;
    judging whether the difference between the distance between adjacent peaks on the angle-time curve of the at least one specific included angle of the target and the distance between adjacent peaks on the angle-time curve of the corresponding specific included angle of the standard motion falls within a third preset threshold interval, a fourth preset threshold interval or a fifth preset threshold interval, thereby confirming that the intensity of the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is too low, appropriate, or too high.
  21. The method according to claim 17, wherein the method further comprises:
    performing matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
  22. The method according to claim 21, wherein, after the posture evaluation result corresponding to the recognition result is obtained, the method further comprises:
    performing matching in a third preset database according to the posture evaluation result to obtain suggestion information corresponding to the posture evaluation result.
  23. A data recognition device, comprising:
    a coordinate recognition module, configured to input feature data to be recognized into a weighted attention neural network model and recognize two-dimensional coordinates of key points of at least one target in the feature data to be recognized, wherein the weighted attention neural network model is configured to perform pose estimation of at least one person in a top-down manner, detect a position rectangle of at least one target in the feature data to be recognized, and detect the two-dimensional coordinates of the key points of the target within the position rectangle;
    a calculation module, configured to perform computation with the two-dimensional coordinates of the key points of the target to obtain an angle between the line of a first preset key-point combination and the line of a second preset key-point combination, or an angle between the line of the first preset key-point combination and a first preset line;
    a matching module, configured to match, in a first preset database, the angle between the line of the first preset key-point combination and the line of the second preset key-point combination, or the angle between the line of the first preset key-point combination and the first preset line, to obtain a recognition result of the target.
  24. The device according to claim 23, wherein the matching module comprises:
    a first matching unit, configured to, in the case where the feature data to be recognized includes picture data, match the obtained angle value of at least one included angle against the angle values of the corresponding angle types in the first preset database to obtain a recognition result of the picture data.
  25. The device according to claim 23, wherein the matching module comprises:
    an acquisition unit, configured to, in the case where the feature data to be recognized includes video data, acquire, for each frame or for specified frames, two-dimensional key-point coordinate information of at least one target in each corresponding frame of the video data, wherein the specified frames are fixed-time-interval frames and/or key frames;
    a second matching unit, configured to obtain an angle-time curve of at least one specific included angle of at least one target according to the two-dimensional key-point coordinate information of the at least one target in each corresponding frame of the video data, and to perform a comparative analysis against the angle-time curve of at least one included angle of at least one standard motion to obtain a recognition result.
  26. The device according to claim 25, wherein the second matching unit comprises:
    a first judging sub-unit, configured to compare the similarity between the angle-time curve of the at least one specific included angle of the at least one target and a pre-obtained angle-time curve of at least one included angle of at least one standard motion, and, if the similarity falls within a first preset threshold interval, determine that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type;
    a comparison sub-unit, configured to, in the case where it is determined that the corresponding target in each corresponding frame of the video data is performing the corresponding standard motion type, further compare the angle-time curve of at least one specific included angle of the target with the angle-time curve of the corresponding specific included angle of the standard motion;
    a second judging sub-unit, configured to determine that the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is standard if the difference between adjacent extrema on the angle-time curve of the at least one specific included angle of the target and the difference between adjacent extrema on the angle-time curve of the corresponding specific included angle of the standard motion fall within a second preset threshold interval, and otherwise determine that the joint motion corresponding to the specific included angle of the target in the video data is not standard;
    a third judging sub-unit, configured to judge whether the difference between the distance between adjacent peaks on the angle-time curve of the at least one specific included angle of the target and the distance between adjacent peaks on the angle-time curve of the corresponding specific included angle of the standard motion falls within a third preset threshold interval, a fourth preset threshold interval or a fifth preset threshold interval, thereby confirming that the intensity of the joint motion corresponding to the specific included angle of the target in each corresponding frame of the video data is too low, appropriate, or too high.
  27. The device according to claim 23, wherein the device further comprises:
    an evaluation module, configured to perform matching in a second preset database according to the recognition result to obtain a posture evaluation result corresponding to the recognition result.
  28. The device according to claim 27, wherein the device further comprises:
    a suggestion module, configured to, after the posture evaluation result corresponding to the recognition result is obtained, perform matching in a third preset database according to the posture evaluation result to obtain suggestion information corresponding to the posture evaluation result.
  29. A non-volatile storage medium, the non-volatile storage medium comprising a stored program, wherein, when the program runs, a device on which the non-volatile storage medium is located is controlled to execute the method according to any one of claims 1 to 22.
  30. A data recognition device, comprising: a non-volatile storage medium and a processor configured to run a program stored in the non-volatile storage medium, wherein the program, when running, executes the method according to any one of claims 1 to 22.
PCT/CN2020/117226 2019-09-29 2020-09-23 Data processing method, data training method, data identifying method and device, and storage medium WO2021057810A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910935970.3A CN111881705B (en) 2019-09-29 2019-09-29 Data processing, training and identifying method, device and storage medium
CN201910935970.3 2019-09-29

Publications (1)

Publication Number Publication Date
WO2021057810A1 (en)

Family

ID=73153962

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117226 WO2021057810A1 (en) 2019-09-29 2020-09-23 Data processing method, data training method, data identifying method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN111881705B (en)
WO (1) WO2021057810A1 (en)

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221824A (en) * 2021-05-31 2021-08-06 之江实验室 Human body posture recognition method based on individual model generation
CN113268626A (en) * 2021-05-26 2021-08-17 中国人民武装警察部队特种警察学院 Data processing method and device, electronic equipment and storage medium
CN113283343A (en) * 2021-05-26 2021-08-20 上海商汤智能科技有限公司 Crowd positioning method and device, electronic equipment and storage medium
CN113297913A (en) * 2021-04-26 2021-08-24 云南电网有限责任公司信息中心 Method for identifying dressing specification of distribution network field operating personnel
CN113312969A (en) * 2021-04-23 2021-08-27 浙江省机电设计研究院有限公司 Part identification and positioning method and system based on three-dimensional vision
CN113313129A (en) * 2021-06-22 2021-08-27 中国平安财产保险股份有限公司 Method, device and equipment for training disaster recognition model and storage medium
CN113326779A (en) * 2021-05-31 2021-08-31 中煤科工集团沈阳研究院有限公司 Underground roadway accumulated water detection and identification method
CN113436259A (en) * 2021-06-23 2021-09-24 国网智能科技股份有限公司 Deep learning-based real-time positioning method and system for substation equipment
CN113435267A (en) * 2021-06-09 2021-09-24 江苏第二师范学院 Online education student concentration discrimination method based on improved convolutional neural network
CN113448955A (en) * 2021-08-30 2021-09-28 上海观安信息技术股份有限公司 Data set quality evaluation method and device, computer equipment and storage medium
CN113537325A (en) * 2021-07-05 2021-10-22 北京航空航天大学 Deep learning method for image classification based on logic of extracting high-low-level features
CN113592941A (en) * 2021-08-02 2021-11-02 北京中交兴路信息科技有限公司 Certificate image verification method and device, storage medium and terminal
CN113610215A (en) * 2021-07-09 2021-11-05 北京达佳互联信息技术有限公司 Task processing network generation method, task processing device, electronic equipment and storage medium
CN113657185A (en) * 2021-07-26 2021-11-16 广东科学技术职业学院 Intelligent auxiliary method, device and medium for piano practice
CN113723187A (en) * 2021-07-27 2021-11-30 武汉光庭信息技术股份有限公司 Semi-automatic labeling method and system for gesture key points
CN113808744A (en) * 2021-09-22 2021-12-17 河北工程大学 Diabetes risk prediction method, device, equipment and storage medium
CN113837894A (en) * 2021-08-06 2021-12-24 国网江苏省电力有限公司南京供电分公司 Non-invasive resident user load decomposition method based on residual convolution module
CN113869353A (en) * 2021-08-16 2021-12-31 深延科技(北京)有限公司 Model training method, tiger key point detection method and related device
CN114067359A (en) * 2021-11-03 2022-02-18 天津理工大学 Pedestrian detection method integrating human body key points and attention features of visible parts
CN114118127A (en) * 2021-10-15 2022-03-01 北京工业大学 Visual scene mark detection and identification method and device
CN114283495A (en) * 2021-12-16 2022-04-05 北京航空航天大学 Human body posture estimation method based on binarization neural network
CN114332547A (en) * 2022-03-17 2022-04-12 浙江太美医疗科技股份有限公司 Medical object classification method and apparatus, electronic device, and storage medium
CN114470719A (en) * 2022-03-22 2022-05-13 北京蓝田医疗设备有限公司 Full-automatic posture correction training method and system
CN114595748A (en) * 2022-02-21 2022-06-07 南昌大学 Data segmentation method for fall protection system
CN114723517A (en) * 2022-03-18 2022-07-08 唯品会(广州)软件有限公司 Virtual fitting method, device and storage medium
CN114783065A (en) * 2022-05-12 2022-07-22 大连大学 Parkinson's disease early warning method based on human body posture estimation
CN114818991A (en) * 2022-06-22 2022-07-29 西南石油大学 Running behavior identification method based on convolutional neural network and acceleration sensor
CN115019338A (en) * 2022-04-27 2022-09-06 淮阴工学院 Multi-person posture estimation method and system based on GAMIHR-Net
CN115308247A (en) * 2022-10-11 2022-11-08 江苏昭华精密铸造科技有限公司 Method for detecting deslagging quality of aluminum oxide powder
CN115702993A (en) * 2021-08-12 2023-02-17 荣耀终端有限公司 Rope skipping state detection method and electronic equipment
CN115879586A (en) * 2022-01-11 2023-03-31 北京中关村科金技术有限公司 Complaint prediction optimization method and device based on ablation experiment and storage medium
CN116112932A (en) * 2023-02-20 2023-05-12 南京航空航天大学 Data knowledge dual-drive radio frequency fingerprint identification method and system
CN116106307A (en) * 2023-03-31 2023-05-12 深圳上善智能有限公司 Image recognition-based detection result evaluation method of intelligent cash dispenser
CN116309591A (en) * 2023-05-19 2023-06-23 杭州健培科技有限公司 Medical image 3D key point detection method, model training method and device
CN116665309A (en) * 2023-07-26 2023-08-29 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features
CN116678506A (en) * 2023-08-02 2023-09-01 国检测试控股集团南京国材检测有限公司 Wireless transmission heat loss detection device
CN117036661A (en) * 2023-08-06 2023-11-10 苏州三垣航天科技有限公司 On-line real-time performance evaluation method for spatial target gesture recognition neural network
CN117037272A (en) * 2023-08-08 2023-11-10 深圳市震有智联科技有限公司 Method and system for monitoring fall of old people
CN117456612A (en) * 2023-12-26 2024-01-26 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system
CN117457193A (en) * 2023-12-22 2024-01-26 之江实验室 Physical health monitoring method and system based on human body key point detection
CN117573655A (en) * 2024-01-15 2024-02-20 中国标准化研究院 Data management optimization method and system based on convolutional neural network
CN117675112A (en) * 2024-02-01 2024-03-08 阳光凯讯(北京)科技股份有限公司 Communication signal processing method, system, equipment and medium based on machine learning
CN117726977A (en) * 2024-02-07 2024-03-19 南京百伦斯智能科技有限公司 Experimental operation key node scoring method and system based on DCNN

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307991A (en) * 2020-11-04 2021-02-02 北京临近空间飞行器系统工程研究所 Image recognition method, device and storage medium
CN112101314B (en) * 2020-11-17 2021-03-09 北京健康有益科技有限公司 Human body posture recognition method and device based on mobile terminal
CN112487964B (en) * 2020-11-27 2023-08-01 深圳市维海德技术股份有限公司 Gesture detection and recognition method, gesture detection and recognition equipment and computer-readable storage medium
CN112989312B (en) * 2020-11-30 2024-04-30 北京金堤科技有限公司 Verification code identification method and device, electronic equipment and storage medium
CN112613490B (en) * 2021-01-08 2022-02-01 云从科技集团股份有限公司 Behavior recognition method and device, machine readable medium and equipment
CN112731161B (en) * 2021-02-08 2021-10-26 中南大学 Nonlinear data feature extraction and classification prediction method based on small amount of data mixed insertion
CN113420604B (en) * 2021-05-28 2023-04-18 沈春华 Multi-person posture estimation method and device and electronic equipment
CN114004709B (en) * 2021-11-11 2024-04-30 重庆邮电大学 Information propagation monitoring method and device and computer readable storage medium
CN114494192B (en) * 2022-01-26 2023-04-25 西南交通大学 Thoracolumbar fracture identification segmentation and detection positioning method based on deep learning
CN114783000B (en) * 2022-06-15 2022-10-18 成都东方天呈智能科技有限公司 Method and device for detecting dressing standard of worker in bright kitchen range scene
CN116563848B (en) * 2023-07-12 2023-11-10 北京大学 Abnormal cell identification method, device, equipment and storage medium
CN117270818B (en) * 2023-10-11 2024-04-09 北京航空航天大学 Method and system for identifying and generating software demand class diagram information in MOM standard

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203623A (en) * 2014-11-27 2016-12-07 三星电子株式会社 The method of method and apparatus and dimensionality reduction for extending neutral net
CN107909145A (en) * 2017-12-05 2018-04-13 苏州天瞳威视电子科技有限公司 A kind of training method of convolutional neural networks model
WO2018217828A1 (en) * 2017-05-23 2018-11-29 Intel Corporation Methods and apparatus for discriminative semantic transfer and physics-inspired optimization of features in deep learning
CN109117897A (en) * 2018-08-09 2019-01-01 百度在线网络技术(北京)有限公司 Image processing method, device and readable storage medium storing program for executing based on convolutional neural networks
CN109214346A (en) * 2018-09-18 2019-01-15 中山大学 Picture human motion recognition method based on hierarchical information transmitting
CN109918975A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of processing method of augmented reality, the method for Object identifying and terminal

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6763148B1 (en) * 2000-11-13 2004-07-13 Visual Key, Inc. Image recognition methods
US9805305B2 (en) * 2015-08-07 2017-10-31 Yahoo Holdings, Inc. Boosted deep convolutional neural networks (CNNs)
US10831444B2 (en) * 2016-04-04 2020-11-10 Technion Research & Development Foundation Limited Quantized neural network training and inference
KR101688458B1 (en) * 2016-04-27 2016-12-23 디아이티 주식회사 Image inspection apparatus for manufactured articles using deep neural network training method and image inspection method of manufactured articles thereby
US20170344876A1 (en) * 2016-05-31 2017-11-30 Samsung Electronics Co., Ltd. Efficient sparse parallel winograd-based convolution scheme
US20180247227A1 (en) * 2017-02-24 2018-08-30 Xtract Technologies Inc. Machine learning systems and methods for data augmentation
WO2018226014A1 (en) * 2017-06-07 2018-12-13 삼성전자주식회사 Electronic device and method for controlling same
CN109801225B (en) * 2018-12-06 2022-12-27 重庆邮电大学 Human face reticulate pattern stain removing method based on multitask full convolution neural network
CN109816636B (en) * 2018-12-28 2020-11-27 汕头大学 Crack detection method based on intelligent terminal
CN110188635B (en) * 2019-05-16 2021-04-30 南开大学 Plant disease and insect pest identification method based on attention mechanism and multi-level convolution characteristics

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203623A (en) * 2014-11-27 2016-12-07 三星电子株式会社 The method of method and apparatus and dimensionality reduction for extending neutral net
WO2018217828A1 (en) * 2017-05-23 2018-11-29 Intel Corporation Methods and apparatus for discriminative semantic transfer and physics-inspired optimization of features in deep learning
CN107909145A (en) * 2017-12-05 2018-04-13 苏州天瞳威视电子科技有限公司 Training method for a convolutional neural network model
CN109918975A (en) * 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 Augmented reality processing method, object recognition method, and terminal
CN109117897A (en) * 2018-08-09 2019-01-01 百度在线网络技术(北京)有限公司 Convolutional neural network-based image processing method, device, and readable storage medium
CN109214346A (en) * 2018-09-18 2019-01-15 中山大学 Human motion recognition method for images based on hierarchical information transfer

Cited By (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312969A (en) * 2021-04-23 2021-08-27 浙江省机电设计研究院有限公司 Part identification and positioning method and system based on three-dimensional vision
CN113297913A (en) * 2021-04-26 2021-08-24 云南电网有限责任公司信息中心 Method for identifying dressing specification of distribution network field operating personnel
CN113297913B (en) * 2021-04-26 2023-05-26 云南电网有限责任公司信息中心 Identification method for dressing specification of distribution network field operators
CN113268626B (en) * 2021-05-26 2024-04-26 中国人民武装警察部队特种警察学院 Data processing method, device, electronic equipment and storage medium
CN113268626A (en) * 2021-05-26 2021-08-17 中国人民武装警察部队特种警察学院 Data processing method and device, electronic equipment and storage medium
CN113283343A (en) * 2021-05-26 2021-08-20 上海商汤智能科技有限公司 Crowd positioning method and device, electronic equipment and storage medium
WO2022247091A1 (en) * 2021-05-26 2022-12-01 上海商汤智能科技有限公司 Crowd positioning method and apparatus, electronic device, and storage medium
CN113326779A (en) * 2021-05-31 2021-08-31 中煤科工集团沈阳研究院有限公司 Underground roadway accumulated water detection and identification method
CN113221824A (en) * 2021-05-31 2021-08-06 之江实验室 Human body posture recognition method based on individual model generation
CN113326779B (en) * 2021-05-31 2024-03-22 中煤科工集团沈阳研究院有限公司 Underground roadway ponding detection and identification method
CN113435267A (en) * 2021-06-09 2021-09-24 江苏第二师范学院 Online education student concentration discrimination method based on improved convolutional neural network
CN113435267B (en) * 2021-06-09 2023-06-23 江苏第二师范学院 Online education student concentration discriminating method based on improved convolutional neural network
CN113313129A (en) * 2021-06-22 2021-08-27 中国平安财产保险股份有限公司 Method, device and equipment for training disaster recognition model and storage medium
CN113313129B (en) * 2021-06-22 2024-04-05 中国平安财产保险股份有限公司 Training method, device, equipment and storage medium for disaster damage recognition model
CN113436259A (en) * 2021-06-23 2021-09-24 国网智能科技股份有限公司 Deep learning-based real-time positioning method and system for substation equipment
CN113537325A (en) * 2021-07-05 2021-10-22 北京航空航天大学 Deep learning method for image classification based on logic of extracting high-low-level features
CN113537325B (en) * 2021-07-05 2023-07-11 北京航空航天大学 Deep learning method for image classification based on extracted high-low layer feature logic
CN113610215A (en) * 2021-07-09 2021-11-05 北京达佳互联信息技术有限公司 Task processing network generation method, task processing device, electronic equipment and storage medium
CN113657185A (en) * 2021-07-26 2021-11-16 广东科学技术职业学院 Intelligent auxiliary method, device and medium for piano practice
CN113723187A (en) * 2021-07-27 2021-11-30 武汉光庭信息技术股份有限公司 Semi-automatic labeling method and system for gesture key points
CN113592941A (en) * 2021-08-02 2021-11-02 北京中交兴路信息科技有限公司 Certificate image verification method and device, storage medium and terminal
CN113592941B (en) * 2021-08-02 2023-09-12 北京中交兴路信息科技有限公司 Certificate image verification method and device, storage medium and terminal
CN113837894B (en) * 2021-08-06 2023-12-19 国网江苏省电力有限公司南京供电分公司 Non-invasive resident user load decomposition method based on residual convolution module
CN113837894A (en) * 2021-08-06 2021-12-24 国网江苏省电力有限公司南京供电分公司 Non-invasive resident user load decomposition method based on residual convolution module
CN115702993B (en) * 2021-08-12 2023-10-31 荣耀终端有限公司 Rope skipping state detection method and electronic equipment
CN115702993A (en) * 2021-08-12 2023-02-17 荣耀终端有限公司 Rope skipping state detection method and electronic equipment
CN113869353A (en) * 2021-08-16 2021-12-31 深延科技(北京)有限公司 Model training method, tiger key point detection method and related device
CN113448955A (en) * 2021-08-30 2021-09-28 上海观安信息技术股份有限公司 Data set quality evaluation method and device, computer equipment and storage medium
CN113808744A (en) * 2021-09-22 2021-12-17 河北工程大学 Diabetes risk prediction method, device, equipment and storage medium
CN114118127B (en) * 2021-10-15 2024-05-21 北京工业大学 Visual scene sign detection and recognition method and device
CN114118127A (en) * 2021-10-15 2022-03-01 北京工业大学 Visual scene mark detection and identification method and device
CN114067359B (en) * 2021-11-03 2024-05-07 天津理工大学 Pedestrian detection method integrating human body key points and visible part attention characteristics
CN114067359A (en) * 2021-11-03 2022-02-18 天津理工大学 Pedestrian detection method integrating human body key points and attention features of visible parts
CN114283495B (en) * 2021-12-16 2024-05-28 北京航空航天大学 Human body posture estimation method based on binarization neural network
CN114283495A (en) * 2021-12-16 2022-04-05 北京航空航天大学 Human body posture estimation method based on binarization neural network
CN115879586B (en) * 2022-01-11 2024-01-02 北京中关村科金技术有限公司 Complaint prediction optimization method and device based on ablation experiment and storage medium
CN115879586A (en) * 2022-01-11 2023-03-31 北京中关村科金技术有限公司 Complaint prediction optimization method and device based on ablation experiment and storage medium
CN114595748B (en) * 2022-02-21 2024-02-13 南昌大学 Data segmentation method for fall protection system
CN114595748A (en) * 2022-02-21 2022-06-07 南昌大学 Data segmentation method for fall protection system
CN114332547A (en) * 2022-03-17 2022-04-12 浙江太美医疗科技股份有限公司 Medical object classification method and apparatus, electronic device, and storage medium
CN114723517A (en) * 2022-03-18 2022-07-08 唯品会(广州)软件有限公司 Virtual fitting method, device and storage medium
CN114470719A (en) * 2022-03-22 2022-05-13 北京蓝田医疗设备有限公司 Full-automatic posture correction training method and system
CN114470719B (en) * 2022-03-22 2022-12-20 北京蓝田医疗设备有限公司 Full-automatic posture correction training method and system
CN115019338B (en) * 2022-04-27 2023-09-22 淮阴工学院 Multi-person gesture estimation method and system based on GAMHR-Net
CN115019338A (en) * 2022-04-27 2022-09-06 淮阴工学院 Multi-person posture estimation method and system based on GAMIHR-Net
CN114783065A (en) * 2022-05-12 2022-07-22 大连大学 Parkinson's disease early warning method based on human body posture estimation
CN114783065B (en) * 2022-05-12 2024-03-29 大连大学 Parkinsonism early warning method based on human body posture estimation
CN114818991A (en) * 2022-06-22 2022-07-29 西南石油大学 Running behavior identification method based on convolutional neural network and acceleration sensor
CN114818991B (en) * 2022-06-22 2022-09-27 西南石油大学 Running behavior identification method based on convolutional neural network and acceleration sensor
CN115308247B (en) * 2022-10-11 2022-12-16 江苏昭华精密铸造科技有限公司 Method for detecting deslagging quality of aluminum oxide powder
CN115308247A (en) * 2022-10-11 2022-11-08 江苏昭华精密铸造科技有限公司 Method for detecting deslagging quality of aluminum oxide powder
CN116112932B (en) * 2023-02-20 2023-11-10 南京航空航天大学 Data knowledge dual-drive radio frequency fingerprint identification method and system
CN116112932A (en) * 2023-02-20 2023-05-12 南京航空航天大学 Data knowledge dual-drive radio frequency fingerprint identification method and system
CN116106307A (en) * 2023-03-31 2023-05-12 深圳上善智能有限公司 Image recognition-based detection result evaluation method of intelligent cash dispenser
CN116106307B (en) * 2023-03-31 2023-06-30 深圳上善智能有限公司 Image recognition-based detection result evaluation method of intelligent cash dispenser
CN116309591A (en) * 2023-05-19 2023-06-23 杭州健培科技有限公司 Medical image 3D key point detection method, model training method and device
CN116309591B (en) * 2023-05-19 2023-08-25 杭州健培科技有限公司 Medical image 3D key point detection method, model training method and device
CN116665309A (en) * 2023-07-26 2023-08-29 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features
CN116665309B (en) * 2023-07-26 2023-11-14 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features
CN116678506A (en) * 2023-08-02 2023-09-01 国检测试控股集团南京国材检测有限公司 Wireless transmission heat loss detection device
CN116678506B (en) * 2023-08-02 2023-10-10 国检测试控股集团南京国材检测有限公司 Wireless transmission heat loss detection device
CN117036661A (en) * 2023-08-06 2023-11-10 苏州三垣航天科技有限公司 On-line real-time performance evaluation method for spatial target gesture recognition neural network
CN117036661B (en) * 2023-08-06 2024-04-12 苏州三垣航天科技有限公司 On-line real-time performance evaluation method for spatial target gesture recognition neural network
CN117037272B (en) * 2023-08-08 2024-03-19 深圳市震有智联科技有限公司 Method and system for monitoring fall of old people
CN117037272A (en) * 2023-08-08 2023-11-10 深圳市震有智联科技有限公司 Method and system for monitoring fall of old people
CN117457193A (en) * 2023-12-22 2024-01-26 之江实验室 Physical health monitoring method and system based on human body key point detection
CN117457193B (en) * 2023-12-22 2024-04-02 之江实验室 Physical health monitoring method and system based on human body key point detection
CN117456612B (en) * 2023-12-26 2024-03-12 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system
CN117456612A (en) * 2023-12-26 2024-01-26 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system
CN117573655B (en) * 2024-01-15 2024-03-12 中国标准化研究院 Data management optimization method and system based on convolutional neural network
CN117573655A (en) * 2024-01-15 2024-02-20 中国标准化研究院 Data management optimization method and system based on convolutional neural network
CN117675112B (en) * 2024-02-01 2024-05-03 阳光凯讯(北京)科技股份有限公司 Communication signal processing method, system, equipment and medium based on machine learning
CN117675112A (en) * 2024-02-01 2024-03-08 阳光凯讯(北京)科技股份有限公司 Communication signal processing method, system, equipment and medium based on machine learning
CN117726977A (en) * 2024-02-07 2024-03-19 南京百伦斯智能科技有限公司 Experimental operation key node scoring method and system based on DCNN
CN117726977B (en) * 2024-02-07 2024-04-12 南京百伦斯智能科技有限公司 Experimental operation key node scoring method and system based on DCNN

Also Published As

Publication number Publication date
CN111881705B (en) 2023-12-12
CN111881705A (en) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2021057810A1 (en) Data processing method, data training method, data identifying method and device, and storage medium
CN110472604B (en) Pedestrian and crowd behavior identification method based on video
CN103127691B (en) Video-generating device and method
JP7160932B2 (en) Generating prescriptive analytics using motion identification and motion information
CN110544301A (en) Three-dimensional human body action reconstruction system, method and action training system
CN110298279A (en) Limb rehabilitation training assistance method and system, medium, and device
CN109214366A (en) Local target re-identification method, apparatus, and system
CN110210284A (en) Intelligent evaluation method for human posture and behavior
Chiu et al. Emotion recognition through gait on mobile devices
Li et al. Abnormal sitting posture recognition based on multi-scale spatiotemporal features of skeleton graph
CN110443150A (en) Fall detection method, device, and storage medium
CN114998983A (en) Limb rehabilitation method based on augmented reality technology and posture recognition technology
Yadav et al. YogNet: A two-stream network for realtime multiperson yoga action recognition and posture correction
Sheu et al. Improvement of human pose estimation and processing with the intensive feature consistency network
Nasir et al. ENGA: Elastic net-based genetic algorithm for human action recognition
CN114565976A (en) Training intelligent test method and device
CN112149602A (en) Action counting method and device, electronic equipment and storage medium
CN116543455A (en) Method, device, and medium for establishing and using a Parkinsonian gait impairment assessment model
CN115006822A (en) Intelligent fitness mirror control system
Ascenso Development of a non-invasive motion capture system for swimming biomechanics
Cubero et al. Multimodal Human Pose feature fusion for Gait recognition
Pinčić Gait recognition using a self-supervised self-attention deep learning model
CN115205983B (en) Cross-perspective gait recognition method, system and equipment based on multi-feature aggregation
KR102670939B1 (en) Sports posture evaluation system and method using sensor data
Chamola et al. Advancements in Yoga Pose Estimation Using Artificial Intelligence: A Survey

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867912

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867912

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.09.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20867912

Country of ref document: EP

Kind code of ref document: A1