CN116935494A - Multi-person sitting posture identification method based on lightweight network model - Google Patents

Multi-person sitting posture identification method based on lightweight network model

Info

Publication number
CN116935494A
CN116935494A
Authority
CN
China
Prior art keywords
sitting posture
model
lightweight
training
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311187857.4A
Other languages
Chinese (zh)
Other versions
CN116935494B (en)
Inventor
周柚
焦树扬
于昆平
曹政
吴翾
肖钰彬
王镠璞
杜伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202311187857.4A
Publication of CN116935494A
Application granted
Publication of CN116935494B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention is applicable to the technical field of image processing and provides a multi-person sitting posture identification method based on a lightweight network model, comprising the following steps: the acquired video stream data are processed; key points are extracted with OpenPose; the obtained human key points are grouped by person and connected; bounding-box normalization preprocessing is applied; a human skeleton diagram is constructed with a minimum spanning tree algorithm; and the dataset is labeled after the skeleton diagrams are processed. The processed images are then augmented to expand the dataset; an LMSPNet model is constructed on the basis of a lightweight convolutional neural network; the LMSPNet model is trained to generate a trained model; and the trained model is tested. The invention achieves automatic prediction and evaluation of the sitting postures of multiple persons: by automatically predicting human key points and generating human skeleton diagrams, it can classify and judge the users' sitting postures, helping users discover and correct wrong sitting postures in time and preventing potential injuries.

Description

Multi-person sitting posture identification method based on lightweight network model
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a multi-person sitting posture identification method based on a lightweight network model.
Background
Human body sitting posture recognition mainly studies and describes human posture and predicts human behaviour. Human posture recognition algorithms fall mainly into depth-map-based algorithms and RGB-image-based algorithms. Depth-map algorithms are easily limited in application by their acquisition-equipment requirements, whereas RGB-based algorithms operate directly on the colours formed by varying and superposing the three red, green and blue colour channels and are far less constrained by such interference, so they have better development prospects. Even in complex scenes, RGB-image-based human posture estimation algorithms can achieve a good recognition effect.
Identifying and analysing sitting posture is critical to providing accurate health guidance and learning assistance. At present there are two main approaches to sitting posture recognition. Contact methods use sensor-based technology to collect human sitting posture data and identify postures by analysing characteristic attributes and sensor parameters, but they are limited by poor portability and high cost. Non-contact methods mainly analyse, annotate and classify human sitting postures through computer-vision image processing; however, they rely on large amounts of annotated training data to accurately recognize and classify different postures. Computer-vision-based sitting posture recognition extracts human key points from camera images, constructs a body-skeleton feature map, and then uses image classification to recognize the sitting posture. Most sitting posture recognition methods use complex neural network models, yet deploying deep learning models on mobile devices or embedded systems remains a challenge because convolution operations require large amounts of computing resources and storage space. We therefore propose a multi-person sitting posture recognition method based on a lightweight network model.
Disclosure of Invention
The invention aims to provide a multi-person sitting posture recognition method based on a lightweight network model, so as to solve the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a multi-person sitting posture recognition method based on a lightweight network model comprises the following steps:
step A, converting the collected video stream data from MOV format to PNG format; because consecutive frames in the video stream sequence have high correlation similarity, saving one image every set number of frames; extracting key points with OpenPose; grouping the obtained human key points by person and connecting them; performing bounding-box normalization preprocessing; constructing a human skeleton diagram with a minimum spanning tree algorithm; and labeling the dataset after the human skeleton diagrams are processed;
step B, enhancing the images processed in the step A to expand the dataset, and dividing the expanded dataset into three parts, namely a training set, a validation set and a test set;
step C, constructing an LMSPNet model based on a lightweight convolutional neural network;
step D, training the LMSPNet model with the training set and validation set obtained in the step B to generate a trained model;
and step E, testing the trained model generated in the step D with the test set obtained in the step B.
Further, in the step A, a human skeleton diagram is constructed with a minimum spanning tree algorithm, pixel values are binarized to [0, 0, 0] and [255, 255, 255], the images are cropped to 224×224 pixels, and the dataset is labeled.
In the step B, the images processed in the step A are enhanced: all image data are processed by horizontal flipping, rotation at different angles and scaling so as to enlarge the dataset.
Further, the LMSPNet model comprises a lightweight convolution kernel feature extraction module, an attention mechanism module and a sitting posture classifier module; the input of the LMSPNet model is a PNG image; the input image undergoes feature extraction in the lightweight feature extraction module, the extracted features are sent to the attention mechanism module for edge refinement to obtain the final features, and the final features are sent to the sitting posture classifier to obtain the final predicted sitting posture category.
Further, in the lightweight convolution kernel feature extraction module, the input data are first processed by one convolution layer, which maps the channels to 64 dimensions; a PReLU activation function is then applied to enhance the expressive capacity and robustness of the model; feature dimensionality is then reduced with a max pooling operation, computing output shapes by rounding up; features are extracted with two Light Fire blocks; a max pooling operation again reduces the features to 27×27×128; two further Light Fire blocks and a max pooling operation then convert the features to a shape of 13×13×256; finally, four Light Fire blocks perform the final feature extraction, yielding a 13×13×512 feature representation;
in the attention mechanism module, a CBAM is introduced as the attention mechanism; the CBAM module comprises a CAM and a SAM, where the CAM enhances the representation capability of the features by computing the importance of each channel, and the SAM enhances it by computing the importance of each spatial position;
assuming the input CNN feature is an image of initial shape W×H×C, denoted F: channel attention weights Mc of shape 1×1×C are first obtained with the CAM; Mc is multiplied with F to obtain a feature representation F' with the same shape as F; the SAM is then applied to F' to obtain spatial attention weights Ms of shape W×H×1; finally, Ms is multiplied with F' to obtain the final feature representation F'' of shape W×H×C; the final features are input to the sitting posture classifier to perform the specific sitting posture classification task;
in the sitting posture classifier module, a Dropout operation is first applied to the final features; the number of channels is reduced to 3 with a 1×1 convolution layer; a PReLU activation function performs a nonlinear mapping; an average pooling operation then reduces the spatial dimensions of the feature map to 1×1×3; a softmax function converts the input into a probability distribution of three values, representing the probability of each sitting posture category; and finally a fully connected layer maps the global feature representation to the final predicted sitting posture category.
Further, the step D specifically includes:
d1, training the LMSPNet by using the training set obtained in the step B, and setting the initial learning rate to 0.0001 and the batch_size to 16; the loss function is a cross entropy loss function and the optimizer is an SGD optimizer.
Further, the step E specifically includes:
and E1, testing the trained model with the test set obtained in the step B, predicting the three sitting postures, and judging whether the user is in a standard sitting posture.
Compared with the prior art, the invention has the beneficial effects that:
according to the multi-person sitting posture recognition method based on the light-weight network model, an LMSPNet model comprising a light-weight convolution core feature extraction module, an attention mechanism module and a sitting posture classifier module is arranged, the features of an image are extracted through the light-weight feature extraction module, and then the extracted features are sent to the attention mechanism module for edge refinement to obtain final features; and sending the finally extracted features into a sitting posture classifier module to obtain the final classified sitting posture category, and finally recording the current sitting posture category of the user. The LMSPNet fuses the local features and the global features with each other with lower parameter quantity and faster reasoning speed to obtain more scale and richer features, and finally accurate landmark positioning is realized in the sitting posture classifier module; the invention can accurately predict the sitting postures, reduces the cost of manual classification and provides a powerful auxiliary means for estimating the correct sitting postures.
Drawings
Fig. 1 is a flow chart of the method.
Fig. 2 is a block diagram of LMSPNet.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Specific implementations of the invention are described in detail below in connection with specific embodiments.
As shown in Fig. 1 and Fig. 2, the method for identifying sitting postures of multiple persons based on a lightweight network model according to an embodiment of the invention includes the following steps:
step A, converting the collected video stream data from MOV format to PNG format; because consecutive frames in the video stream sequence have high correlation similarity, saving one image every set number of frames; extracting key points with OpenPose; grouping the obtained human key points by person and connecting them; performing bounding-box normalization preprocessing; constructing a human skeleton diagram with a minimum spanning tree algorithm; and labeling the dataset after the human skeleton diagrams are processed;
step B, enhancing the images processed in the step A to expand the dataset, and dividing the expanded dataset into three parts, namely a training set, a validation set and a test set;
step C, constructing an LMSPNet model based on a lightweight convolutional neural network;
step D, training the LMSPNet model with the training set and validation set obtained in the step B to generate a trained model;
and step E, testing the trained model generated in the step D with the test set obtained in the step B.
In the step A, a human skeleton diagram is constructed with a minimum spanning tree algorithm, pixel values are binarized to [0, 0, 0] and [255, 255, 255], the images are cropped to 224×224 pixels, and the dataset is labeled.
In an embodiment of the present invention, it is preferable to construct the human skeleton graph with a minimum spanning tree algorithm and to binarize pixel values to [0, 0, 0] (background) and [255, 255, 255] (human skeleton).
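To make the skeleton-construction step concrete, the following is a minimal sketch of how a binarized skeleton image could be drawn from one person's keypoints with SciPy and OpenCV. The patent gives no code, so the keypoint format (an (N, 2) array of pixel coordinates, already normalized to the 224×224 canvas by the bounding-box step) and the use of a complete distance-weighted graph as input to the minimum spanning tree are assumptions.

```python
# A minimal sketch of the skeleton-drawing step (not from the patent itself).
# Assumptions: one person's OpenPose keypoints arrive as an (N, 2) array of
# (x, y) pixel coordinates, already normalized to the 224x224 canvas by the
# bounding-box step; the MST is taken over the complete graph of keypoints
# weighted by Euclidean distance.
import numpy as np
import cv2
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def draw_skeleton(keypoints: np.ndarray, size: int = 224) -> np.ndarray:
    """Render a binarized skeleton: background [0,0,0], bones [255,255,255]."""
    dist = cdist(keypoints, keypoints)            # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist).toarray()   # MST over the complete graph
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    for i, j in zip(*np.nonzero(mst)):            # one line segment per MST edge
        p1 = tuple(int(v) for v in keypoints[i])
        p2 = tuple(int(v) for v in keypoints[j])
        cv2.line(canvas, p1, p2, (255, 255, 255), thickness=2)
    return canvas
```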
In the step B, the images processed in the step A are enhanced, and all image data are processed by horizontal flipping, rotation at different angles and scaling so as to enlarge the dataset.
In the embodiment of the invention, preferably, the lightweight network model can effectively expand the image data available for training through dataset augmentation. Horizontal flipping, rotation, scaling and similar operations generate more diverse training data, improving the robustness and generalization ability of the model. Expanding the dataset increases sample diversity, so the model adapts better to sitting scenarios with different postures and angles; it also alleviates data imbalance, improving performance on the multi-person sitting posture recognition task. Using the lightweight network model together with dataset augmentation improves the accuracy and robustness of the multi-person sitting posture recognition system and provides a more reliable and effective sitting posture monitoring and correction scheme.
As a preferred embodiment of the present invention, the LMSPNet model includes a lightweight convolution kernel feature extraction module, an attention mechanism module and a sitting posture classifier module; the input of the LMSPNet model is a PNG image; the input image undergoes feature extraction in the lightweight feature extraction module, the extracted features are sent to the attention mechanism module for edge refinement to obtain the final features, and the final features are sent to the sitting posture classifier to obtain the final predicted sitting posture category.
As a preferred embodiment of the present invention, in the lightweight convolution kernel feature extraction module, the input data are first processed by one convolution layer, which maps the channels to 64 dimensions; a PReLU activation function is then applied to enhance the expressive capacity and robustness of the model; feature dimensionality is then reduced with a max pooling operation, computing output shapes by rounding up; features are extracted with two Light Fire blocks; a max pooling operation again reduces the features to 27×27×128; two further Light Fire blocks and a max pooling operation then convert the features to a shape of 13×13×256; finally, four Light Fire blocks perform the final feature extraction, yielding a 13×13×512 feature representation;
in the attention mechanism module, a CBAM is introduced as the attention mechanism; the CBAM module comprises a CAM and a SAM, where the CAM enhances the representation capability of the features by computing the importance of each channel, and the SAM enhances it by computing the importance of each spatial position;
assuming the input CNN feature is an image of initial shape W×H×C, denoted F: channel attention weights Mc of shape 1×1×C are first obtained with the CAM; Mc is multiplied with F to obtain a feature representation F' with the same shape as F; the SAM is then applied to F' to obtain spatial attention weights Ms of shape W×H×1; finally, Ms is multiplied with F' to obtain the final feature representation F'' of shape W×H×C; the final features are input to the sitting posture classifier to perform the specific sitting posture classification task;
in the sitting posture classifier module, a Dropout operation is first applied to the final features; the number of channels is reduced to 3 with a 1×1 convolution layer; a PReLU activation function performs a nonlinear mapping; an average pooling operation then reduces the spatial dimensions of the feature map to 1×1×3; a softmax function converts the input into a probability distribution of three values, representing the probability of each sitting posture category; and finally a fully connected layer maps the global feature representation to the final predicted sitting posture category.
In the embodiment of the present invention, preferably, in the lightweight convolution kernel feature extraction module, the input data are first processed by a convolution layer with a 3×3 kernel and a stride of 2, mapping the channels to 64 dimensions. Feature dimensionality is reduced with a max pooling operation using a 3×3 pooling kernel with a stride of 2; output shapes are computed by rounding up (ceil_mode=True). Features are then extracted with two Light Fire blocks. A Light Fire block consists of a squeeze layer and an expand layer: the squeeze layer uses a 1×1 binding convolution and the expand layer uses 1×1 and 3×3 binding convolutions, the characteristic of a binding convolution being that one set of convolution kernels scans the three RGB channels simultaneously, generating three activation maps.
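A sketch of what a Light Fire block and the stated stage layout could look like in PyTorch is given below. The "binding convolution" detail is ambiguous in the translated text, so standard convolutions stand in for it, and the squeeze-layer widths (16/32/64) are illustrative assumptions; only the expand widths and the stated feature-map sizes (27×27×128, 13×13×256, 13×13×512) follow the description.

```python
# A sketch of a Light Fire block and the stated stage layout in PyTorch.
# Standard convolutions stand in for the "binding convolution"; the
# squeeze-layer widths (16/32/64) are illustrative assumptions.
import torch
import torch.nn as nn

class LightFire(nn.Module):
    def __init__(self, in_ch: int, squeeze_ch: int, out_ch: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)          # squeeze: 1x1
        self.expand1 = nn.Conv2d(squeeze_ch, out_ch // 2, kernel_size=1)    # expand: 1x1
        self.expand3 = nn.Conv2d(squeeze_ch, out_ch // 2, kernel_size=3, padding=1)  # expand: 3x3
        self.act = nn.PReLU()

    def forward(self, x):
        s = self.act(self.squeeze(x))
        # concatenate the two expand branches along the channel dimension
        return self.act(torch.cat([self.expand1(s), self.expand3(s)], dim=1))

backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2),    # 224 -> 111, channels mapped to 64
    nn.PReLU(),
    nn.MaxPool2d(3, stride=2, ceil_mode=True),    # 111 -> 55
    LightFire(64, 16, 128), LightFire(128, 16, 128),
    nn.MaxPool2d(3, stride=2, ceil_mode=True),    # 55 -> 27, i.e. 27x27x128
    LightFire(128, 32, 256), LightFire(256, 32, 256),
    nn.MaxPool2d(3, stride=2, ceil_mode=True),    # 27 -> 13, i.e. 13x13x256
    LightFire(256, 64, 512), LightFire(512, 64, 512),
    LightFire(512, 64, 512), LightFire(512, 64, 512),  # final 13x13x512 features
)
```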
In the attention mechanism module, a CBAM (Convolutional Block Attention Module) is introduced to enhance the performance of the convolutional neural network on sitting posture recognition, improving the expressiveness of the information and attending to positional information without adding parameters. Introducing the CBAM after the convolution layers improves the feature representation capability of the network and thus the recognition accuracy for different sitting postures. The CBAM module consists of two parts: a channel attention mechanism (CAM) and a spatial attention mechanism (SAM). Assume the input CNN feature is an image of initial shape W×H×C, denoted F. The CAM first yields channel attention weights Mc(F) of shape 1×1×C. Mc(F) is multiplied with F to obtain a feature representation F' with the same shape as F. The SAM is then applied to F' to obtain spatial attention weights Ms(F') of shape W×H×1. Finally, Ms(F') is multiplied with F' to obtain the final feature representation F'' of shape W×H×C. This yields an advanced feature representation that can be input to the sitting posture classifier for the specific sitting posture classification task.
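For concreteness, a compact CBAM sketch following the Mc/Ms description above is shown below. In PyTorch's NCHW layout, Mc has shape (N, C, 1, 1) and Ms has shape (N, 1, H, W); the reduction ratio and the 7×7 spatial kernel are conventional CBAM defaults, not values given in the patent.

```python
# A compact CBAM sketch matching the Mc/Ms description above. The reduction
# ratio and 7x7 spatial kernel are conventional CBAM defaults (assumptions).
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared MLP of the channel attention
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, kernel_size=1))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, f):
        # channel attention: Mc = sigmoid(MLP(avg-pool) + MLP(max-pool))
        mc = torch.sigmoid(self.mlp(f.mean(dim=(2, 3), keepdim=True))
                           + self.mlp(f.amax(dim=(2, 3), keepdim=True)))
        f1 = mc * f                               # F' = Mc * F
        # spatial attention: Ms = sigmoid(conv([avg; max] over channels))
        ms = torch.sigmoid(self.spatial(torch.cat(
            [f1.mean(dim=1, keepdim=True), f1.amax(dim=1, keepdim=True)], dim=1)))
        return ms * f1                            # F'' = Ms * F'
```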
In the sitting posture classifier module, a Dropout operation is first applied to the final features to prevent overfitting.
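Assembled as a module, the classifier head described above might look like the following sketch. The Dropout rate is not specified and is an assumption; the description lists softmax before the fully connected layer, while the sketch uses the conventional order (linear layer, then softmax), which is an interpretation.

```python
# A sketch of the classifier head as described; the Dropout rate is an
# assumption, and placing Softmax after the final fully connected layer is
# an interpretation of the stated order.
import torch.nn as nn

classifier = nn.Sequential(
    nn.Dropout(p=0.5),                  # prevent overfitting; rate assumed
    nn.Conv2d(512, 3, kernel_size=1),   # 1x1 conv: 512 channels -> 3
    nn.PReLU(),                         # nonlinear mapping
    nn.AdaptiveAvgPool2d(1),            # 13x13x3 -> 1x1x3
    nn.Flatten(),
    nn.Linear(3, 3),                    # final fully connected layer
    nn.Softmax(dim=1),                  # probabilities of the three postures
)
```

Note that when training with PyTorch's CrossEntropyLoss, the Softmax layer would normally be omitted from the module, since that loss expects raw logits.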
As a preferred embodiment of the present invention, the step D specifically includes:
d1, training the LMSPNet by using the training set obtained in the step B, and setting the initial learning rate to 0.0001 and the batch_size to 16; the loss function is a cross entropy loss function and the optimizer is an SGD optimizer.
As a preferred embodiment of the present invention, the step E specifically includes:
and E1, testing the trained model with the test set obtained in the step B, predicting the three sitting postures, and judging whether the user is in a standard sitting posture.
The invention provides a multi-person sitting posture identification method based on a lightweight network model, which comprises the following steps:
A. Image preprocessing
In the invention, 20 samples were collected, each sitting posture category of each sample containing about 30 s of video of varying length. The samples were first batch-converted from MOV format to PNG images with the OpenCV library in Python; because consecutive frames in the video stream sequence have high correlation similarity, one image was saved every set number of frames. Key points were extracted with OpenPose, the obtained human key points were grouped by person and connected, bounding-box normalization preprocessing was applied, a human skeleton diagram was constructed with a minimum spanning tree algorithm, and pixel values were binarized to [0, 0, 0] (background) and [255, 255, 255] (human skeleton). The images were cropped to 224×224 pixels, poorly imaged samples were removed, and the samples were divided into training, validation and test sets at a ratio of 16:4:5, finally yielding 4429 training images, 1107 validation images and 1386 test images.
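The MOV-to-PNG conversion and frame sampling could be done as in the following OpenCV sketch; the output naming scheme and the sampling interval `step` are illustrative assumptions.

```python
# A sketch of the MOV-to-PNG conversion and frame sampling with OpenCV;
# the output naming and the sampling interval `step` are assumptions.
import cv2

def extract_frames(video_path: str, out_dir: str, step: int = 10) -> int:
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                       # end of the video stream
            break
        if idx % step == 0:              # consecutive frames are highly similar
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.png", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```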
Enhancement of the original data: (a) rotation by −10° to 10° with probability 0.5; (b) horizontal flipping with probability 0.5; (c) random scaling to 90% of the original length and width with probability 0.5.
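A torchvision sketch of augmentations (a)-(c), each applied independently with probability 0.5, might look as follows; using RandomAffine for the fixed 90% scaling is an interpretation of the stated operation.

```python
# A sketch of augmentations (a)-(c); RandomAffine for the 90% scaling is an
# interpretation of the stated operation.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomApply([transforms.RandomRotation(degrees=10)], p=0.5),      # (a) -10..10 deg
    transforms.RandomHorizontalFlip(p=0.5),                                      # (b)
    transforms.RandomApply([transforms.RandomAffine(degrees=0,
                                                    scale=(0.9, 0.9))], p=0.5),  # (c) 90% scale
])
```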
B. Network construction and training
A CNN-based network is constructed. The input of LMSPNet is a PNG image; the input image passes through the lightweight kernel feature extraction and attention mechanism modules to obtain the final features, which are then sent to the sitting posture classifier module to obtain the final predicted sitting posture category.
C. Training of a network
Using the training set obtained in A, LMSPNet was trained with an initial learning rate of 0.0001 and a batch size of 16. The loss function is set to the cross-entropy loss and the optimizer is SGD. The training epoch count is set to 40 with a dynamic learning rate: the learning rate decreases stepwise to [0.0001, 0.00001, 0.000001] as training proceeds, with corresponding epochs [20, 30, 35]. Validation is performed once per training round, and training stops when the validation performance converges.
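The stated training configuration could be set up as in the sketch below; `model` and `train_loader` are assumed to exist, and the schedule is read as MultiStepLR drops at epochs 20 and 30 (1e-4 → 1e-5 → 1e-6); how epoch 35 figures in the schedule is not fully specified in the text.

```python
# A sketch of the stated training configuration; `model` and `train_loader`
# are assumed to exist. The LR schedule interpretation (drops at epochs 20
# and 30) is an assumption, since the role of epoch 35 is not fully stated.
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30], gamma=0.1)

for epoch in range(40):                      # training epoch count set to 40
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
    # validate once per epoch here and stop early once validation converges
```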
D. The test data verifies the trained model and determines the test effect
The sitting posture categories of the test set are predicted. The model and weights saved in the training stage are loaded, the test data are input into the trained model, and the test results, including the probability of each sitting posture category, are obtained; the predicted human sitting posture category is computed from the prediction probabilities given by the model.
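The test-stage prediction described here could be performed as in the following sketch; the checkpoint path and the three class names are hypothetical placeholders.

```python
# A sketch of the test-stage prediction; the checkpoint path and the class
# names are hypothetical placeholders, and `model` is the trained LMSPNet
# with a 3x224x224 tensor `image` as input.
import torch

model.load_state_dict(torch.load("lmspnet_best.pth", map_location="cpu"))
model.eval()

classes = ["standard", "hunched", "leaning"]   # hypothetical posture names
with torch.no_grad():
    probs = torch.softmax(model(image.unsqueeze(0)), dim=1)[0]
    pred = classes[int(probs.argmax())]
    print(f"predicted sitting posture: {pred}, probabilities: {probs.tolist()}")
```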
The foregoing is merely a preferred embodiment of the present invention. It should be noted that modifications and improvements can be made by those skilled in the art without departing from the spirit of the present invention, and these should also be considered as falling within the scope of the present invention, without affecting the effect of its implementation or the utility of the patent.

Claims (7)

1. The multi-person sitting posture recognition method based on the lightweight network model is characterized by comprising the following steps:
step A, converting the collected video stream data from MOV format to PNG format; because consecutive frames in the video stream sequence have high correlation similarity, saving one image every set number of frames; extracting key points with OpenPose; grouping the obtained human key points by person and connecting them; performing bounding-box normalization preprocessing; constructing a human skeleton diagram with a minimum spanning tree algorithm; and labeling the dataset after the human skeleton diagrams are processed;
step B, enhancing the images processed in the step A to expand the dataset, and dividing the expanded dataset into three parts, namely a training set, a validation set and a test set;
step C, constructing an LMSPNet model based on a lightweight convolutional neural network;
step D, training the LMSPNet model with the training set and validation set obtained in the step B to generate a trained model;
and step E, testing the trained model generated in the step D with the test set obtained in the step B.
2. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein in the step A, a human skeleton diagram is constructed with a minimum spanning tree algorithm, pixel values are binarized to [0, 0, 0] and [255, 255, 255], the images are cropped to 224×224 pixels, and the dataset is labeled.
3. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein in the step B, the images processed in the step A are enhanced, and all image data are processed by horizontal flipping, rotation at different angles and scaling so as to enlarge the dataset.
4. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein the LMSPNet model comprises a lightweight convolution kernel feature extraction module, an attention mechanism module and a sitting posture classifier module; the input of the LMSPNet model is a PNG image; the input image undergoes feature extraction in the lightweight feature extraction module, the extracted features are sent to the attention mechanism module for edge refinement to obtain the final features, and the final features are sent to the sitting posture classifier to obtain the final predicted sitting posture category.
5. The multi-person sitting posture recognition method based on a lightweight network model according to claim 4, wherein in the lightweight convolution kernel feature extraction module, the input data are first processed by one convolution layer, which maps the channels to 64 dimensions; a PReLU activation function is then applied to enhance the expressive capacity and robustness of the model; feature dimensionality is then reduced with a max pooling operation, computing output shapes by rounding up; features are extracted with two Light Fire blocks; a max pooling operation again reduces the features to 27×27×128; two further Light Fire blocks and a max pooling operation then convert the features to a shape of 13×13×256; finally, four Light Fire blocks perform the final feature extraction, yielding a 13×13×512 feature representation;
in the attention mechanism module, a CBAM is introduced as the attention mechanism; the CBAM module comprises a CAM and a SAM, where the CAM enhances the representation capability of the features by computing the importance of each channel, and the SAM enhances it by computing the importance of each spatial position;
assuming the input CNN feature is an image of initial shape W×H×C, denoted F: channel attention weights Mc of shape 1×1×C are first obtained with the CAM; Mc is multiplied with F to obtain a feature representation F' with the same shape as F; the SAM is then applied to F' to obtain spatial attention weights Ms of shape W×H×1; finally, Ms is multiplied with F' to obtain the final feature representation F'' of shape W×H×C; the final features are input to the sitting posture classifier to perform the specific sitting posture classification task;
in the sitting posture classifier module, a Dropout operation is first applied to the final features; the number of channels is reduced to 3 with a 1×1 convolution layer; a PReLU activation function performs a nonlinear mapping; an average pooling operation then reduces the spatial dimensions of the feature map to 1×1×3; a softmax function converts the input into a probability distribution of three values, representing the probability of each sitting posture category; and finally a fully connected layer maps the global feature representation to the final predicted sitting posture category.
6. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein the step D specifically comprises:
D1, training LMSPNet with the training set obtained in the step B, setting the initial learning rate to 0.0001 and the batch_size to 16; the loss function is a cross-entropy loss function and the optimizer is an SGD optimizer.
7. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein the step E specifically comprises:
and E1, testing the trained model with the test set obtained in the step B, predicting the three sitting postures, and judging whether the user is in a standard sitting posture.
CN202311187857.4A 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model Active CN116935494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311187857.4A CN116935494B (en) 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311187857.4A CN116935494B (en) 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model

Publications (2)

Publication Number Publication Date
CN116935494A (en) 2023-10-24
CN116935494B CN116935494B (en) 2023-12-12

Family

ID=88386396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311187857.4A Active CN116935494B (en) 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model

Country Status (1)

Country Link
CN (1) CN116935494B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532912A (en) * 2019-08-19 2019-12-03 合肥学院 A kind of sign language interpreter implementation method and device
CN112364712A (en) * 2020-10-21 2021-02-12 厦门大学 Human posture-based sitting posture identification method and system and computer-readable storage medium
CN113111760A (en) * 2021-04-07 2021-07-13 同济大学 Lightweight graph convolution human skeleton action identification method based on channel attention
CN115661943A (en) * 2022-12-22 2023-01-31 电子科技大学 Fall detection method based on lightweight attitude assessment network
CN116299427A (en) * 2022-11-30 2023-06-23 中国人民解放军空军工程大学 Attention mechanism-based lightweight ultra-wideband radar gesture recognition method


Also Published As

Publication number Publication date
CN116935494B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN111126258B (en) Image recognition method and related device
CN109697434B (en) Behavior recognition method and device and storage medium
CN110728209B (en) Gesture recognition method and device, electronic equipment and storage medium
CN110929569B (en) Face recognition method, device, equipment and storage medium
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
CN113297956B (en) Gesture recognition method and system based on vision
CN104951793A (en) STDF (standard test data format) feature based human behavior recognition algorithm
CN112287802A (en) Face image detection method, system, storage medium and equipment
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
Barodi et al. An enhanced artificial intelligence-based approach applied to vehicular traffic signs detection and road safety enhancement
CN114724258A (en) Living body detection method, living body detection device, storage medium and computer equipment
CN111582057B (en) Face verification method based on local receptive field
CN111666813B (en) Subcutaneous sweat gland extraction method of three-dimensional convolutional neural network based on non-local information
CN116935494B (en) Multi-person sitting posture identification method based on lightweight network model
CN109359543B (en) Portrait retrieval method and device based on skeletonization
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN113221824B (en) Human body posture recognition method based on individual model generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant