CN116935494A - Multi-person sitting posture identification method based on lightweight network model - Google Patents

Multi-person sitting posture identification method based on lightweight network model

Info

Publication number
CN116935494A
CN116935494A
Authority
CN
China
Prior art keywords
sitting posture
model
lightweight
training
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311187857.4A
Other languages
Chinese (zh)
Other versions
CN116935494B (en)
Inventor
周柚
焦树扬
于昆平
曹政
吴翾
肖钰彬
王镠璞
杜伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202311187857.4A
Publication of CN116935494A
Application granted
Publication of CN116935494B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention is applicable to the technical field of image processing and provides a multi-person sitting posture identification method based on a lightweight network model, comprising the following steps: the acquired video stream data are processed; key points are extracted with OpenPose; the obtained human key points are grouped by person and connected; bounding-box normalization preprocessing is applied; a human skeleton diagram is constructed with a minimum spanning tree algorithm; and the dataset is labeled after the skeleton diagrams are processed. The processed images are then augmented to expand the dataset; an LMSPNet model is constructed on the basis of a lightweight convolutional neural network; the LMSPNet model is trained to generate a trained model; and the trained model is tested. The invention achieves automatic prediction and evaluation of the sitting postures of multiple persons: by automatically predicting human key points and generating human skeleton diagrams, it can classify and judge the users' sitting postures, helping users discover and correct wrong sitting postures in time and preventing potential injuries.

Description

Multi-person sitting posture identification method based on lightweight network model
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a multi-person sitting posture identification method based on a lightweight network model.
Background
Human body sitting posture recognition mainly studies and describes human posture and predicts human behaviour. Human posture recognition algorithms fall mainly into depth-map-based algorithms and RGB-image-based algorithms. Depth-map algorithms are easily limited in application by their acquisition-equipment requirements, whereas RGB-based algorithms operate directly on the colours formed by varying and superposing the three red, green and blue colour channels and are far less constrained by such interference, so they have better development prospects. Even in complex scenes, RGB-image-based human posture estimation algorithms can achieve a good recognition effect.
Identifying and analysing sitting posture is critical to providing accurate health guidance and learning assistance. At present there are two main approaches to sitting posture recognition. Contact methods use sensor-based technology to collect human sitting posture data and identify postures by analysing characteristic attributes and sensor parameters, but they are limited by poor portability and high cost. Non-contact methods mainly analyse, annotate and classify human sitting postures through computer-vision image processing; however, they rely on large amounts of annotated training data to accurately recognize and classify different postures. Computer-vision-based sitting posture recognition extracts human key points from camera images, constructs a body-skeleton feature map, and then uses image classification to recognize the sitting posture. Most sitting posture recognition methods use complex neural network models, yet deploying deep learning models on mobile devices or embedded systems remains a challenge because convolution operations require large amounts of computing resources and storage space. We therefore propose a multi-person sitting posture recognition method based on a lightweight network model.
Disclosure of Invention
The invention aims to provide a multi-person sitting posture recognition method based on a lightweight network model, so as to solve the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a multi-person sitting posture recognition method based on a lightweight network model comprises the following steps:
step A, converting the collected video stream data from MOV format to PNG format; because consecutive frames in the video stream sequence have high correlation similarity, saving one image every set number of frames; extracting key points with OpenPose; grouping the obtained human key points by person and connecting them; performing bounding-box normalization preprocessing; constructing a human skeleton diagram with a minimum spanning tree algorithm; and labeling the dataset after the human skeleton diagrams are processed;
step B, enhancing the images processed in the step A to expand the dataset, and dividing the expanded dataset into three parts, namely a training set, a validation set and a test set;
step C, constructing an LMSPNet model based on a lightweight convolutional neural network;
step D, training the LMSPNet model with the training set and validation set obtained in the step B to generate a trained model;
and step E, testing the trained model generated in the step D with the test set obtained in the step B.
Further, in the step A, a human skeleton diagram is constructed with a minimum spanning tree algorithm, pixel values are binarized to [0, 0, 0] and [255, 255, 255], the images are cropped to 224×224 pixels, and the dataset is labeled.
In the step B, the images processed in the step A are enhanced: all image data are processed by horizontal flipping, rotation at different angles and scaling so as to enlarge the dataset.
Further, the LMSPNet model comprises a lightweight convolution kernel feature extraction module, an attention mechanism module and a sitting posture classifier module; the input of the LMSPNet model is a PNG image; the input image undergoes feature extraction in the lightweight feature extraction module, the extracted features are sent to the attention mechanism module for edge refinement to obtain the final features, and the final features are sent to the sitting posture classifier to obtain the final predicted sitting posture category.
Further, in the lightweight convolution kernel feature extraction module, the input data are first processed by one convolution layer, which maps the channels to 64 dimensions; a PReLU activation function is then applied to enhance the expressive capacity and robustness of the model; feature dimensionality is then reduced with a max pooling operation, computing output shapes by rounding up; features are extracted with two Light Fire blocks; a max pooling operation again reduces the features to 27×27×128; two further Light Fire blocks and a max pooling operation then convert the features to a shape of 13×13×256; finally, four Light Fire blocks perform the final feature extraction, yielding a 13×13×512 feature representation;
in the attention mechanism module, a CBAM is introduced as the attention mechanism; the CBAM module comprises a CAM and a SAM, where the CAM enhances the representation capability of the features by computing the importance of each channel, and the SAM enhances it by computing the importance of each spatial position;
assuming the input CNN feature is an image of initial shape W×H×C, denoted F: channel attention weights Mc of shape 1×1×C are first obtained with the CAM; Mc is multiplied with F to obtain a feature representation F' with the same shape as F; the SAM is then applied to F' to obtain spatial attention weights Ms of shape W×H×1; finally, Ms is multiplied with F' to obtain the final feature representation F'' of shape W×H×C; the final features are input to the sitting posture classifier to perform the specific sitting posture classification task;
in the sitting posture classifier module, a Dropout operation is first applied to the final features; the number of channels is reduced to 3 with a 1×1 convolution layer; a PReLU activation function performs a nonlinear mapping; an average pooling operation then reduces the spatial dimensions of the feature map to 1×1×3; a softmax function converts the input into a probability distribution of three values, representing the probability of each sitting posture category; and finally a fully connected layer maps the global feature representation to the final predicted sitting posture category.
Further, the step D specifically includes:
d1, training the LMSPNet by using the training set obtained in the step B, and setting the initial learning rate to 0.0001 and the batch_size to 16; the loss function is a cross entropy loss function and the optimizer is an SGD optimizer.
Further, the step E specifically includes:
and E1, testing the trained model with the test set obtained in the step B, predicting the three sitting postures, and judging whether the user is in a standard sitting posture.
Compared with the prior art, the invention has the beneficial effects that:
according to the multi-person sitting posture recognition method based on the light-weight network model, an LMSPNet model comprising a light-weight convolution core feature extraction module, an attention mechanism module and a sitting posture classifier module is arranged, the features of an image are extracted through the light-weight feature extraction module, and then the extracted features are sent to the attention mechanism module for edge refinement to obtain final features; and sending the finally extracted features into a sitting posture classifier module to obtain the final classified sitting posture category, and finally recording the current sitting posture category of the user. The LMSPNet fuses the local features and the global features with each other with lower parameter quantity and faster reasoning speed to obtain more scale and richer features, and finally accurate landmark positioning is realized in the sitting posture classifier module; the invention can accurately predict the sitting postures, reduces the cost of manual classification and provides a powerful auxiliary means for estimating the correct sitting postures.
Drawings
Fig. 1 is a flow chart of the method.
Fig. 2 is a block diagram of LMSPNet.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Specific implementations of the invention are described in detail below in connection with specific embodiments.
As shown in Fig. 1 and Fig. 2, the method for identifying sitting postures of multiple persons based on a lightweight network model according to an embodiment of the invention includes the following steps:
step A, converting the collected video stream data from MOV format to PNG format; because consecutive frames in the video stream sequence have high correlation similarity, saving one image every set number of frames; extracting key points with OpenPose; grouping the obtained human key points by person and connecting them; performing bounding-box normalization preprocessing; constructing a human skeleton diagram with a minimum spanning tree algorithm; and labeling the dataset after the human skeleton diagrams are processed;
step B, enhancing the images processed in the step A to expand the dataset, and dividing the expanded dataset into three parts, namely a training set, a validation set and a test set;
step C, constructing an LMSPNet model based on a lightweight convolutional neural network;
step D, training the LMSPNet model with the training set and validation set obtained in the step B to generate a trained model;
and step E, testing the trained model generated in the step D with the test set obtained in the step B.
In the step A, a human skeleton diagram is constructed with a minimum spanning tree algorithm, pixel values are binarized to [0, 0, 0] and [255, 255, 255], the images are cropped to 224×224 pixels, and the dataset is labeled.
In an embodiment of the present invention, it is preferable to construct the human skeleton graph with a minimum spanning tree algorithm and to binarize pixel values to [0, 0, 0] (background) and [255, 255, 255] (human skeleton).
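To make the skeleton-construction step concrete, the following is a minimal sketch of how a binarized skeleton image could be drawn from one person's keypoints with SciPy and OpenCV. The patent gives no code, so the keypoint format (an (N, 2) array of pixel coordinates, already normalized to the 224×224 canvas by the bounding-box step) and the use of a complete distance-weighted graph as input to the minimum spanning tree are assumptions.

```python
# A minimal sketch of the skeleton-drawing step (not from the patent itself).
# Assumptions: one person's OpenPose keypoints arrive as an (N, 2) array of
# (x, y) pixel coordinates, already normalized to the 224x224 canvas by the
# bounding-box step; the MST is taken over the complete graph of keypoints
# weighted by Euclidean distance.
import numpy as np
import cv2
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def draw_skeleton(keypoints: np.ndarray, size: int = 224) -> np.ndarray:
    """Render a binarized skeleton: background [0,0,0], bones [255,255,255]."""
    dist = cdist(keypoints, keypoints)            # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist).toarray()   # MST over the complete graph
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    for i, j in zip(*np.nonzero(mst)):            # one line segment per MST edge
        p1 = tuple(int(v) for v in keypoints[i])
        p2 = tuple(int(v) for v in keypoints[j])
        cv2.line(canvas, p1, p2, (255, 255, 255), thickness=2)
    return canvas
```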
In the step B, the images processed in the step A are enhanced, and all image data are processed by horizontal flipping, rotation at different angles and scaling so as to enlarge the dataset.
In the embodiment of the invention, preferably, the lightweight network model can effectively expand the image data available for training through dataset augmentation. Horizontal flipping, rotation, scaling and similar operations generate more diverse training data, improving the robustness and generalization ability of the model. Expanding the dataset increases sample diversity, so the model adapts better to sitting scenarios with different postures and angles; it also alleviates data imbalance, improving performance on the multi-person sitting posture recognition task. Using the lightweight network model together with dataset augmentation improves the accuracy and robustness of the multi-person sitting posture recognition system and provides a more reliable and effective sitting posture monitoring and correction scheme.
As a preferred embodiment of the present invention, the LMSPNet model includes a lightweight convolution kernel feature extraction module, an attention mechanism module and a sitting posture classifier module; the input of the LMSPNet model is a PNG image; the input image undergoes feature extraction in the lightweight feature extraction module, the extracted features are sent to the attention mechanism module for edge refinement to obtain the final features, and the final features are sent to the sitting posture classifier to obtain the final predicted sitting posture category.
As a preferred embodiment of the present invention, in the lightweight convolution kernel feature extraction module, the input data are first processed by one convolution layer, which maps the channels to 64 dimensions; a PReLU activation function is then applied to enhance the expressive capacity and robustness of the model; feature dimensionality is then reduced with a max pooling operation, computing output shapes by rounding up; features are extracted with two Light Fire blocks; a max pooling operation again reduces the features to 27×27×128; two further Light Fire blocks and a max pooling operation then convert the features to a shape of 13×13×256; finally, four Light Fire blocks perform the final feature extraction, yielding a 13×13×512 feature representation;
in the attention mechanism module, a CBAM is introduced as the attention mechanism; the CBAM module comprises a CAM and a SAM, where the CAM enhances the representation capability of the features by computing the importance of each channel, and the SAM enhances it by computing the importance of each spatial position;
assuming the input CNN feature is an image of initial shape W×H×C, denoted F: channel attention weights Mc of shape 1×1×C are first obtained with the CAM; Mc is multiplied with F to obtain a feature representation F' with the same shape as F; the SAM is then applied to F' to obtain spatial attention weights Ms of shape W×H×1; finally, Ms is multiplied with F' to obtain the final feature representation F'' of shape W×H×C; the final features are input to the sitting posture classifier to perform the specific sitting posture classification task;
in the sitting posture classifier module, a Dropout operation is first applied to the final features; the number of channels is reduced to 3 with a 1×1 convolution layer; a PReLU activation function performs a nonlinear mapping; an average pooling operation then reduces the spatial dimensions of the feature map to 1×1×3; a softmax function converts the input into a probability distribution of three values, representing the probability of each sitting posture category; and finally a fully connected layer maps the global feature representation to the final predicted sitting posture category.
In the embodiment of the present invention, preferably, in the lightweight convolution kernel feature extraction module, the input data are first processed by a convolution layer with a 3×3 kernel and a stride of 2, mapping the channels to 64 dimensions. Feature dimensionality is reduced with a max pooling operation using a 3×3 pooling kernel with a stride of 2; output shapes are computed by rounding up (ceil_mode=True). Features are then extracted with two Light Fire blocks. A Light Fire block consists of a squeeze layer and an expand layer: the squeeze layer uses a 1×1 binding convolution and the expand layer uses 1×1 and 3×3 binding convolutions, the characteristic of a binding convolution being that one set of convolution kernels scans the three RGB channels simultaneously, generating three activation maps.
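A sketch of what a Light Fire block and the stated stage layout could look like in PyTorch is given below. The "binding convolution" detail is ambiguous in the translated text, so standard convolutions stand in for it, and the squeeze-layer widths (16/32/64) are illustrative assumptions; only the expand widths and the stated feature-map sizes (27×27×128, 13×13×256, 13×13×512) follow the description.

```python
# A sketch of a Light Fire block and the stated stage layout in PyTorch.
# Standard convolutions stand in for the "binding convolution"; the
# squeeze-layer widths (16/32/64) are illustrative assumptions.
import torch
import torch.nn as nn

class LightFire(nn.Module):
    def __init__(self, in_ch: int, squeeze_ch: int, out_ch: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)          # squeeze: 1x1
        self.expand1 = nn.Conv2d(squeeze_ch, out_ch // 2, kernel_size=1)    # expand: 1x1
        self.expand3 = nn.Conv2d(squeeze_ch, out_ch // 2, kernel_size=3, padding=1)  # expand: 3x3
        self.act = nn.PReLU()

    def forward(self, x):
        s = self.act(self.squeeze(x))
        # concatenate the two expand branches along the channel dimension
        return self.act(torch.cat([self.expand1(s), self.expand3(s)], dim=1))

backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2),    # 224 -> 111, channels mapped to 64
    nn.PReLU(),
    nn.MaxPool2d(3, stride=2, ceil_mode=True),    # 111 -> 55
    LightFire(64, 16, 128), LightFire(128, 16, 128),
    nn.MaxPool2d(3, stride=2, ceil_mode=True),    # 55 -> 27, i.e. 27x27x128
    LightFire(128, 32, 256), LightFire(256, 32, 256),
    nn.MaxPool2d(3, stride=2, ceil_mode=True),    # 27 -> 13, i.e. 13x13x256
    LightFire(256, 64, 512), LightFire(512, 64, 512),
    LightFire(512, 64, 512), LightFire(512, 64, 512),  # final 13x13x512 features
)
```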
In the attention mechanism module, a CBAM (Convolutional Block Attention Module) is introduced to enhance the performance of the convolutional neural network on sitting posture recognition, improving the expressiveness of the information and attending to positional information without adding parameters. Introducing the CBAM after the convolution layers improves the feature representation capability of the network and thus the recognition accuracy for different sitting postures. The CBAM module consists of two parts: a channel attention mechanism (CAM) and a spatial attention mechanism (SAM). Assume the input CNN feature is an image of initial shape W×H×C, denoted F. The CAM first yields channel attention weights Mc(F) of shape 1×1×C. Mc(F) is multiplied with F to obtain a feature representation F' with the same shape as F. The SAM is then applied to F' to obtain spatial attention weights Ms(F') of shape W×H×1. Finally, Ms(F') is multiplied with F' to obtain the final feature representation F'' of shape W×H×C. This yields an advanced feature representation that can be input to the sitting posture classifier for the specific sitting posture classification task.
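For concreteness, a compact CBAM sketch following the Mc/Ms description above is shown below. In PyTorch's NCHW layout, Mc has shape (N, C, 1, 1) and Ms has shape (N, 1, H, W); the reduction ratio and the 7×7 spatial kernel are conventional CBAM defaults, not values given in the patent.

```python
# A compact CBAM sketch matching the Mc/Ms description above. The reduction
# ratio and 7x7 spatial kernel are conventional CBAM defaults (assumptions).
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared MLP of the channel attention
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, kernel_size=1))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, f):
        # channel attention: Mc = sigmoid(MLP(avg-pool) + MLP(max-pool))
        mc = torch.sigmoid(self.mlp(f.mean(dim=(2, 3), keepdim=True))
                           + self.mlp(f.amax(dim=(2, 3), keepdim=True)))
        f1 = mc * f                               # F' = Mc * F
        # spatial attention: Ms = sigmoid(conv([avg; max] over channels))
        ms = torch.sigmoid(self.spatial(torch.cat(
            [f1.mean(dim=1, keepdim=True), f1.amax(dim=1, keepdim=True)], dim=1)))
        return ms * f1                            # F'' = Ms * F'
```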
In the sitting posture classifier module, a Dropout operation is first applied to the final features to prevent overfitting.
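Assembled as a module, the classifier head described above might look like the following sketch. The Dropout rate is not specified and is an assumption; the description lists softmax before the fully connected layer, while the sketch uses the conventional order (linear layer, then softmax), which is an interpretation.

```python
# A sketch of the classifier head as described; the Dropout rate is an
# assumption, and placing Softmax after the final fully connected layer is
# an interpretation of the stated order.
import torch.nn as nn

classifier = nn.Sequential(
    nn.Dropout(p=0.5),                  # prevent overfitting; rate assumed
    nn.Conv2d(512, 3, kernel_size=1),   # 1x1 conv: 512 channels -> 3
    nn.PReLU(),                         # nonlinear mapping
    nn.AdaptiveAvgPool2d(1),            # 13x13x3 -> 1x1x3
    nn.Flatten(),
    nn.Linear(3, 3),                    # final fully connected layer
    nn.Softmax(dim=1),                  # probabilities of the three postures
)
```

Note that when training with PyTorch's CrossEntropyLoss, the Softmax layer would normally be omitted from the module, since that loss expects raw logits.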
As a preferred embodiment of the present invention, the step D specifically includes:
d1, training the LMSPNet by using the training set obtained in the step B, and setting the initial learning rate to 0.0001 and the batch_size to 16; the loss function is a cross entropy loss function and the optimizer is an SGD optimizer.
As a preferred embodiment of the present invention, the step E specifically includes:
and E1, testing the trained model with the test set obtained in the step B, predicting the three sitting postures, and judging whether the user is in a standard sitting posture.
The invention provides a multi-person sitting posture identification method based on a lightweight network model, which comprises the following steps:
A. Image preprocessing
In the invention, 20 samples were collected, each sitting posture category of each sample containing about 30 s of video of varying length. The samples were first batch-converted from MOV format to PNG images with the OpenCV library in Python; because consecutive frames in the video stream sequence have high correlation similarity, one image was saved every set number of frames. Key points were extracted with OpenPose, the obtained human key points were grouped by person and connected, bounding-box normalization preprocessing was applied, a human skeleton diagram was constructed with a minimum spanning tree algorithm, and pixel values were binarized to [0, 0, 0] (background) and [255, 255, 255] (human skeleton). The images were cropped to 224×224 pixels, poorly imaged samples were removed, and the samples were divided into training, validation and test sets at a ratio of 16:4:5, finally yielding 4429 training images, 1107 validation images and 1386 test images.
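The MOV-to-PNG conversion and frame sampling could be done as in the following OpenCV sketch; the output naming scheme and the sampling interval `step` are illustrative assumptions.

```python
# A sketch of the MOV-to-PNG conversion and frame sampling with OpenCV;
# the output naming and the sampling interval `step` are assumptions.
import cv2

def extract_frames(video_path: str, out_dir: str, step: int = 10) -> int:
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:                       # end of the video stream
            break
        if idx % step == 0:              # consecutive frames are highly similar
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.png", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```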
Enhancement of the original data: (a) rotation by −10° to 10° with probability 0.5; (b) horizontal flipping with probability 0.5; (c) random scaling to 90% of the original length and width with probability 0.5.
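A torchvision sketch of augmentations (a)-(c), each applied independently with probability 0.5, might look as follows; using RandomAffine for the fixed 90% scaling is an interpretation of the stated operation.

```python
# A sketch of augmentations (a)-(c); RandomAffine for the 90% scaling is an
# interpretation of the stated operation.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomApply([transforms.RandomRotation(degrees=10)], p=0.5),      # (a) -10..10 deg
    transforms.RandomHorizontalFlip(p=0.5),                                      # (b)
    transforms.RandomApply([transforms.RandomAffine(degrees=0,
                                                    scale=(0.9, 0.9))], p=0.5),  # (c) 90% scale
])
```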
B. Network construction and training
A CNN-based network is constructed. The input of LMSPNet is a PNG image; the input image passes through the lightweight kernel feature extraction and attention mechanism modules to obtain the final features, which are then sent to the sitting posture classifier module to obtain the final predicted sitting posture category.
C. Training of a network
Using the training set obtained in A, LMSPNet was trained with an initial learning rate of 0.0001 and a batch size of 16. The loss function is set to the cross-entropy loss and the optimizer is SGD. The training epoch count is set to 40 with a dynamic learning rate: the learning rate decreases stepwise to [0.0001, 0.00001, 0.000001] as training proceeds, with corresponding epochs [20, 30, 35]. Validation is performed once per training round, and training stops when the validation performance converges.
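The stated training configuration could be set up as in the sketch below; `model` and `train_loader` are assumed to exist, and the schedule is read as MultiStepLR drops at epochs 20 and 30 (1e-4 → 1e-5 → 1e-6); how epoch 35 figures in the schedule is not fully specified in the text.

```python
# A sketch of the stated training configuration; `model` and `train_loader`
# are assumed to exist. The LR schedule interpretation (drops at epochs 20
# and 30) is an assumption, since the role of epoch 35 is not fully stated.
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30], gamma=0.1)

for epoch in range(40):                      # training epoch count set to 40
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
    # validate once per epoch here and stop early once validation converges
```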
D. The test data verifies the trained model and determines the test effect
The sitting posture categories of the test set are predicted. The model and weights saved in the training stage are loaded, the test data are input into the trained model, and the test results, including the probability of each sitting posture category, are obtained; the predicted human sitting posture category is computed from the prediction probabilities given by the model.
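The test-stage prediction described here could be performed as in the following sketch; the checkpoint path and the three class names are hypothetical placeholders.

```python
# A sketch of the test-stage prediction; the checkpoint path and the class
# names are hypothetical placeholders, and `model` is the trained LMSPNet
# with a 3x224x224 tensor `image` as input.
import torch

model.load_state_dict(torch.load("lmspnet_best.pth", map_location="cpu"))
model.eval()

classes = ["standard", "hunched", "leaning"]   # hypothetical posture names
with torch.no_grad():
    probs = torch.softmax(model(image.unsqueeze(0)), dim=1)[0]
    pred = classes[int(probs.argmax())]
    print(f"predicted sitting posture: {pred}, probabilities: {probs.tolist()}")
```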
The foregoing is merely a preferred embodiment of the present invention. It should be noted that modifications and improvements can be made by those skilled in the art without departing from the spirit of the present invention, and these should also be considered as falling within the scope of the present invention, without affecting the effect of its implementation or the utility of the patent.

Claims (7)

1. The multi-person sitting posture recognition method based on the lightweight network model is characterized by comprising the following steps:
step A, converting the collected video stream data from MOV format to PNG format; because consecutive frames in the video stream sequence have high correlation similarity, saving one image every set number of frames; extracting key points with OpenPose; grouping the obtained human key points by person and connecting them; performing bounding-box normalization preprocessing; constructing a human skeleton diagram with a minimum spanning tree algorithm; and labeling the dataset after the human skeleton diagrams are processed;
step B, enhancing the images processed in the step A to expand the dataset, and dividing the expanded dataset into three parts, namely a training set, a validation set and a test set;
step C, constructing an LMSPNet model based on a lightweight convolutional neural network;
step D, training the LMSPNet model with the training set and validation set obtained in the step B to generate a trained model;
and step E, testing the trained model generated in the step D with the test set obtained in the step B.
2. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein in the step A, a human skeleton diagram is constructed with a minimum spanning tree algorithm, pixel values are binarized to [0, 0, 0] and [255, 255, 255], the images are cropped to 224×224 pixels, and the dataset is labeled.
3. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein in the step B, the images processed in the step A are enhanced, and all image data are processed by horizontal flipping, rotation at different angles and scaling so as to enlarge the dataset.
4. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein the LMSPNet model comprises a lightweight convolution kernel feature extraction module, an attention mechanism module and a sitting posture classifier module; the input of the LMSPNet model is a PNG image; the input image undergoes feature extraction in the lightweight feature extraction module, the extracted features are sent to the attention mechanism module for edge refinement to obtain the final features, and the final features are sent to the sitting posture classifier to obtain the final predicted sitting posture category.
5. The multi-person sitting posture recognition method based on a lightweight network model according to claim 4, wherein in the lightweight convolution kernel feature extraction module, the input data are first processed by one convolution layer, which maps the channels to 64 dimensions; a PReLU activation function is then applied to enhance the expressive capacity and robustness of the model; feature dimensionality is then reduced with a max pooling operation, computing output shapes by rounding up; features are extracted with two Light Fire blocks; a max pooling operation again reduces the features to 27×27×128; two further Light Fire blocks and a max pooling operation then convert the features to a shape of 13×13×256; finally, four Light Fire blocks perform the final feature extraction, yielding a 13×13×512 feature representation;
in the attention mechanism module, a CBAM is introduced as the attention mechanism; the CBAM module comprises a CAM and a SAM, where the CAM enhances the representation capability of the features by computing the importance of each channel, and the SAM enhances it by computing the importance of each spatial position;
assuming the input CNN feature is an image of initial shape W×H×C, denoted F: channel attention weights Mc of shape 1×1×C are first obtained with the CAM; Mc is multiplied with F to obtain a feature representation F' with the same shape as F; the SAM is then applied to F' to obtain spatial attention weights Ms of shape W×H×1; finally, Ms is multiplied with F' to obtain the final feature representation F'' of shape W×H×C; the final features are input to the sitting posture classifier to perform the specific sitting posture classification task;
in the sitting posture classifier module, a Dropout operation is first applied to the final features; the number of channels is reduced to 3 with a 1×1 convolution layer; a PReLU activation function performs a nonlinear mapping; an average pooling operation then reduces the spatial dimensions of the feature map to 1×1×3; a softmax function converts the input into a probability distribution of three values, representing the probability of each sitting posture category; and finally a fully connected layer maps the global feature representation to the final predicted sitting posture category.
6. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein the step D specifically comprises:
D1, training LMSPNet with the training set obtained in the step B, setting the initial learning rate to 0.0001 and the batch_size to 16; the loss function is a cross-entropy loss function and the optimizer is an SGD optimizer.
7. The multi-person sitting posture recognition method based on a lightweight network model according to claim 1, wherein the step E specifically comprises:
and E1, testing the trained model with the test set obtained in the step B, predicting the three sitting postures, and judging whether the user is in a standard sitting posture.
CN202311187857.4A 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model Active CN116935494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311187857.4A CN116935494B (en) 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311187857.4A CN116935494B (en) 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model

Publications (2)

Publication Number Publication Date
CN116935494A (en) 2023-10-24
CN116935494B CN116935494B (en) 2023-12-12

Family

ID=88386396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311187857.4A Active CN116935494B (en) 2023-09-15 2023-09-15 Multi-person sitting posture identification method based on lightweight network model

Country Status (1)

Country Link
CN (1) CN116935494B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532912A (en) * 2019-08-19 2019-12-03 合肥学院 A kind of sign language interpreter implementation method and device
CN112364712A (en) * 2020-10-21 2021-02-12 厦门大学 Human posture-based sitting posture identification method and system and computer-readable storage medium
CN113111760A (en) * 2021-04-07 2021-07-13 同济大学 Lightweight graph convolution human skeleton action identification method based on channel attention
CN115661943A (en) * 2022-12-22 2023-01-31 电子科技大学 Fall detection method based on lightweight attitude assessment network
CN116299427A (en) * 2022-11-30 2023-06-23 中国人民解放军空军工程大学 Attention mechanism-based lightweight ultra-wideband radar gesture recognition method


Also Published As

Publication number Publication date
CN116935494B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN111126258B (en) Image recognition method and related device
CN109697434B (en) Behavior recognition method and device and storage medium
CN110728209B (en) Gesture recognition method and device, electronic equipment and storage medium
CN110929569B (en) Face recognition method, device, equipment and storage medium
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN110097029B (en) Identity authentication method based on high way network multi-view gait recognition
CN113297956B (en) Gesture recognition method and system based on vision
CN104951793A (en) STDF (standard test data format) feature based human behavior recognition algorithm
CN112287802A (en) Face image detection method, system, storage medium and equipment
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
Barodi et al. An enhanced artificial intelligence-based approach applied to vehicular traffic signs detection and road safety enhancement
CN114724258A (en) Living body detection method, living body detection device, storage medium and computer equipment
CN111582057B (en) Face verification method based on local receptive field
CN111666813B (en) Subcutaneous sweat gland extraction method of three-dimensional convolutional neural network based on non-local information
CN116935494B (en) Multi-person sitting posture identification method based on lightweight network model
CN109359543B (en) Portrait retrieval method and device based on skeletonization
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN113221824B (en) Human body posture recognition method based on individual model generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant