CN114332951A - Sitting posture detection method and system and electronic equipment - Google Patents


Info

Publication number
CN114332951A
CN114332951A (application CN202210009187.6A)
Authority
CN
China
Prior art keywords
sitting posture
detected
person
posture detection
layer
Prior art date
Legal status
Pending
Application number
CN202210009187.6A
Other languages
Chinese (zh)
Inventor
宣琦
虞馨杭
周洁韵
俞山青
翔云
韦永昌
Current Assignee
Hangzhou Fuyi Intelligent Technology Co ltd
Original Assignee
Hangzhou Fuyi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Fuyi Intelligent Technology Co ltd filed Critical Hangzhou Fuyi Intelligent Technology Co ltd
Priority to CN202210009187.6A
Publication of CN114332951A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a sitting posture detection method, a sitting posture detection system, and an electronic device, relating to the technical field of pattern recognition. The method comprises: inputting acquired video data containing a person to be detected into a trained sitting posture detection model; outputting key points and their connection relations for the person to be detected with the model, and determining the person's joint points from the key points and connection relations; then determining the distance and angle relations between the joint points from their coordinates, and determining the sitting posture of the person to be detected from those relations. The sitting posture detection model compresses a conventional model by pruning filters whose output feature maps have low rank, so the model can be accelerated while accuracy is preserved; the sitting posture is determined quickly from the joint points identified by the model, and resource consumption during detection is reduced while detection precision is maintained.

Description

Sitting posture detection method and system and electronic equipment
Technical Field
The invention relates to the technical field of pattern recognition, and in particular to a sitting posture detection method, a sitting posture detection system, and an electronic device.
Background
Sitting posture detection is a common form of human posture detection, mainly used in scenarios such as sitting posture correction and sitting posture assessment. It recognizes the posture of the head and shoulders of a seated person using sensors, image recognition, and similar technologies. When the sitting posture is assessed with devices such as ultrasonic or infrared sensors, the judgment relies mainly on the distance between the sensor and the person to be detected; this approach is fast but of low precision. The posture of the person can also be obtained directly with a recognition algorithm from image recognition technology; for example, the joint points in an image of the person can be extracted directly with an OpenPose model, from which the sitting posture is derived. However, although conventional image recognition technology is highly accurate, its calculation cost is high and it consumes substantial computing resources.
In summary, the prior art still lacks a sitting posture detection method that combines high precision with low resource consumption.
Disclosure of Invention
In view of the above, the present invention provides a sitting posture detection method, a sitting posture detection system, and an electronic device. The sitting posture detection model used in the method compresses a conventional model by pruning filters whose output feature maps have low rank, so the model can be accelerated while accuracy is preserved; the sitting posture of the person to be detected is determined quickly from the joint points identified by the model, and resource consumption during detection is reduced while detection precision is maintained.
In a first aspect, an embodiment of the present invention provides a sitting posture detecting method, including the following steps:
acquiring video data containing a person to be detected;
inputting the acquired video data into the trained sitting posture detection model; the sitting posture detection model is an OpenPose model subjected to a pruning operation; the sitting posture detection model comprises a trunk layer, an initialization layer, and a refinement layer; the trunk layer is a MobileNet network subjected to a pruning operation; branches with the same structure in the initialization layer of the sitting posture detection model share weights; the convolution kernels used in the refinement layer include a plurality of 3×3 convolution kernels;
outputting key points and connection relations of the personnel to be detected by using the sitting posture detection model, and determining joint points of the personnel to be detected according to the key points and the connection relations;
and determining the distance and angle relationship between the joint points according to the coordinates of the joint points, and determining the sitting posture of the person to be detected according to the distance and angle relationship between the joint points.
In some embodiments, the step of obtaining video data comprising a person to be detected is preceded by the method comprising:
judging whether a person exists in an acquisition area of a person to be detected;
if yes, starting the video recording equipment to acquire video data; if not, stopping the operation of the video recording equipment.
In some embodiments, determining the distance and angle relations between the joint points according to the coordinates of the joint points, and determining the sitting posture of the person to be detected according to those relations, comprises:
determining the coordinates of the joint points in a coordinate system constructed from the joint points; wherein the joint points include at least: left eye, right eye, left ear, right ear, nose, neck, left shoulder, right shoulder;
obtaining the eye line connecting the left eye and the right eye and the ear line connecting the left ear and the right ear among the joint points, and determining from the angles of the eye line and the ear line whether the sitting posture of the person to be detected is a head-tilted sitting posture;
obtaining the height difference between the nose and the neck among the key points, and determining from this height difference whether the sitting posture of the person to be detected is a head-down sitting posture;
and obtaining the shoulder line connecting the left shoulder and the right shoulder among the joint points, and determining from the angle of the shoulder line whether the sitting posture of the person to be detected is an inclined sitting posture.
In some embodiments, when the included angle between the eye line and the horizontal exceeds 15 degrees, or the included angle between the ear line and the horizontal exceeds 15 degrees, the sitting posture of the person to be detected is determined to be a head-tilted sitting posture;
when the height of the nose is lower than that of the neck, the sitting posture is determined to be a head-down sitting posture;
and when the included angle between the shoulder line and the horizontal exceeds 15 degrees, the sitting posture of the person to be detected is determined to be an inclined sitting posture.
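The three threshold rules above can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the joint names, the `classify_posture` helper, the y-up coordinate convention, and the configurable 15-degree threshold are assumptions.

```python
import math

TILT_THRESHOLD_DEG = 15.0  # threshold used in this embodiment (assumed configurable)

def line_angle_deg(p1, p2):
    """Acute angle (degrees) between the line p1->p2 and the horizontal."""
    angle = abs(math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0])))
    return 180.0 - angle if angle > 90.0 else angle

def classify_posture(joints):
    """Apply the three threshold rules; coordinates use a y-up axis.

    `joints` maps joint names to (x, y) coordinates; the names are
    illustrative, not taken from the patent.
    """
    labels = []
    if (line_angle_deg(joints["left_eye"], joints["right_eye"]) > TILT_THRESHOLD_DEG
            or line_angle_deg(joints["left_ear"], joints["right_ear"]) > TILT_THRESHOLD_DEG):
        labels.append("head-tilted")
    if joints["nose"][1] < joints["neck"][1]:  # nose lower than neck
        labels.append("head-down")
    if line_angle_deg(joints["left_shoulder"], joints["right_shoulder"]) > TILT_THRESHOLD_DEG:
        labels.append("inclined")
    return labels or ["normal"]
```

Note that the rules are independent, so a single frame can yield more than one label (e.g. both head-tilted and inclined).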
In some embodiments, the sitting posture detection model acquisition process comprises:
acquiring an initialized OpenPose model; wherein the OpenPose model comprises a trunk layer, an initialization layer, and a refinement layer;
replacing the trunk layer with a MobileNet-V1 model structure subjected to a pruning operation; in the pruning operation, pruning is carried out according to the ranking of the average ranks of the output feature maps corresponding to the convolution kernels contained in the MobileNet-V1 model; wherein the pruning rate is 0.3 and the number of channels after pruning is 358;
and replacing the 7 × 7 convolution kernel in the refinement layer with a (1 × 1, 3 × 3, 3 × 3) convolution cascade to obtain the sitting posture detection model.
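As a rough sanity check on why the replacement in the last step saves computation, the per-channel weight count of a single 7 × 7 kernel can be compared with the (1 × 1, 3 × 3, 3 × 3) cascade. This is a back-of-the-envelope sketch that ignores channel counts and biases; the helper name is illustrative.

```python
def kernel_weights(sizes):
    """Sum of k*k weight counts for a cascade of square kernels (one channel)."""
    return sum(k * k for k in sizes)

single_7x7 = kernel_weights([7])     # 49 weights per channel pair
cascade = kernel_weights([1, 3, 3])  # 1 + 9 + 9 = 19 weights per channel pair
print(single_7x7, cascade)           # the cascade uses fewer than half the weights
```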
In some embodiments, outputting the key points and the connection relations of the person to be detected by using the sitting posture detection model comprises:
acquiring an image frame of the video data containing the person to be detected, and taking the image frame, compressed to 416 × 416 × 3, as the input image;
sequentially carrying out depthwise separable convolution operations on the input image with the plurality of convolution kernels contained in the trunk layer to obtain a feature mapping result; the convolution kernels are all of size 3 × 3, and their channel numbers include at least 32, 44, 89, 179, 358, and 512; there is at least 1 convolution kernel each with 32, 44, and 512 channels, at least 2 each with 89 and 179 channels, and at least 5 with 358 channels;
inputting the feature mapping result into an initialization layer, and correspondingly outputting a key point heat map and a connection relation heat map through two branches contained in the initialization layer;
sequentially inputting the key point heat map and the connection relation heat map into a plurality of refinement layers; and respectively obtaining key points and connection relations of the personnel to be detected.
In some embodiments, after the step of obtaining video data comprising a person to be detected, the method further comprises:
extracting video clips of the ROI (region of interest) containing the person to be detected from the video data; wherein the ROI includes at least: a head region, a torso region, a limb region, and a palm region;
the step of inputting the acquired video data into the trained sitting posture detection model comprises the following steps:
and respectively inputting the video clips containing the ROI of the person to be detected into the sitting posture detection model.
In a second aspect, an embodiment of the present invention provides a sitting posture detecting system, including the following modules:
the data acquisition module is used for acquiring video data containing a person to be detected;
the data input module is used for inputting the acquired video data into the trained sitting posture detection model; the sitting posture detection model is an OpenPose model subjected to a pruning operation; the sitting posture detection model comprises a trunk layer, an initialization layer, and a refinement layer; the trunk layer is a MobileNet network subjected to a pruning operation; branches with the same structure in the initialization layer of the sitting posture detection model share weights; the convolution kernels used in the refinement layer include a plurality of 3×3 convolution kernels;
the joint point detection module is used for outputting key points and connection relations of the person to be detected by using the sitting posture detection model and determining joint points of the person to be detected according to the key points and the connection relations;
and the sitting posture determining module is used for determining the distance and the angle relation between the joint points according to the coordinates of the joint points and determining the sitting posture of the person to be detected according to the distance and the angle relation between the joint points.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the steps of the sitting posture detection method as provided by the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the sitting posture detecting method provided in the first aspect.
The embodiments of the invention have the following beneficial effects. The embodiments provide a sitting posture detection method, a sitting posture detection system, and an electronic device. The method inputs acquired video data containing a person to be detected into a trained sitting posture detection model, outputs key points and their connection relations with the model, and determines the person's joint points from them; it then determines the distance and angle relations between the joint points from their coordinates and determines the sitting posture of the person to be detected accordingly. The sitting posture detection model is an OpenPose model subjected to a pruning operation; it comprises a trunk layer, an initialization layer, and a refinement layer; the trunk layer is a MobileNet network subjected to a pruning operation; branches with the same structure in the initialization layer share weights; and the convolution kernels used in the refinement layer include a plurality of 3×3 convolution kernels. By pruning filters whose output feature maps have low rank, the sitting posture detection model compresses a conventional model and can be accelerated while accuracy is preserved; the sitting posture is determined quickly from the joint points identified by the model, and resource consumption during detection is reduced while detection precision is maintained.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a sitting posture detecting method according to an embodiment of the present invention;
fig. 2 is a flowchart of step S101 in a sitting posture detecting method according to an embodiment of the present invention;
fig. 3 is a flowchart of step S102 in a sitting posture detecting method according to an embodiment of the present invention;
fig. 4 is a flowchart of step S103 in a sitting posture detecting method according to an embodiment of the present invention;
fig. 5 is a structural diagram of a sitting posture detecting model used in the sitting posture detecting method according to the embodiment of the present invention;
fig. 6 is a flowchart of step S104 in a sitting posture detecting method according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a sitting posture detecting system according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Icon:
710-a data acquisition module; 720-data input module; 730-joint detection module; 740-a sitting posture determination module;
101-a processor; 102-a memory; 103-a bus; 104-communication interface.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Sitting posture detection is a common form of human posture detection, mainly used in scenarios such as sitting posture correction and sitting posture assessment. It recognizes the posture of the head and shoulders of a seated person using sensors, image recognition, and similar technologies. When the sitting posture is assessed with devices such as ultrasonic or infrared sensors, the judgment relies mainly on the distance between the sensor and the person to be detected; this approach is fast but of low precision. The posture of the person can also be obtained directly with a recognition algorithm from image recognition technology; for example, the joint points in an image of the person can be extracted directly with an OpenPose model, from which the sitting posture is derived. However, although conventional image recognition technology is highly accurate, its calculation cost is high and it consumes substantial computing resources. Therefore, the prior art still lacks a sitting posture detection method that combines high precision with low resource consumption.
To solve the above problems, embodiments of the present invention provide a sitting posture detection method, a sitting posture detection system, and an electronic device. The sitting posture detection model in the method is an OpenPose model subjected to a pruning operation, which compresses the conventional OpenPose model; the model can thus be accelerated while accuracy is preserved, the sitting posture of the person to be detected is determined quickly from the joint points identified by the model, and resource consumption during detection is reduced while detection precision is maintained.
To facilitate understanding of the present embodiment, a sitting posture detecting method disclosed in the present embodiment will be described in detail first. Specifically, the flowchart of the method is shown in fig. 1, and includes the following steps:
and S101, acquiring video data containing a person to be detected.
The video data can be acquired with an electronic device that has a camera, or with a stand-alone camera. Specifically, the video data can be acquired directly with a smartphone, a tablet computer, a camera with a USB (Universal Serial Bus) interface, or a camera with a LAN (Local Area Network) interface, and it can also be acquired with a suitable industrial camera. Video data is a category in the field of digital images and mainly comprises digital image data such as video streams and video frames.
It should be noted that the video data in this step includes the person to be detected, and the video data can be obtained by determining whether the person to be detected exists in the actual scene. Specifically, in some embodiments, before the step of acquiring the video data including the person to be detected, as shown in fig. 2, the method further includes:
step S21, it is determined whether or not a person is present in the acquisition area of the person to be detected.
The acquisition area of the person to be detected can be understood as the area covered by the field of view of the camera acquiring the video data. Whether a person is present in the area can be determined with a motion detection or person detection algorithm, or with the ultrasonic or infrared sensors of the prior art. For example, when the camera is placed on a desk to detect the sitting posture of a person seated at the desk, corresponding sensors can be arranged in the seating area of the desk and used to judge whether a person is present. Similarly, a detection algorithm can be run directly on the camera feed to determine whether a person is in the area.
Step S22, if yes, starting a video recording device to obtain video data; if not, stopping the operation of the video recording equipment.
When a person is present, the video recording device is started so that it acquires video data; when no person is present, the video recording device stops working and can be turned off to save energy.
With the above steps, the video data is guaranteed to contain the person to be detected, so sitting posture detection is performed on that person in the video data.
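The presence check in steps S21 and S22 could, for instance, be sketched as simple frame differencing. The function name, the frame representation as nested lists of grayscale values, and both thresholds are illustrative assumptions, since the patent leaves the concrete detection algorithm open.

```python
def person_present(prev_frame, curr_frame, pixel_thresh=25, ratio_thresh=0.02):
    """Crude motion check: flag presence when enough pixels changed.

    Frames are equally sized 2D lists of grayscale values (0-255).
    """
    changed = total = 0
    for row_prev, row_curr in zip(prev_frame, curr_frame):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_thresh:
                changed += 1
    return total > 0 and changed / total > ratio_thresh
```

A real deployment would more likely use background subtraction or a lightweight person detector, but the control flow of step S22 stays the same: start recording when the check returns True, stop when it returns False.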
Step S102, inputting the acquired video data into the trained sitting posture detection model; the sitting posture detection model is an OpenPose model subjected to a pruning operation; the sitting posture detection model comprises a trunk layer, an initialization layer, and a refinement layer; the trunk layer is a MobileNet network subjected to a pruning operation; branches with the same structure in the initialization layer of the sitting posture detection model share weights; the convolution kernels used in the refinement layer include a plurality of 3×3 convolution kernels.
Unlike the conventional OpenPose model, the sitting posture detection model has undergone a corresponding pruning operation. Specifically, in some embodiments, the process of obtaining the sitting posture detection model, as shown in fig. 3, includes:
step S31, acquiring the OpenPose model with initialization completed; the openpos model includes a trunk layer, an initialization layer, and a refinement layer.
Specifically, the trunk layer is a Backbone, and the trunk layer includes a plurality of convolution kernels for performing feature extraction on an input video stream or an image frame. Connecting an initialization layer initial stage behind the main layer, wherein the initialization layer comprises two branches which respectively generate a key point heat map and a connection relation heat map correspondingly; connected behind the initialization layer is a refinement layer, refinishment stage, whose role is to further generate more accurate heatmaps.
Step S32, replacing the trunk layer with a MobileNet-V1 model structure subjected to a pruning operation; in the pruning operation, pruning is carried out according to the ranking of the average ranks of the output feature maps corresponding to the convolution kernels contained in the MobileNet-V1 model; wherein the pruning rate is 0.3 and the number of channels after pruning is 358.
The pruned MobileNet-V1 serves as the new trunk layer (backbone); specifically, the pruning operation is performed on the filters. The purpose of filter pruning is to remove unimportant convolution kernels, and a kernel's importance is related to the rank of its output feature map. Specifically, it is known from singular value decomposition that the larger the rank of a feature map, the more information it contains; conversely, the smaller the rank, the less important the feature map, and the less important the filter that produced it.
During filter pruning, the average rank of the output feature map of each convolution kernel is calculated and the kernels are sorted by this rank. The number of kernels to prune is then determined from the sorting result, the lowest-ranked convolution kernels are pruned, the trunk layer is fine-tuned with the remaining filters as initial parameters, and finally a group of feature maps is generated as the input of the initialization layer.
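The rank-based selection described above can be sketched as follows. The `matrix_rank` routine and the `select_filters_to_keep` helper are illustrative assumptions; a practical implementation would average the rank over a batch of input images rather than a single one.

```python
def matrix_rank(mat, eps=1e-9):
    """Rank of a small dense matrix via Gaussian elimination."""
    m = [row[:] for row in mat]
    rows, cols = len(m), len(m[0])
    rank = 0
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if abs(m[r][col]) > eps), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rows):
            if r != rank and abs(m[r][col]) > eps:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def select_filters_to_keep(feature_maps, prune_rate=0.3):
    """Keep the (1 - prune_rate) fraction of filters with highest-rank outputs.

    `feature_maps[i]` is the 2D output map of filter i; returns kept indices.
    """
    ranks = [matrix_rank(fm) for fm in feature_maps]
    keep = int(len(feature_maps) * (1.0 - prune_rate))
    order = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
    return sorted(order[:keep])
```

With 512 filters and a pruning rate of 0.3, this keeps int(512 × 0.7) = 358 channels, which matches the channel count stated in step S32.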
And step S33, replacing the 7 × 7 convolution kernel in the refinement layer with the convolution cascade of (1 × 1,3 × 3,3 × 3) to obtain the sitting posture detection model.
The structure of the sitting posture detection model is shown in fig. 5. Specifically, the model prunes the original OpenPose network model and keeps only the initialization layer and the first refinement layer, so as to compress the model as much as possible. It is worth noting that the relevant weights in the initialization layer are shared, which reduces the amount of computation; finally, two branches of two convolutional layers are separated to predict, respectively, the key point heat maps and the corresponding connection relation heat maps.
And S103, outputting key points and connection relations of the person to be detected by using the sitting posture detection model, and determining the joint points of the person to be detected according to the key points and the connection relations.
This step is described below using the above model structure in conjunction with a specific example, as shown in fig. 4, and includes:
in step S41, an image frame containing video data of a person to be detected is acquired, and the image frame compressed to 416 × 416 × 3 is determined as an input image.
The required input image size of the main layer of the model is 416 × 416 × 3, and if the acquired image frame size does not meet the size, the image frame is subjected to size compression to meet the format requirement.
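A minimal way to meet that size requirement is a nearest-neighbour resize; the sketch below on nested lists is illustrative (a real pipeline would use a library resize, typically with interpolation and possibly letterboxing to preserve aspect ratio).

```python
def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbour resize of an H x W (x C) frame stored as nested lists."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]
```

For the model input one would call this with out_h = out_w = 416 on each RGB frame.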
Step S42, sequentially performing depthwise separable convolution operations on the input image with the plurality of convolution kernels contained in the trunk layer to obtain a feature mapping result; the convolution kernels are all of size 3 × 3, and their channel numbers include at least 32, 44, 89, 179, 358, and 512; there is at least 1 convolution kernel each with 32, 44, and 512 channels, at least 2 each with 89 and 179 channels, and at least 5 with 358 channels.
As shown in fig. 5, the trunk layer of the sitting posture detection model contains 12 convolution kernels; except for the first and the last, every kernel has been pruned and its channel count reduced. Specifically, the number of channels of the second convolution kernel is reduced from 64 to 44 after pruning; that of the third and fourth convolution kernels from 128 to 89; that of the fifth and sixth from 256 to 179; and that of the seventh through eleventh from 512 to 358. The structure of the sitting posture detection model is therefore leaner than that of the existing OpenPose model, involves fewer parameters, and offers higher detection precision and speed.
Depthwise separable convolution operations are performed with these convolution kernels to finally obtain the feature mapping result; specifically, the feature mapping result is a 52 × 52 × 512 feature map.
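To illustrate why the trunk layer uses depthwise separable convolutions, the per-position multiply count of a standard 3 × 3 convolution can be compared with its depthwise-plus-pointwise factorization. The helpers below are a cost sketch with illustrative names, not part of the patent.

```python
def standard_conv_mults(k, c_in, c_out):
    """Multiplies per output position for a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_mults(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) plus 1 x 1 pointwise."""
    return k * k * c_in + c_in * c_out

std = standard_conv_mults(3, 512, 358)
sep = separable_conv_mults(3, 512, 358)
print(std, sep, round(std / sep, 1))  # prints 1649664 187904 8.8
```

At the channel counts used here, the separable form needs roughly one ninth of the multiplies, which is the main source of the trunk layer's speed-up.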
Step S43, inputting the feature mapping result into the initialization layer, and outputting the key point heat map and the connection relationship heat map respectively corresponding to the two branches included in the initialization layer.
After the feature mapping result is input to the initialization layer, it passes through one 1 × 1 convolution kernel and several 3 × 3 convolution kernels (fig. 5) and is finally split into two branches: one branch produces the key point heat map and the other the connection relation heat map. Specifically, the key point heat map is a map containing only the joint points of the person to be detected, while the connection relation refers to the body structures formed by connecting the joint points, such as the four limbs and the trunk. It should be noted that the convolution kernels in the initialization layer come before the branching, and their weights are shared, which reduces the amount of computation during operation.
Step S44, sequentially inputting the key point heat map and the connection relation heat map into a plurality of refinement layers; and respectively obtaining key points and connection relations of the personnel to be detected.
As shown in fig. 5, the key point heat map and the connection relation heat map pass sequentially through five refinement layers and then through two convolution kernels; the key points and connection relations of the person to be detected are finally obtained through corresponding padding operations and the like.
After the key points and the connection relation are determined, the joint points of the person to be detected can be obtained; in general, the joint points involved in the posture detection process at least include: left eye, right eye, left ear, right ear, nose, neck, left shoulder, right shoulder.
And step S104, determining the distance and the angle relation between the joint points according to the coordinates of the joint points, and determining the sitting posture of the person to be detected according to the distance and the angle relation between the joint points.
Specifically, the steps are as shown in fig. 6, and include:
step S61, determining the coordinates of the joint points according to the coordinate system constructed by the joint points; wherein, the joint point includes at least: left eye, right eye, left ear, right ear, nose, neck, left shoulder, right shoulder.
The coordinate system can be constructed by taking one of the joint points as the origin according to the corresponding perspective relationship, for example by building a three-dimensional coordinate frame with the nose as the origin; a fixed reference object present during video acquisition may also serve as the origin. Once the coordinate system is determined, the coordinates of the relevant joint points can be determined.
And step S62, respectively obtaining a connecting line of two eyes of the left eye and the right eye and a connecting line of two ears of the left ear and the right ear in the joint point, and determining whether the sitting posture of the person to be detected is a head-tilted sitting posture or not according to the angles of the connecting lines of the two eyes and the connecting lines of the two ears.
Specifically, the line connecting the left and right eyes and the line connecting the left and right ears indicate whether the head of the person to be detected is tilted. In a standard sitting posture both lines are roughly parallel to the horizontal plane, but in a real scene strict parallelism is not enforced; instead, an angle threshold can be set to decide whether the head is tilted. For example, when the angle between the eye line and the horizontal exceeds 15 degrees, or the angle between the ear line and the horizontal exceeds 15 degrees, the sitting posture of the person to be detected is determined to be a tilted-head sitting posture.
And step S63, acquiring the height difference between the nose and the neck in the key point, and determining whether the sitting posture of the person to be detected is a low-head sitting posture or not according to the height difference.
A head-down sitting posture is one in which the head hangs too low, usually with the nose at or below the neck; thus, when the nose is detected to be lower than the neck, the posture is determined to be head-down. A height-difference threshold can also be set in a real scene: for example, the head-down posture may be determined only when the nose is more than five centimeters below the neck.
And step S64, acquiring a two-shoulder connecting line between the left shoulder and the right shoulder in the joint point, and determining whether the sitting posture of the person to be detected is an inclined sitting posture or not according to the angle of the two-shoulder connecting line.
The line connecting the left and right shoulders indicates whether the body of the person to be detected is leaning; in a standard sitting posture this line is roughly parallel to the horizontal plane. As with the eye and ear lines, an angle threshold can be set in a real scene. For example, when the angle between the shoulder line and the horizontal exceeds 15 degrees, the sitting posture is determined to be a leaning sitting posture.
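Steps S62 to S64 amount to three geometric tests on the joint coordinates. A minimal sketch, assuming 2D joint coordinates with the y axis pointing upward (image coordinates would flip the head-down comparison) and the 15-degree threshold from the text:

```python
import math

ANGLE_THRESH_DEG = 15.0  # threshold for both the head-tilt and body-lean tests

def line_angle_deg(p1, p2):
    """Absolute angle between the segment p1-p2 and the horizontal, in degrees."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return abs(math.degrees(math.atan2(dy, dx)))

def classify_sitting_posture(joints, head_drop_thresh=0.0):
    """joints: dict of 2D coordinates; y grows upward in this sketch (an assumption)."""
    issues = []
    if (line_angle_deg(joints["left_eye"], joints["right_eye"]) > ANGLE_THRESH_DEG
            or line_angle_deg(joints["left_ear"], joints["right_ear"]) > ANGLE_THRESH_DEG):
        issues.append("tilted_head")
    if joints["nose"][1] < joints["neck"][1] - head_drop_thresh:
        issues.append("head_down")
    if line_angle_deg(joints["left_shoulder"], joints["right_shoulder"]) > ANGLE_THRESH_DEG:
        issues.append("leaning")
    return issues or ["standard"]

# Hypothetical joint coordinates: level eyes/ears, nose below the neck,
# and shoulders dropping 5 units over a span of 16 (about 17.4 degrees).
joints = {
    "left_eye": (-3, 10), "right_eye": (3, 10),
    "left_ear": (-5, 10), "right_ear": (5, 10),
    "nose": (0, 3), "neck": (0, 4),
    "left_shoulder": (-8, 4), "right_shoulder": (8, -1),
}
print(classify_sitting_posture(joints))  # ['head_down', 'leaning']
```

The same thresholds would normally be tuned per camera placement; the coordinates above are illustrative only.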
Filter pruning based on the rank of the output feature maps is performed on MobileNet-v1, which serves as the backbone of the sitting posture detection model, with per-layer pruning rates [0.0] + [0.3] * 10 + [0.0]. The purpose of filter pruning is to remove unimportant convolution kernels, where the importance of a kernel is defined as a function of the rank of its output feature map. The underlying principle: low-rank feature maps contain less information, so the pruning result is easy to reproduce, while the weights behind high-rank feature maps carry more important information, and the model performance suffers little even if part of the information is not updated.
In particular, the pruning operation can be translated into a minimization problem:

$$\min_{\delta_{ij}} \sum_{i=1}^{K} \sum_{j=1}^{n_i} \delta_{ij}\,\mathcal{L}\!\left(w_j^i\right), \qquad \text{s.t.}\ \sum_{j=1}^{n_i} \delta_{ij} = n_{i2}$$

where, assuming a trained CNN model with $K$ convolutional layers, $n_i$ denotes the number of filters in the $i$-th convolutional layer; $w_j^i$ denotes the $j$-th filter of the $i$-th convolutional layer; $\delta_{ij}$ is an indicator that is 1 when the feature map produced by $w_j^i$ has a low rank and 0 when it has a high rank; $o_j^i$ is the feature map generated by $w_j^i$; $n_{i2}$ denotes the number of low-rank filters and $n_{i1}$ the number of high-rank filters, i.e. $n_i = n_{i1} + n_{i2}$.
Here

$$\mathcal{L}\!\left(w_j^i\right) = \mathbb{E}_{I \sim P(I)}\left[\mathrm{Rank}\!\left(o_j^i(I)\right)\right]$$

where $\mathrm{Rank}(\cdot)$ is the rank of the feature map computed on the input image $I$, so that $\mathcal{L}$ measures the richness of the information carried by the feature map.
Performing singular value decomposition on a feature map of the input image $I$ shows that a larger rank corresponds to a larger amount of information, and hence a smaller rank indicates a less significant feature map. The decomposition is:

$$o_j^i = \sum_{k=1}^{r} \sigma_k u_k v_k^{\top} = \sum_{k=1}^{r'} \sigma_k u_k v_k^{\top} + \sum_{k=r'+1}^{r} \sigma_k u_k v_k^{\top}$$

where $\sigma_k$ is the $k$-th singular value (ordered from largest), $u_k$ the corresponding left singular vector, and $v_k$ the right singular vector. A feature map of rank $r$ can thus be decomposed into a lower-rank map of rank $r'$ plus a residual, with $r' < r$; a high-rank feature map therefore genuinely contains more information than a low-rank one, and the rank can be used as a reliable measure of information richness.
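The rank of a feature map and its low-rank SVD truncation can be checked numerically; the 52 × 52 map size and the rank values below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 52 x 52 feature map of known rank r = 3 as a sum of rank-1 outer products.
u = rng.standard_normal((52, 3))
v = rng.standard_normal((52, 3))
fmap = u @ v.T

print(np.linalg.matrix_rank(fmap))    # 3

# Truncating the SVD to r' = 2 keeps only the two largest singular directions.
U, s, Vt = np.linalg.svd(fmap)
approx = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
print(np.linalg.matrix_rank(approx))  # 2
```

Dropping the tail of the singular-value sum lowers the rank and discards information, which is exactly why rank serves as the importance measure here.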
A small set of input images suffices to accurately estimate the expectation of the rank of a feature map, which is defined as:

$$\mathcal{L}\!\left(w_j^i\right) = \mathbb{E}_{I \sim P(I)}\left[\mathrm{Rank}\!\left(o_j^i(I)\right)\right] \approx \sum_{t=1}^{g} \mathrm{Rank}\!\left(o_j^i(I_t)\right)$$

where $g$ denotes the number of sampled input images. Combining this with the previous equations yields:

$$\min_{\delta_{ij}} \sum_{i=1}^{K} \sum_{j=1}^{n_i} \delta_{ij}\,\mathbb{E}_{I \sim P(I)}\left[\mathrm{Rank}\!\left(o_j^i(I)\right)\right], \qquad \text{s.t.}\ \sum_{j=1}^{n_i} \delta_{ij} = n_{i2}$$
the above formula can be obtained by using the minimum average rank of the feature mapping and the number n of low rank filtersi2Pruning is performed to achieve minimization. Due to feature mappingThe average rank of the shots is independent of the batch size, and a smaller input can be used to get the average rank. Because the purpose of filter pruning is to subtract unimportant convolution kernels, the importance of the convolution kernels is defined to be related to the rank of the output characteristic diagram, so that the low-rank characteristic mapping contains less information, and the final pruning result is easy to reproduce; and the weight of the high-rank feature mapping contains more important information, and even if part of the information is not updated, the damage to the model performance is small.
In the above process, the video data containing the person to be detected is fed directly into the sitting posture detection model for recognition. In a real scene, however, not everything in the video data is relevant, and much of it is useless for sitting posture detection. The video data can therefore be cropped to regions of interest, so as to increase the proportion of the frame occupied by the person to be detected. Thus, in some embodiments, after the step of obtaining video data containing a person to be detected, the method further comprises: extracting video clips of the ROI regions containing the person to be detected from the video data; wherein the ROI regions at least comprise: a head region, a torso region, a limb region, and a palm region. On this basis, the step of inputting the acquired video data into the trained sitting posture detection model comprises: inputting the video clips containing the ROI regions of the person to be detected into the sitting posture detection model respectively.
Extracting the ROI regions from the video data reduces the interference of redundant data with the detection process and lowers the amount of computation, which helps improve detection speed.
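ROI extraction reduces to cropping each frame to a bounding box before inference. A minimal sketch, with a hypothetical head-region box:

```python
import numpy as np

def crop_roi(frame, box):
    """Crop a region of interest given as (x, y, width, height) from an H x W x 3 frame."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

frame = np.zeros((416, 416, 3), dtype=np.uint8)   # placeholder video frame
head_roi = crop_roi(frame, (150, 20, 120, 120))   # hypothetical head bounding box
print(head_roi.shape)                             # (120, 120, 3)
```

The box coordinates would in practice come from a detector or a fixed camera layout; the values above are illustrative.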
According to the sitting posture detection method in this embodiment, the traditional OpenPose model is compressed and its parameter count reduced. On one hand, the backbone is replaced with the pruned MobileNet-v1, reducing the number of channels from 512 to 358. On the other hand, the initial-stage network structure is optimized: the structurally identical parts of the two branches share weights to reduce computation, and the branches separate only at the last output convolution layer, predicting the heatmaps (key points) and PAFs (connection relations) respectively. The original 7×7 convolution block in each Refinement Stage is replaced by a (1×1, 3×3, 3×3) convolution cascade, which reduces the model weight by 54% with almost identical performance, the accuracy dropping by only 2.3% (see FIG. 1 for the channel number changes). Dilated convolutions are used to enlarge the receptive field and optimize the network. The original OpenPose code is modified according to the needs of sitting posture estimation and redundant memory allocation is removed, so that the head and shoulder postures in the sitting state can finally be recognized: head down, head tilted (left/right), standard head, body leaning (left/right), and body upright.
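The pruned channel counts quoted later in the claims (44, 89, 179, 358) are consistent with applying the 0.3 pruning rate to the standard MobileNet-v1 widths (64, 128, 256, 512) and rounding down; a check under that assumption:

```python
import math

# Per-layer pruning rates quoted in the text: first and last layers untouched.
prune_rates = [0.0] + [0.3] * 10 + [0.0]

def pruned_channels(channels, rate):
    # Keep (1 - rate) of the channels, rounding down.
    return math.floor(channels * (1.0 - rate))

# Standard MobileNet-v1 widths (an assumption) against the counts in the claims.
for c, rate in [(32, 0.0), (64, 0.3), (128, 0.3), (256, 0.3), (512, 0.3)]:
    print(c, "->", pruned_channels(c, rate))  # 32, 44, 89, 179, 358
```

Rounding down rather than to nearest is the assumption that makes 64 → 44 (not 45) and 512 → 358 (not 358.4) come out as listed.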
The sitting posture detection model used in the method compresses the traditional model through the pruning operation, which accelerates the model while preserving accuracy, and quickly determines the sitting posture of the person to be detected from the joint points recognized by the model, thereby reducing resource consumption during detection while ensuring detection accuracy.
The present embodiment further provides a sitting posture detecting system, as shown in fig. 7, the system includes the following modules:
the data acquisition module 710 is used for acquiring video data containing a person to be detected;
a data input module 720, configured to input the acquired video data into the trained sitting posture detection model; the sitting posture detection model is an OpenPose model subjected to a pruning operation; the sitting posture detection model comprises a backbone layer, an initialization layer and refinement layers; the backbone layer is a MobileNet network subjected to the pruning operation; branches with the same structure in the initialization layer of the sitting posture detection model share weights; the convolution kernels used in the refinement layers comprise a plurality of 3 × 3 convolution kernels;
the joint point detection module 730 is used for outputting key points and connection relations of the person to be detected by using the sitting posture detection model and determining joint points of the person to be detected according to the key points and the connection relations;
the sitting posture determining module 740 is configured to determine the distance and the angle relationship between the joint points according to the coordinates of the joint points, and determine the sitting posture of the person to be detected according to the distance and the angle relationship between the joint points.
The implementation principle and technical effects of the sitting posture detection system provided by this embodiment are the same as those of the sitting posture detection method embodiment above; for brevity, where this embodiment is silent, reference may be made to the corresponding content of the method embodiment.
The embodiment also provides an electronic device, a schematic structural diagram of which is shown in fig. 8, and the electronic device includes a processor 101 and a memory 102; the memory 102 is used for storing one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the sitting posture detection method.
The server shown in fig. 8 further includes a bus 103 and a communication interface 104, and the processor 101, the communication interface 104, and the memory 102 are connected through the bus 103.
The Memory 102 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Bus 103 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 8, but that does not indicate only one bus or one type of bus.
The communication interface 104 is configured to connect with at least one user terminal and other network units through a network interface, and send the packaged IPv4 message or IPv4 message to the user terminal through the network interface.
The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 101. The Processor 101 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present disclosure may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present disclosure may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 102, and the processor 101 reads the information in the memory 102 and completes the steps of the method of the foregoing embodiment in combination with the hardware thereof.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the method of the foregoing embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a division of one logic function, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be construed as falling within it. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A sitting posture detecting method, comprising the steps of:
acquiring video data containing a person to be detected;
inputting the acquired video data into a trained sitting posture detection model; wherein the sitting posture detection model is an OpenPose model subjected to a pruning operation; the sitting posture detection model comprises a backbone layer, an initialization layer and refinement layers; the backbone layer is a MobileNet network subjected to the pruning operation; branches with the same structure in the initialization layer of the sitting posture detection model share weights; the convolution kernels used in the refinement layers comprise a plurality of 3 × 3 convolution kernels;
outputting key points and connection relations of the personnel to be detected by using the sitting posture detection model, and determining joint points of the personnel to be detected according to the key points and the connection relations;
and determining the distance and angle relationship between the joint points according to the coordinates of the joint points, and determining the sitting posture of the person to be detected according to the distance and angle relationship between the joint points.
2. The sitting posture detecting method as claimed in claim 1, wherein the step of obtaining video data containing a person to be detected is preceded by the method comprising:
judging whether a person exists in an acquisition area of a person to be detected;
if yes, starting a video recording device to acquire the video data; if not, stopping the operation of the video recording equipment.
3. The sitting posture detection method as claimed in claim 1, wherein the step of determining the distance and the angular relationship between the joint points according to the coordinates of the joint points and determining the sitting posture of the person to be detected according to the distance and the angular relationship between the joint points comprises:
determining the coordinates of the joint points according to the coordinate system constructed by the joint points; wherein the articulation point comprises at least: left eye, right eye, left ear, right ear, nose, neck, left shoulder, right shoulder;
respectively acquiring two eye connecting lines of the left eye and the right eye and two ear connecting lines of the left ear and the right ear in the joint point, and determining whether the sitting posture of the person to be detected is a head-tilted sitting posture or not according to the angles of the two eye connecting lines and the two ear connecting lines;
acquiring the height difference between the nose and the neck in the key point, and determining whether the sitting posture of the person to be detected is a head-down sitting posture or not according to the height difference;
and acquiring a two-shoulder connecting line between the left shoulder and the right shoulder in the joint point, and determining whether the sitting posture of the person to be detected is an inclined sitting posture or not according to the angle of the two-shoulder connecting line.
4. The sitting posture detection method as claimed in claim 3, wherein when the included angle between the two eye connecting lines and the horizontal line is detected to exceed 15 degrees or the included angle between the two ear connecting lines and the horizontal line is detected to exceed 15 degrees, the sitting posture of the person to be detected is determined as a head-tilted sitting posture;
when the height of the nose is detected to be lower than that of the neck, determining the sitting posture of the person as a head-down sitting posture;
and when the included angle between the two shoulder connecting lines and the horizontal line is detected to exceed 15 degrees, determining the sitting posture of the person to be detected as an inclined sitting posture.
5. The sitting posture detecting method as claimed in claim 1, wherein the sitting posture detecting model obtaining process comprises:
acquiring an initialized OpenPose model; wherein the OpenPose model comprises a backbone layer, an initialization layer and refinement layers;
replacing the backbone layer with a MobileNet-V1 model structure subjected to filter pruning; wherein the filter pruning performs the pruning operation using the ranking of the average ranks of the output feature maps corresponding to the convolution kernels contained in the MobileNet-V1 model; the pruning rate is 0.3, and the number of channels after pruning is 358;
and replacing the 7 × 7 convolution kernels in the refinement layers with a (1 × 1, 3 × 3, 3 × 3) convolution cascade to obtain the sitting posture detection model.
6. The sitting posture detecting method as claimed in claim 5, wherein the pruning operation is performed by using the following equation:
$$\min_{\delta_{ij}} \sum_{i=1}^{K} \sum_{j=1}^{n_i} \delta_{ij}\,\mathcal{L}\!\left(w_j^i\right), \qquad \text{s.t.}\ \sum_{j=1}^{n_i} \delta_{ij} = n_{i2}$$

wherein $K$ is the number of convolutional layers; $n_i$ denotes the number of filters in the $i$-th convolutional layer; $w_j^i$ denotes the $j$-th filter of the $i$-th convolutional layer; $\delta_{ij}$ is an indicator that is 1 when the feature map produced by $w_j^i$ has a low rank and 0 when it has a high rank; $o_j^i$ is the feature map generated by $w_j^i$; $n_{i2}$ denotes the number of low-rank filters and $n_{i1}$ the number of high-rank filters, i.e. $n_i = n_{i1} + n_{i2}$.
7. The sitting posture detection method as claimed in claim 6, wherein outputting key points and connection relations of the person to be detected by using the sitting posture detection model comprises:
acquiring image frames of the video data containing the person to be detected, and taking an image frame compressed to 416 × 416 × 3 as the input image;
sequentially performing depthwise separable convolution operations on the input image using a plurality of convolution kernels contained in the backbone layer to obtain a feature mapping result; wherein the convolution kernels are all of size 3 × 3, and their channel numbers at least include: 32, 44, 89, 179, 358 and 512; there is at least 1 convolution kernel each with 32, 44 and 512 channels, at least 2 each with 89 and 179 channels, and at least 5 with 358 channels;
inputting the feature mapping result into an initialization layer, and correspondingly outputting a key point heat map and a connection relation heat map through two branches contained in the initialization layer;
sequentially inputting the keypoint heat map and the connection relationship heat map into a plurality of refinement layers; and respectively obtaining the key points and the connection relation of the personnel to be detected.
8. The sitting posture detecting method as claimed in claim 1, wherein after the step of obtaining the video data containing the person to be detected, the method further comprises:
extracting a video clip containing the ROI of the person to be detected in the video data; wherein the ROI area comprises at least: a head region, a torso region, a limb region, a palm region;
the step of inputting the acquired video data into the trained sitting posture detection model comprises the following steps:
and respectively inputting the video clips containing the ROI of the person to be detected into the sitting posture detection model.
9. A sitting posture detection system, characterized in that the system comprises the following modules:
the data acquisition module is used for acquiring video data containing a person to be detected;
the data input module is used for inputting the acquired video data into the trained sitting posture detection model; the sitting posture detection model is an OpenPose model subjected to a pruning operation; the sitting posture detection model comprises a backbone layer, an initialization layer and refinement layers; the backbone layer is a MobileNet network subjected to the pruning operation; branches with the same structure in the initialization layer of the sitting posture detection model share weights; the convolution kernels used in the refinement layers comprise a plurality of 3 × 3 convolution kernels;
the joint point detection module is used for outputting key points and connection relations of the personnel to be detected by using the sitting posture detection model and determining joint points of the personnel to be detected according to the key points and the connection relations;
and the sitting posture determining module is used for determining the distance and the angle relation between the joint points according to the coordinates of the joint points and determining the sitting posture of the person to be detected according to the distance and the angle relation between the joint points.
10. An electronic device, comprising: a processor and a storage device; the storage device has stored thereon a computer program which, when executed by the processor, implements the steps of the sitting posture detection method as claimed in any one of claims 1 to 8.
CN202210009187.6A 2022-01-06 2022-01-06 Sitting posture detection method and system and electronic equipment Pending CN114332951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210009187.6A CN114332951A (en) 2022-01-06 2022-01-06 Sitting posture detection method and system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210009187.6A CN114332951A (en) 2022-01-06 2022-01-06 Sitting posture detection method and system and electronic equipment

Publications (1)

Publication Number Publication Date
CN114332951A true CN114332951A (en) 2022-04-12

Family

ID=81024688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210009187.6A Pending CN114332951A (en) 2022-01-06 2022-01-06 Sitting posture detection method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN114332951A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909394A (en) * 2022-10-25 2023-04-04 珠海视熙科技有限公司 Sitting posture identification method and device, intelligent desk lamp and computer storage medium
CN115909394B (en) * 2022-10-25 2024-04-05 珠海视熙科技有限公司 Sitting posture identification method and device, intelligent table lamp and computer storage medium

Similar Documents

Publication Publication Date Title
CN110532984B (en) Key point detection method, gesture recognition method, device and system
US11074436B1 (en) Method and apparatus for face recognition
CN107370942B (en) Photographing method, photographing device, storage medium and terminal
WO2018228218A1 (en) Identification method, computing device, and storage medium
CN111179419B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN110287862B (en) Anti-candid detection method based on deep learning
CN111144284B (en) Method and device for generating depth face image, electronic equipment and medium
CN109559362B (en) Image subject face replacing method and device
CN112200056B (en) Face living body detection method and device, electronic equipment and storage medium
CN110781770B (en) Living body detection method, device and equipment based on face recognition
CN111104833A (en) Method and apparatus for in vivo examination, storage medium, and electronic device
CN111062263A (en) Method, device, computer device and storage medium for hand pose estimation
CN112287865B (en) Human body posture recognition method and device
CN111126254A (en) Image recognition method, device, equipment and storage medium
CN111062362B (en) Face living body detection model, method, device, equipment and storage medium
CN114332951A (en) Sitting posture detection method and system and electronic equipment
CN110007764B (en) Gesture skeleton recognition method, device and system and storage medium
CN110378304A (en) Skin condition detection method, device, equipment and storage medium
WO2022082401A1 (en) Noseprint recognition method and apparatus for pet, computer device, and storage medium
WO2018036241A1 (en) Method and apparatus for classifying age group
CN112699784A (en) Face orientation estimation method and device, electronic equipment and storage medium
CN112001285A (en) Method, device, terminal and medium for processing beautifying image
CN115223240B (en) Motion real-time counting method and system based on dynamic time warping algorithm
CN115205737B (en) Motion real-time counting method and system based on transducer model
CN115205750B (en) Motion real-time counting method and system based on deep learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination