CN111325166A - Sitting posture identification method based on projection reconstruction and multi-input multi-output neural network - Google Patents


Publication number
CN111325166A
Authority
CN
China
Prior art keywords
sitting posture
human body
depth
input
map
Prior art date
Legal status
Granted
Application number
CN202010119569.5A
Other languages
Chinese (zh)
Other versions
CN111325166B (en
Inventor
沈捷 (Shen Jie)
黄安义 (Huang Anyi)
王莉 (Wang Li)
曹磊 (Cao Lei)
Current Assignee
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN202010119569.5A priority Critical patent/CN111325166B/en
Publication of CN111325166A publication Critical patent/CN111325166A/en
Application granted granted Critical
Publication of CN111325166B publication Critical patent/CN111325166B/en
Status
Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing

Abstract

The invention relates to a sitting posture identification method based on projection reconstruction and a multiple-input multiple-output neural network (MIMO-CNN), comprising the following steps: acquiring a depth image of the upper half of the human body and a foreground contour map of the human body; preprocessing; projecting the depth information of the sitting posture contour and reconstructing it to obtain a three-view depth map; designing a MIMO-CNN for sitting posture identification and learning the model parameters; and self-learning of the model. Advantages: the preprocessed depth image is combined with the human body contour map, eliminating interference of the surrounding background on sitting posture recognition. A three-view depth map is obtained by projection reconstruction, making the sitting posture information richer. The designed MIMO-CNN structure is particularly suited to the projection-reconstructed feature information and integrates an attention mechanism, so it focuses better on the hot-spot regions of different sitting postures and improves identification accuracy. Model self-learning balances the requirements of real-time performance and accuracy well, and the method is strongly resistant to viewing-angle changes and complex environmental backgrounds.

Description

Sitting posture identification method based on projection reconstruction and multi-input multi-output neural network
Technical Field
The invention discloses a sitting posture identification method based on projection information and a multiple-input multiple-output neural network (MIMO-CNN), and belongs to the technical field of human body posture identification.
Background
With the rapid development of science and technology, sitting has become one of the most common daily states of modern people and is closely related to human health. Most writing and office work, and most time spent in front of computers, is done in a sitting posture; the sitting postures of teenagers and children during study are often not standard. Yet few people notice the influence of sitting posture on physical health, and a great many people have bad habits when sitting and using computers, such as lowering the head, hunching the back, slouching, and sitting tilted to one side. A method that automatically recognizes a person's sitting posture therefore has high practical value for sitting posture correction and guidance.
The key to correcting and guiding sitting posture is to recognize it accurately and quickly, and research on sitting posture recognition in China and abroad started long ago. Building on experience in human posture recognition and motion detection, several approaches exist. Some place piezoelectric sensors on the seat and identify the sitting posture by analyzing the collected data; this approach is easily affected by external unstable factors such as environmental noise, offset and crosstalk. Moreover, the pressure distribution is related not only to the sitting posture: the person's weight and seating area strongly and irregularly affect the sensor data, making the data hard to interpret and greatly reducing accuracy. In recent years, with the development of computer vision, sitting posture recognition methods based on vision and image processing have diversified: some researchers recognize sitting postures from the size and position of the area occupied by the face in a video; some acquire human skeleton information with a depth sensor and recognize the posture from the angles of skeleton joints; others collect large amounts of sitting posture data and train a sitting posture model by machine learning. In summary, current methods have three main defects. First, they resist interference poorly against complex backgrounds. Second, traditional vision-based sitting posture recognition methods are extremely sensitive to changes in camera angle. Third, they adapt poorly to the diversity of human sitting postures: existing methods recognize postures under relatively ideal actions, but people's natural postures on real seats are varied and complex, and the adaptability and robustness of recognition methods remain active research topics.
Disclosure of Invention
The invention provides a sitting posture identification method based on projection reconstruction and a multiple-input multiple-output neural network (MIMO-CNN), aiming to overcome the defects of existing sitting posture recognition: poor anti-interference performance, high modeling difficulty, and excessive sensitivity to camera angle. The method obtains depth sitting posture images of the human body with a depth camera, preprocesses them, and reconstructs three-dimensional information to obtain multi-view depth information; the MIMO-CNN then identifies the sitting posture state in the front-back direction and in the left-right direction. Users can feed back misjudged samples during use, and the network parameters are periodically relearned and optimized, improving the recognition accuracy of the network.
The technical solution of the invention is as follows: a sitting posture identification method based on projection reconstruction and a multiple-input multiple-output neural network (MIMO-CNN) comprises the following steps:
(1) image acquisition: acquiring a depth image and a human body foreground contour map by using a depth camera;
(2) image preprocessing: performing preprocessing operations such as histogram equalization and filtering on the obtained depth image and human body foreground contour map, and applying data enhancement to the samples to expand the training data set;
(3) depth image projection reconstruction: performing projection reconstruction on the human body foreground contour depth map, taking the opposite directions of the X, Y and Z axes as projection directions to obtain a left view, a top view and a main view in turn, namely the three-view depth map, as shown in fig. 3;
(4) establishing a sitting posture identification model: designing a MIMO-CNN for sitting posture identification and taking the three-view depth maps processed in step (3) as the inputs of the three channels of the MIMO-CNN for network training;
(5) sitting posture identification: inputting the preprocessed three-view depth map into the MIMO-CNN and finally identifying the sitting posture according to the distribution of the human body in space;
(6) self-learning of the model: self-screening the fed-back error samples, collecting the screened misjudged samples, and automatically relearning the model to improve its recognition accuracy.
The step (1) of obtaining the image comprises the following specific steps:
1) acquiring a depth image by using a depth camera;
2) acquiring the human body foreground contour by using a built-in algorithm of the depth camera.
The image preprocessing of the step (2) comprises the following specific steps:
1) carrying out histogram equalization on the native depth image;
2) performing an opening operation on the depth image to remove outer-edge burrs and fill missing blocks inside the contour;
3) denoising the human body foreground contour by adopting a median filtering method;
4) taking the human body foreground contour as a template and grabbing the human body depth information from the depth image to obtain the human body foreground depth map;
5) cutting off the redundant white background of the human body foreground depth map by adaptive cropping;
6) normalizing the size to 224 × 224 by bilinear interpolation;
7) performing data enhancement on the cropped human body foreground depth map to expand the training samples (a sketch of the whole pipeline follows).
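For illustration only, steps 1)-6) above might be sketched with OpenCV and NumPy as follows. The kernel sizes follow the embodiment described later (a 5 × 5 opening kernel, an 8-neighborhood median window); the function name and the assumption of 8-bit inputs are ours, not fixed by the patent:

```python
import cv2
import numpy as np

def preprocess(depth_raw, contour_mask):
    """Sketch of preprocessing steps 1)-6); assumes 8-bit single-channel inputs."""
    # 1) histogram equalization of the raw depth image
    depth = cv2.equalizeHist(depth_raw)
    # 2) opening (erosion then dilation) to remove burrs and small holes
    depth = cv2.morphologyEx(depth, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # 3) median filtering of the foreground contour (8-neighborhood -> 3x3 window)
    mask = cv2.medianBlur(contour_mask, 3)
    # 4) contour as template: keep depth inside the body, white (255) outside
    fg = np.where(mask > 0, depth, 255).astype(np.uint8)
    # 5) adaptive cropping: drop rows/columns that contain no foreground
    ys, xs = np.where(mask > 0)
    fg = fg[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # 6) bilinear resize to the 224 x 224 network input size
    return cv2.resize(fg, (224, 224), interpolation=cv2.INTER_LINEAR)
```

Data enhancement (step 7) is sketched in the embodiment section below.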
The depth image projection reconstruction in the step (3) comprises the following specific steps:
1) For the human body foreground depth map from step (2), the upper left corner is the coordinate origin, the right is the positive X direction, downward is the positive Y direction, and the pixel value is the Z direction, so each pixel of a depth image can be regarded as a three-dimensional point. Swapping the original Z axis with the Y axis and normalizing the data from 0-224 to 0-255 gives the top-view projection of the human body foreground depth map.
2) Swapping the original Z axis of the human body foreground depth map with the X axis and normalizing the data from 0-224 to 0-255 gives the left-view projection of the human body foreground depth map.
3) The left view and the top view are size-normalized to 224 × 224 by bilinear interpolation; together with the human body foreground depth map processed in step (2), they are collectively called the three-view depth map.
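A NumPy sketch of this axis-swapping projection, under our own reading of the steps above: where several 3-D points fall into the same projected cell, the largest coordinate value is kept, a choice the patent does not specify:

```python
import cv2
import numpy as np

def project_views(fg_depth):
    """Top- and left-view projections of a 224 x 224 foreground depth map.

    Per the text: origin at the top-left, X to the right, Y downward,
    Z = pixel value, so every pixel is a 3-D point (x, y, z).
    """
    h, w = fg_depth.shape
    top = np.zeros((256, w), np.uint8)      # rows indexed by z (0..255)
    left = np.zeros((h, 256), np.uint8)     # columns indexed by z (0..255)
    ys, xs = np.nonzero(fg_depth < 255)     # foreground = non-white pixels
    zs = fg_depth[ys, xs].astype(np.intp)
    # top view: Z becomes the row axis, the pixel value is the normalized y
    np.maximum.at(top, (zs, xs), (ys * 255 // (h - 1)).astype(np.uint8))
    # left view: Z becomes the column axis, the pixel value is the normalized x
    np.maximum.at(left, (ys, zs), (xs * 255 // (w - 1)).astype(np.uint8))
    # step 3): size-normalize both views to 224 x 224 by bilinear interpolation
    top = cv2.resize(top, (224, 224), interpolation=cv2.INTER_LINEAR)
    left = cv2.resize(left, (224, 224), interpolation=cv2.INTER_LINEAR)
    return left, top
```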
The step (4) of establishing the sitting posture recognition model comprises the following specific steps:
1) MIMO-CNN design: the MIMO-CNN takes the three-view depth maps (left view, top view and main view) as input and feeds them into three branch networks, obtaining 3 different feature matrices. The three feature matrices of the left view, top view and main view are then concatenated (concat) along the channel dimension to form the front-back sitting posture feature, and the two feature matrices of the top view and main view are concatenated to form the left-right sitting posture feature. The two spliced sitting posture state features are fed into two deep sub-network branches, which finally output two 1-dimensional feature vectors corresponding to the front-back and left-right directions of the sitting posture; finally, 2 softmax layers output the probability distributions of the sitting posture vectors in the left-right and front-back states;
2) training model parameters: the three-view depth maps are input into the three channels of the MIMO-CNN to obtain the model's sitting posture prediction, and the cross entropy loss between the prediction and the real label is calculated. The network parameters are then continuously updated and optimized with the back-propagation gradient descent algorithm according to the loss function to complete network training.
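The branch-and-merge topology of step 1) can be rendered as a minimal PyTorch sketch. The decomposition into a `branch` factory (building one shallow extractor per view) and two head modules is our packaging, not the patent's:

```python
import torch
import torch.nn as nn

class MIMOCNN(nn.Module):
    """Three input branches -> concat -> two classification heads."""
    def __init__(self, branch, fb_head, lr_head):
        super().__init__()
        # one shallow extractor per view, each emitting 14 x 14 x 64 features
        self.front, self.top, self.left = branch(), branch(), branch()
        self.fb_head = fb_head   # deep branch on 192 channels -> 4 classes
        self.lr_head = lr_head   # deep branch on 128 channels -> 3 classes

    def forward(self, v_front, v_top, v_left):
        f, t, l = self.front(v_front), self.top(v_top), self.left(v_left)
        fb = torch.cat([l, t, f], dim=1)   # front-back feature, 14 x 14 x 192
        lr = torch.cat([t, f], dim=1)      # left-right feature, 14 x 14 x 128
        # two softmax layers give the two probability distributions
        return (torch.softmax(self.fb_head(fb), dim=1),
                torch.softmax(self.lr_head(lr), dim=1))
```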
The step 1) MIMO-CNN design comprises the following specific steps:
(a) An original image is input; a convolution is first performed with 3 × 3 kernels, followed by BatchNorm normalization and activation with the Relu6 activation function, yielding a 112 × 112 × 32 feature map.
The calculation process of the convolutional layer is:

u_j^l = Σ_{i∈M_j} x_i^{l-1} * k_{ij}^l + b_j^l
x_j^l = f(u_j^l)

where u_j^l, called the net activation of the jth channel of convolutional layer l, is obtained by convolving and summing the output feature maps x_i^{l-1} of the previous layer and adding a bias; x_j^l is the output of the jth channel of layer l; f(·) is the activation function, here the Relu6 function; M_j denotes the set of input feature maps used to compute u_j^l; k_{ij}^l is the convolution kernel matrix; and b_j^l is the bias of the convolved feature map. For a given output feature map x_j^l, the convolution kernel k_{ij}^l corresponding to each input feature map x_i^{l-1} may differ; "*" is the convolution symbol.
The Relu6 activation function f (x) is:
f(x)=Min(Max(0,x),6)
After convolution and activation, the data are normalized by BatchNorm to zero mean and unit variance:

X̂_k = (X_k - E(X_k)) / √(Var(X_k))

where X_k is the kth feature map in the feature layer, E(X_k) is the mean of the input feature map X_k, Var(X_k) is its variance, and X̂_k is the normalized output;
(b) The convolved feature map is passed through a CBAM attention convolution module; the main role of CBAM is to make the network concentrate more on the important regions and channels of the feature map, spatially and channel-wise.
(c) Features are then extracted with inverted Residual Block modules. The dimension of the input feature map is first expanded with a point-wise convolution, followed by BatchNorm normalization and Relu6 activation; a convolution is then performed in depth-wise fashion, again followed by BatchNorm normalization and the Relu6 function; finally the dimension is reduced with a point-wise convolution. Note that after this last point-wise convolution and BatchNorm normalization, the Relu6 activation function is no longer used; a linear activation is used instead, so as to retain more feature information and preserve the expressive capacity of the model, following the idea of Resnet. After step (a) is finished, feature extraction is performed with four inverted Residual Block modules, finally yielding a 14 × 14 × 64 feature map for each of the three views.
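A PyTorch sketch of the inverted Residual Block just described; the expansion factor of 6 and the stride are assumptions borrowed from MobileNetV2, not stated above:

```python
import torch.nn as nn

def inverted_residual(c_in, c_out, stride=1, expand=6):
    """point-wise expand -> BN -> Relu6 -> depth-wise -> BN -> Relu6
    -> point-wise project -> BN with linear (no Relu6) activation."""
    c_mid = c_in * expand
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 1, bias=False),               # point-wise expansion
        nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
        nn.Conv2d(c_mid, c_mid, 3, stride, 1,
                  groups=c_mid, bias=False),                 # depth-wise convolution
        nn.BatchNorm2d(c_mid), nn.ReLU6(inplace=True),
        nn.Conv2d(c_mid, c_out, 1, bias=False),              # point-wise projection
        nn.BatchNorm2d(c_out),                               # linear activation only
    )
```

The Resnet-style idea mentioned above would add a skip connection around this sequence when the input and output shapes match.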
(d) Along the channel dimension, the three 14 × 14 × 64 features of the left view, top view and main view are concatenated into the 14 × 14 × 192 front-back sitting posture feature, and the two 14 × 14 × 64 features of the top view and main view are concatenated into the 14 × 14 × 128 left-right sitting posture feature;
(e) The two spliced features are passed through a CBAM attention convolution module, which again makes the network concentrate on the important regions and channels of the feature map. After this convolution, a 14 × 14 × 192 feature map in the front-back direction and a 14 × 14 × 128 sitting posture feature in the left-right direction are obtained;
(f) The two features are then processed identically: three inverted Residual Block operations give a 7 × 7 × 320 feature map; a point-wise convolution expands it to a 7 × 7 × 1280 feature map; average pooling gives a 1 × 128 one-dimensional feature; and a final point-wise convolution gives a 1 × 4 one-dimensional feature in the front-back sub-network and a 1 × 3 one-dimensional feature in the left-right sub-network;
(g) Finally, 2 softmax layers output the probability distributions of the sitting posture vectors in the left-right and front-back states.

The operational function of the softmax layer is:

σ(Z)_j = e^{Z_j} / Σ_{m=1}^{M} e^{Z_m}

where Z_j is the jth input variable, M is the number of input variables, and σ(Z)_j is the output, representing the probability that the output class is j.
The step 2) of model parameter training comprises the following specific steps:
(a) The three-view depth maps are input into the three channels of the MIMO-CNN to obtain the model's sitting posture prediction, and the cross entropy loss between the prediction and the real label is calculated.

The cross entropy is calculated as:

L = -(1/m) Σ_{i=1}^{m} label_i · log(p_i)

where label_i is the one-hot encoded label of the ith sample, p_i is the predicted probability distribution, and m is the number of samples in the batch.

The loss function of the model is:

Loss = L_v + L_h + γ Σ_j ||w_j||^2

where L_v is the cross entropy of the front-back direction output, L_h is the cross entropy of the left-right direction output, Σ_j ||w_j||^2 is the L2 regularization term over the network weights, and γ is its coefficient, used to prevent over-fitting.
(b) The network parameters are continuously updated and optimized with the back-propagation gradient descent algorithm, so the model output approaches the real labels; when the accuracy on the validation set plateaus and no longer increases, network training is finished.
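Assuming standard per-head cross entropy, the combined loss might be sketched as follows; the heads are taken pre-softmax, and the value of γ is an assumption:

```python
import torch.nn.functional as F

def mimo_loss(fb_logits, lr_logits, fb_label, lr_label, model, gamma=1e-4):
    """Loss = L_v + L_h + gamma * sum_j ||w_j||^2 (gamma value assumed)."""
    l_v = F.cross_entropy(fb_logits, fb_label)   # front-back head (4 classes)
    l_h = F.cross_entropy(lr_logits, lr_label)   # left-right head (3 classes)
    l2 = sum((w ** 2).sum() for w in model.parameters())
    return l_v + l_h + gamma * l2
```

`F.cross_entropy` takes integer class indices, which is equivalent to the one-hot formulation above.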
The step (6) of model self-learning comprises the following specific steps:
1) during use, cases of misidentification found by the user are fed back; the background analyzes them automatically, and error samples meeting the conditions are uploaded to the cloud;
2) the error samples fed back from the client to the cloud undergo a manual secondary judgment, are labeled, and are added to the data set;
3) at regular intervals, the database is fed into the model again for training, and the model is fine-tuned after a small number of iterations.
the invention has the beneficial effects that:
1) The depth image obtained by the depth camera is combined with the preprocessed human body contour map, eliminating interference of the surrounding background on sitting posture recognition.
2) Three-view depth maps are reconstructed from the human body contour depth information, making the sitting posture information richer.
3) The MIMO-CNN structure designed by the invention is better suited to feature extraction from multi-view sitting posture information, and its high-performance attention mechanism focuses on the regions that most need attention in different sitting posture images. While reducing the model size, it balances real-time performance against recognition accuracy well, and it recognizes the front-back and left-right states of the sitting posture simultaneously, making recognition of the sitting posture state more accurate. The method is therefore strongly resistant to changes in the sitting posture background and in the camera angle.
4) Through model self-learning, feedback of the user's error samples during use continuously improves model accuracy.
Drawings
FIG. 1 is a flow chart of a sitting posture identification method;
FIG. 2 is a depth image acquired by a depth camera;
FIG. 3 is a human foreground profile acquired by a depth camera;
FIG. 4 is a comparison graph before and after histogram equalization of depth images;
FIG. 5 is a comparison graph before and after a depth image histogram opening operation;
FIG. 6 is a comparison graph before and after median filtering of a human foreground contour map;
FIG. 7 is a process diagram for obtaining the human body foreground sitting posture depth image;
FIG. 8 is a background adaptive clipping flow diagram;
FIG. 9 is a diagram of the effect after clipping;
FIG. 10 is a three-view depth map of human sitting posture depth;
FIG. 11 is a main structure diagram of the MIMO-CNN.
FIG. 12 is a schematic representation of a CBAM;
FIG. 13 is a schematic view of a CBAM channel attention model;
FIG. 14 is a schematic diagram of a CBAM spatial attention model;
FIG. 15 is a diagram of an Inverted residual module architecture for MIMO-CNN;
FIG. 16 is a sitting posture state classification diagram;
FIG. 17 is a front network framework parameter table;
FIG. 18 is a table of predicted network parameters for a back network in-and-out sitting position;
FIG. 19 is a table of predicted network parameters for a rear network left and right sitting posture;
Detailed Description
A sitting posture identification method based on projection reconstruction and a multiple-input multiple-output neural network (MIMO-CNN) comprises the following steps:
(1) image acquisition: acquiring a depth image and a human body foreground contour map by using a depth camera;
(2) image preprocessing: performing preprocessing operations such as histogram equalization and filtering on the obtained depth image and human body foreground contour map, and applying data enhancement to the samples to expand the training data set;
(3) depth image projection reconstruction: performing projection reconstruction on the human body foreground contour depth map, taking the opposite directions of the X, Y and Z axes as projection directions to obtain a left view, a top view and a main view in turn, namely the three-view depth map, as shown in fig. 3;
(4) establishing a sitting posture identification model: designing a MIMO-CNN for sitting posture identification and taking the three-view depth maps processed in step (3) as the inputs of the three channels of the MIMO-CNN for network training;
(5) sitting posture identification: inputting the preprocessed three-view depth map into the MIMO-CNN and finally identifying the sitting posture according to the distribution of the human body in space;
(6) self-learning of the model: self-screening the fed-back error samples, collecting the screened misjudged samples, and automatically relearning the model to improve its recognition accuracy.
Example 1
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
A sitting posture identification method combining projection information and a multiple-input multiple-output neural network (MIMO-CNN); the flow chart is shown in fig. 1. The method comprises the following steps:
step 1, the specific implementation process of image acquisition is as follows:
a1, acquiring a depth image by using a depth camera, as shown in figure 2;
a2, using a random decision forest classification algorithm of an official SDK of a depth camera to finally divide a human body into 32 parts, and finally obtaining a human body foreground contour required by people, as shown in figure 3;
step 2, the image preprocessing process comprises:
b1, histogram equalization is performed on the raw depth image, and histogram equalization is a method for adjusting contrast using an image histogram in the field of image processing.
Figure BDA0002392543030000111
g(x,y)=Sf(x,y)*(L-1)
L is the total number of possible gray levels in the image, the total number of pixels in the original image f is n, f (x, y) is the original pixel value at the position of the original image f (x, y), and g (x, y) is the pixel value after histogram equalization.
In this way, the depth information can be better distributed over the histogram. This can be used to enhance the local contrast without affecting the overall contrast, so that the depth information can more clearly express the distance information, as shown in fig. 4.
B2, an opening operation is performed on the human body foreground contour to remove outer-edge burrs of the foreground and defect blocks inside the contour. The opening operation consists of erosion followed by dilation: erosion shrinks the boundary and dilation expands it. Here the opening is performed with a 5 × 5 kernel; a comparison is shown in fig. 5.
B3, the human body foreground contour is denoised with median filtering: the gray value of each pixel is set to the median of the gray values of all pixels in its 8-neighborhood window. After denoising, a human body foreground contour with smooth edges is obtained; a comparison is shown in fig. 6.
B4, based on the human body foreground contour, the contour is used as a coordinate template to grab the human body depth information from the background-containing depth image, forming the required human body depth sitting posture data, i.e., the human body foreground depth map, as shown in fig. 7.

The human body foreground depth map is obtained as:

F(x,y) = G(x,y) if D(x,y) belongs to the foreground; F(x,y) = 255 otherwise,
x ∈ [0, rows-1], y ∈ [0, cols-1]

where F(x,y) is the finally generated human body foreground depth map, G(x,y) is the depth image, and D(x,y) is the human body foreground contour map.
B5, the redundant white background is cut off by adaptive cropping: the pure white background is still too large and meaningless for recognition, so the optimal picture region is obtained automatically by traversing the pixel values of rows and columns from the top, bottom, left and right. The adaptive cropping flow chart is shown in fig. 8 and the cropping result in fig. 9.
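A minimal sketch of this row/column scan (our formulation of the flow chart in fig. 8; the function name is ours):

```python
import numpy as np

def adaptive_crop(fg_depth, white=255):
    """Drop the all-white rows and columns on each side of the foreground."""
    rows = np.where((fg_depth < white).any(axis=1))[0]   # rows with foreground
    cols = np.where((fg_depth < white).any(axis=0))[0]   # columns with foreground
    return fg_depth[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```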
B6, the three-view depth maps are randomly cropped to 0.85 times their length and width. Main views of postures that are centered in the left-right direction (upright sitting, lying on the desk, leaning back and lowering the head) are flipped horizontally at random, i.e., with probability 50%. This data enhancement expands the training samples (see the sketch after B7).
B7, performing bilinear interpolation on the extended samples to normalize the samples to 224 × 224.
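B6 and B7 together might look as follows in NumPy/OpenCV; the function name and the use of a NumPy random generator are ours:

```python
import cv2
import numpy as np

def augment(view, centered, rng=np.random.default_rng()):
    """Random 0.85x crop, optional 50% horizontal flip, resize to 224 x 224."""
    h, w = view.shape
    ch, cw = int(h * 0.85), int(w * 0.85)
    y, x = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    out = view[y:y + ch, x:x + cw]
    if centered and rng.random() < 0.5:   # only postures centered left-right
        out = cv2.flip(out, 1)            # horizontal flip
    return cv2.resize(out, (224, 224), interpolation=cv2.INTER_LINEAR)
```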
And step 3, the process of depth image projection reconstruction is as follows:
C1, for the human body foreground depth map from step 2, the upper left corner is the coordinate origin, the right is the positive X direction, downward is the positive Y direction, and the pixel value is the Z direction, so each pixel of a depth image can be regarded as a three-dimensional point. Swapping the original Z axis with the Y axis and normalizing the data from 0-224 to 0-255 gives the top-view projection of the human body foreground depth map.
C2, swapping the original Z axis of the human body foreground depth map with the X axis and normalizing the data from 0-224 to 0-255 gives the left-view projection of the human body foreground depth map.
C3, the left view and the top view are size-normalized to 224 × 224 by bilinear interpolation; together with the human body foreground depth map processed in step 2, they are collectively called the three-view depth map, as shown in fig. 10.
Step 4, the process of establishing the sitting posture identification model comprises the following steps:
D1, the MIMO-CNN takes the three-view depth maps (left view, top view and main view) as input and feeds them into three branch networks, obtaining 3 different feature matrices. The three feature matrices of the left view, top view and main view are concatenated (concat) along the channel dimension to form the front-back sitting posture feature, and the two feature matrices of the top view and main view are concatenated to form the left-right sitting posture feature. The two spliced sitting posture state features are fed into two deep sub-network branches, which output two 1-dimensional feature vectors corresponding to the front-back and left-right directions of the sitting posture; finally, 2 softmax layers output the probability distributions of the sitting posture vectors in the left-right and front-back states. The main structure of the MIMO-CNN, i.e., its design framework, is shown in fig. 11.
D2, model parameter training: the three-view depth maps are first input into the three channels of the MIMO-CNN to obtain the model's sitting posture prediction, and the cross entropy loss between the prediction and the real label is calculated. The network parameters are then continuously updated and optimized with the back-propagation gradient descent algorithm according to the loss function to complete network training.
The step D1, the design of MIMO-CNN, includes the following specific steps:
(a) An original three-view depth image is input; the three shallow sub-network branches each first perform a convolution with 3 × 3 kernels, followed by BatchNorm normalization and activation with the Relu6 activation function, yielding a 112 × 112 × 32 feature map.
The calculation process of the convolutional layer is:

u_j^l = Σ_{i∈M_j} x_i^{l-1} * k_{ij}^l + b_j^l
x_j^l = f(u_j^l)

where u_j^l, called the net activation of the jth channel of convolutional layer l, is obtained by convolving and summing the output feature maps x_i^{l-1} of the previous layer and adding a bias; x_j^l is the output of the jth channel of layer l; f(·) is the activation function, here the Relu6 function; M_j denotes the set of input feature maps used to compute u_j^l; k_{ij}^l is the convolution kernel matrix; and b_j^l is the bias of the convolved feature map. For a given output feature map x_j^l, the convolution kernel k_{ij}^l corresponding to each input feature map x_i^{l-1} may differ; "*" is the convolution symbol.
The Relu6 activation function f (x) is:
f(x)=Min(Max(0,x),6)
After convolution and activation, the data are normalized by BatchNorm to zero mean and unit variance:

X̂_k = (X_k - E(X_k)) / √(Var(X_k))

where X_k is the kth feature map in the feature layer, E(X_k) is the mean of the input feature map X_k, Var(X_k) is its variance, and X̂_k is the normalized output;
(b) The convolved feature map is passed through a CBAM attention convolution module, yielding a 112 × 112 × 32 feature map. The main role of CBAM is to make the network concentrate more on the important feature regions and key channels; a schematic of CBAM is shown in fig. 12.
The process of the CBAM attention convolution module is as follows:
First, spatial max and avg pooling are applied to the input feature map (112 × 112 × 32), followed by a multi-layer perceptron; the two resulting one-dimensional vectors are added and passed through a relu activation function, giving a 1 × 1 × 32 channel attention map, which is multiplied with the input feature map to give a 112 × 112 × 32 feature map. The channel attention model of the CBAM is shown in fig. 13.
The channel attention model formula is:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

where F is the input feature map, M_c is the channel attention map, σ denotes the relu activation function, MLP denotes the multi-layer perceptron, AvgPool is spatial average pooling, and MaxPool is spatial maximum pooling.
Then, max and avg pooling along the channel axis are applied to the feature map obtained in the previous step, the results are concatenated (concat) along the channel axis, a 7 × 7 convolution on the spliced tensor gives a 112 × 112 × 1 map, and a sigmoid activation is applied; this map is multiplied with the input feature map to give the final 112 × 112 × 32 feature map. The spatial attention model of CBAM is shown in fig. 14.
The spatial attention model formula is:

M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)]))

where F is the input feature map, M_s is the spatial attention map, σ denotes the sigmoid activation function, AvgPool is channel-wise average pooling, MaxPool is channel-wise maximum pooling, and f^{7×7} is a 7 × 7 convolution.
(c) After step (b), feature extraction is performed with four inverted Residual Block modules as proposed by the MobileNetV2 network, finally yielding 14 × 14 × 64 feature maps for the three views; the parameters of the front network are shown in fig. 17. The inverted Residual Block structure is shown in fig. 15: the dimension of the input feature map is first expanded with a point-wise convolution, followed by BatchNorm normalization and Relu6 activation; a convolution is then performed in depth-wise fashion, again followed by BatchNorm normalization and the Relu6 function; finally the dimension is reduced with a point-wise convolution. Note that after this last point-wise convolution and BatchNorm normalization, the Relu6 activation function is no longer used; a linear activation is used instead, so as to retain more feature information and preserve the expressive capacity of the model, following the idea of Resnet.
(d) Along the channel dimension, the three 14 × 14 × 64 features of the left view, top view and main view are concatenated into the 14 × 14 × 192 front-back sitting posture feature, and the two 14 × 14 × 64 features of the top view and main view are concatenated into the 14 × 14 × 128 left-right sitting posture feature;
(e) The two spliced features are fed into the two deep sub-network branches, each continuing with a CBAM attention convolution module; after this convolution, a 14 × 14 × 192 feature map in the front-back direction and a 14 × 14 × 128 sitting posture feature in the left-right direction are obtained;
(f) The two features are then processed identically: three inverted Residual Block operations give a 7 × 7 × 320 feature map; a point-wise convolution expands it to a 7 × 7 × 1280 feature map; average pooling gives a 1 × 128 one-dimensional feature; and a final point-wise convolution gives a 1 × 4 one-dimensional feature in the front-back sub-network branch and a 1 × 3 one-dimensional feature in the left-right sub-network branch. The prediction network parameters of the deep sub-network branches for the front-back and left-right sitting posture states are shown in figs. 18 and 19;
(g) Finally, 2 softmax layers output the probability distributions of the sitting posture vectors in the left-right and front-back states.

The operational function of the softmax layer is:

σ(Z)_j = e^{Z_j} / Σ_{m=1}^{M} e^{Z_m}

where Z_j is the jth input variable, M is the number of input variables, and σ(Z)_j is the output, representing the probability that the output class is j.
The step D2, model training, comprises the following specific steps:
(a) The three-view depth maps are input into the three channels of the MIMO-CNN to obtain the model's sitting posture prediction, and the cross entropy loss between the prediction and the real label is calculated.

The cross entropy is calculated as:

L = -(1/m) Σ_{i=1}^{m} label_i · log(p_i)

where label_i is the one-hot encoded label of the ith sample, p_i is the predicted probability distribution, and m is the number of samples in the batch.

The loss function of the model is:

Loss = L_v + L_h + γ Σ_j ||w_j||^2

where L_v is the cross entropy of the front-back direction output, L_h is the cross entropy of the left-right direction output, Σ_j ||w_j||^2 is the L2 regularization term over the network weights, and γ is its coefficient, used to prevent over-fitting.
(b) The network parameters are continuously updated and optimized with the back-propagation gradient descent algorithm, so the model output approaches the real labels; when the accuracy on the validation set plateaus and no longer increases, network training is finished.
Step 5, sitting posture identification: and (4) inputting the three-view depth map obtained by preprocessing as an input quantity into the sitting posture recognition model trained in the step 4 to recognize the sitting posture. The sitting posture can be divided into forward sitting, head lowering, backward leaning and lying down tables and left and right directions, namely left leaning, middle leaning and right leaning. Considering that the left and right direction judgment is not carried out any more in the particularity of sitting postures of the lying table, the sitting postures are classified as shown in fig. 16;
step 6, model self-learning is automatically carried out along with feedback collection of error samples, and the performance of the model is improved, and the method comprises the following specific steps:
E1, during use, the user feeds back cases of misidentification. The background analyzes the probability distribution output by the softmax layer for the misidentified sample: if the probability of the predicted class is below 0.65, the classification is regarded as ambiguous, i.e., a correct judgment could not be made between two similar classes, and the sample must be fed back to the cloud as an error sample for learning. When the model is stable, high-confidence predictions are almost never wrong, so such errors are ignored by the background;
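The screening rule of E1 reduces to a confidence test on the softmax output; a minimal sketch (the 0.65 threshold is from the text, the function name is ours):

```python
def should_upload(softmax_probs, threshold=0.65):
    """True if the top predicted probability is below the threshold, i.e. the
    classification was ambiguous and the sample goes to the cloud (step E1)."""
    return max(softmax_probs) < threshold
```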
E2, the error samples fed back from the client to the cloud undergo a manual secondary judgment, are labeled, and are added to the data set;
E3, the database is fed into the model again for training, and the model is fine-tuned after a small number of iterations.
the above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (8)

1. The sitting posture identification method based on projection reconstruction and the multi-input multi-output neural network is characterized by comprising the following steps of:
(1) image acquisition: acquiring a depth image and a human body foreground contour map by using a depth camera;
(2) image preprocessing: performing histogram equalization and filtering preprocessing operations on the obtained depth image and human body foreground contour map, and applying data enhancement to the samples to expand the training data set;
(3) depth image projection reconstruction: performing projection reconstruction on the human body foreground contour depth map, taking the opposite directions of the X, Y and Z axes as projection directions to obtain a left view, a top view and a main view in turn, namely a three-view depth map;
(4) establishing a sitting posture identification model: designing a multi-input multi-output neural network for sitting posture identification and taking the three-view depth maps processed in step (3) as the inputs of the three channels of the multi-input multi-output neural network for network training;
(5) sitting posture identification: inputting the preprocessed three-view depth map into the multi-input multi-output neural network and finally identifying the sitting posture according to the distribution of the human body in space;
(6) self-learning of the model: self-screening the fed-back error samples, collecting the screened misjudged samples, and automatically relearning the model to improve its recognition accuracy.
2. The sitting posture identifying method based on projection reconstruction and multiple-input multiple-output neural network as claimed in claim 1, wherein the step (1) of image acquisition comprises the following specific steps:
1) acquiring a depth image by using a depth camera;
2) acquiring the human body foreground contour by using a built-in algorithm of the depth camera.
3. The sitting posture identifying method based on projection reconstruction and multiple-input multiple-output neural network as claimed in claim 1, wherein the step (2) of image preprocessing comprises the following specific steps:
1) carrying out histogram equalization on the native depth image;
2) performing an opening operation on the depth image to remove outer-edge burrs and fill missing blocks inside the contour;
3) denoising the human body foreground contour by adopting a median filtering method;
4) taking the human body foreground contour as a template and grabbing the human body depth information from the depth image to obtain the human body foreground depth map;
5) cutting off the redundant white background of the human body foreground depth map by adaptive cropping;
6) normalizing the size to 224 × 224 by bilinear interpolation;
7) performing data enhancement on the cropped human body foreground depth map to expand the training samples.
4. The sitting posture identifying method based on projection reconstruction and multiple-input multiple-output neural network as claimed in claim 1, wherein the step (3) of depth image projection reconstruction comprises the following specific steps:
1) for the human body foreground depth map preprocessed in step (2), the upper left corner is the coordinate origin, the right is the positive X direction, downward is the positive Y direction, and the pixel value is the Z direction, so each pixel of a depth map is regarded as a three-dimensional point; swapping the original Z axis with the Y axis and normalizing the data from 0-224 to 0-255 gives the top-view projection of the human body foreground depth map;
2) swapping the original Z axis of the human body foreground depth map preprocessed in step (2) with the X axis and normalizing the data from 0-224 to 0-255 gives the left-view projection of the human body foreground depth map;
3) size-normalizing the left view and the top view of the human body foreground depth map to 224 × 224 by bilinear interpolation; together with the human body foreground depth map preprocessed in step (2), they are collectively called the three-view depth map.
5. The sitting posture recognition method based on projection reconstruction and multiple-input multiple-output neural network as claimed in claim 1, wherein the step (4) of establishing a sitting posture recognition model comprises the following specific steps:
1) designing the multi-input multi-output neural network: the network takes the left-view, top-view and main-view three-view depth maps as input and feeds them into three branch networks, obtaining 3 different feature matrices; the three feature matrices of the left view, top view and main view are concatenated (concat) along the channel dimension to form the front-back sitting posture state feature; the two feature matrices of the top view and main view are concatenated to form the left-right sitting posture feature; the two spliced sitting posture state features are fed into two deep sub-network branches, which finally output two 1-dimensional feature vectors corresponding to the front-back and left-right directions of the sitting posture; finally, 2 softmax layers output the probability distributions of the sitting posture vectors in the left-right and front-back states;
2) training model parameters: inputting the three-view depth maps into the three channels of the multi-input multi-output neural network to obtain the model's sitting posture prediction, and calculating the cross entropy loss between the prediction and the real label; continuously updating and optimizing the network parameters with the back-propagation gradient descent algorithm according to the loss function to complete network training.
6. The sitting posture identifying method based on projection reconstruction and multiple-input multiple-output neural network as claimed in claim 5, wherein the step 1) multiple-input multiple-output neural network design comprises the following specific steps:
(a) inputting an original image, first performing a convolution operation with 3 × 3 kernels, then applying BatchNorm normalization and Relu6 activation, to obtain a 112 × 112 × 32 feature map;
the calculation process of the convolutional layer is:

u_j^l = Σ_{i∈M_j} x_i^{l-1} * k_{ij}^l + b_j^l
x_j^l = f(u_j^l)

wherein u_j^l, called the net activation of the jth channel of convolutional layer l, is obtained by convolving and summing the output feature maps x_i^{l-1} of the previous layer and adding a bias; x_j^l is the output of the jth channel of layer l; f(·) is the activation function, using the Relu6 function; M_j denotes the set of input feature maps used to compute u_j^l; k_{ij}^l is the convolution kernel matrix; b_j^l is the bias of the convolved feature map; for a given output feature map x_j^l, the convolution kernel k_{ij}^l corresponding to each input feature map x_i^{l-1} may differ; "*" is the convolution symbol;
the Relu6 activation function f (x) is:
f(x)=Min(Max(0,x),6)
after convolution and activation, the data are normalized by BatchNorm to zero mean and unit variance:

X̂_k = (X_k - E(X_k)) / √(Var(X_k))

wherein X_k is the kth feature map in the feature layer, E(X_k) is the mean of the input feature map X_k, Var(X_k) is its variance, and X̂_k is the normalized output;
(b) passing the convolved feature map through a CBAM attention convolution module, the main role of CBAM being to make the network concentrate more on important feature regions and key channels;
(c) then extracting features with inverted Residual Block modules: the dimension of the input feature map is first expanded with a point-wise convolution, followed by BatchNorm normalization and Relu6 activation; a convolution is then performed in depth-wise fashion, followed again by BatchNorm normalization and the Relu6 function; finally the dimension is reduced with a point-wise convolution; after this last point-wise convolution and BatchNorm normalization, the Relu6 activation function is no longer used, a linear activation being used instead, so as to retain more feature information and preserve the expressive capacity of the model, following the idea of Resnet; after step (a) is finished, feature extraction is performed with four inverted Residual Block modules, finally yielding a 14 × 14 × 64 feature map for each of the three views;
(d) along the channel dimension, concatenating the three 14 × 14 × 64 features of the left view, top view and main view into the 14 × 14 × 192 front-back sitting posture feature, and concatenating the two 14 × 14 × 64 features of the top view and main view into the 14 × 14 × 128 left-right sitting posture feature;
(e) passing the two spliced features through a CBAM attention convolution module; after convolution, obtaining a 14 × 14 × 192 feature map in the front-back direction and a 14 × 14 × 128 sitting posture feature in the left-right direction;
(f) processing the two features identically: three inverted Residual Block operations give a 7 × 7 × 320 feature map; a point-wise convolution expands it to a 7 × 7 × 1280 feature map; average pooling gives a 1 × 128 one-dimensional feature; and a final point-wise convolution gives a 1 × 4 one-dimensional feature in the front-back sub-network and a 1 × 3 one-dimensional feature in the left-right sub-network;
(g) outputting the probability distributions of the sitting posture vectors in the left-right and front-back states with 2 softmax layers:

the operational function of the softmax layer is:

σ(Z)_j = e^{Z_j} / Σ_{m=1}^{M} e^{Z_m}

wherein Z_j is the jth input variable, M is the number of input variables, and σ(Z)_j is the output, representing the probability that the output class is j.
7. The sitting posture recognition method based on projection reconstruction and multiple-input multiple-output neural network as claimed in claim 5, wherein the step 2) model parameter training comprises the following specific steps:
(a) inputting the three-view depth maps into the three channels of the multi-input multi-output neural network to obtain the model's sitting posture prediction, and then calculating the cross entropy loss between the prediction and the real label; the cross entropy is calculated as:

L = -(1/m) Σ_{i=1}^{m} label_i · log(p_i)

wherein label_i is the one-hot encoded label of the ith sample, p_i is the predicted probability distribution, and m is the number of samples in the batch;

the loss function of the model is:

Loss = L_v + L_h + γ Σ_j ||w_j||^2

wherein L_v is the cross entropy of the front-back direction output, L_h is the cross entropy of the left-right direction output, Σ_j ||w_j||^2 is the L2 regularization term over the network weights, and γ is its coefficient, used to prevent over-fitting;
(b) continuously updating and optimizing the network parameters with the back-propagation gradient descent algorithm, the model output continuously approaching the real labels; when the accuracy on the validation set plateaus and no longer increases, network training is finished.
8. The sitting posture identifying method based on projection reconstruction and multiple-input multiple-output neural network as claimed in claim 1, wherein the step (6) of model self-learning comprises the following specific steps:
1) during use, cases of misidentification found by the user are fed back; the background analyzes them automatically, and error samples meeting the conditions are uploaded to the cloud;
2) the error samples fed back to the cloud by the client undergo a manual secondary judgment, and are then labeled and added to the data set;
3) the updated data set is periodically fed back into the model for training, and the model is fine-tuned after a small number of iterations (a toy sketch of this cycle follows).
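The self-learning cycle of claim 8 might be orchestrated as in the toy sketch below. Every name here (Sample, fetch_feedback_samples, manually_relabel, fine_tune) is a hypothetical stand-in: the claim specifies the feedback, relabeling and fine-tuning cycle, not any concrete API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sample:
    features: List[float]
    label: int = -1                      # -1: awaiting manual secondary judgment

@dataclass
class Dataset:
    samples: List[Sample] = field(default_factory=list)

def fetch_feedback_samples() -> List[Sample]:
    # Stand-in for error samples uploaded to the cloud after backend filtering
    return [Sample([0.1, 0.2]), Sample([0.3, 0.4])]

def manually_relabel(sample: Sample) -> int:
    # Stand-in for the annotator's manual secondary judgment
    return 0

def fine_tune(dataset: Dataset, iterations: int) -> None:
    # Stand-in for a short fine-tuning run of the model on the updated data set
    for _ in range(iterations):
        pass

dataset = Dataset()
for sample in fetch_feedback_samples():      # step 1): collect fed-back errors
    sample.label = manually_relabel(sample)  # step 2): manual relabeling
    dataset.samples.append(sample)
fine_tune(dataset, iterations=100)           # step 3): periodic fine-tuning
print(len(dataset.samples), "relabeled samples added")
```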
CN202010119569.5A 2020-02-26 2020-02-26 Sitting posture identification method based on projection reconstruction and MIMO neural network Active CN111325166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010119569.5A CN111325166B (en) 2020-02-26 2020-02-26 Sitting posture identification method based on projection reconstruction and MIMO neural network

Publications (2)

Publication Number Publication Date
CN111325166A CN111325166A (en) 2020-06-23
CN111325166B CN111325166B (en) 2023-07-07

Family

ID=71172893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010119569.5A Active CN111325166B (en) 2020-02-26 2020-02-26 Sitting posture identification method based on projection reconstruction and MIMO neural network

Country Status (1)

Country Link
CN (1) CN111325166B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622257A (en) * 2017-10-13 2018-01-23 深圳市未来媒体技术研究院 A kind of neural network training method and three-dimension gesture Attitude estimation method
CN108345869A (en) * 2018-03-09 2018-07-31 南京理工大学 Driver's gesture recognition method based on depth image and virtual data
CN108491880A (en) * 2018-03-23 2018-09-04 西安电子科技大学 Object classification based on neural network and position and orientation estimation method
CN110175566A (en) * 2019-05-27 2019-08-27 大连理工大学 A kind of hand gestures estimating system and method based on RGBD converged network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BIN LIANG et al.: "Three Dimensional Motion Trail Model for Gesture Recognition" *
YUAN DIBO et al.: "Multi-class feature fusion and recognition of nonstandard writing sitting postures" *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507920A (en) * 2020-12-16 2021-03-16 重庆交通大学 Examination abnormal behavior identification method based on time displacement and attention mechanism
CN112348947A (en) * 2021-01-07 2021-02-09 南京理工大学智能计算成像研究院有限公司 Three-dimensional reconstruction method for deep learning based on reference information assistance
CN112348947B (en) * 2021-01-07 2021-04-09 南京理工大学智能计算成像研究院有限公司 Three-dimensional reconstruction method for deep learning based on reference information assistance
CN113657271A (en) * 2021-08-17 2021-11-16 上海科技大学 Sitting posture detection method and system combining quantifiable factors and non-quantifiable factors for judgment
CN113657271B (en) * 2021-08-17 2023-10-03 上海科技大学 Sitting posture detection method and system combining quantifiable factors and unquantifiable factor judgment
CN114582014A (en) * 2022-01-25 2022-06-03 珠海视熙科技有限公司 Method and device for recognizing human body sitting posture in depth image and storage medium
CN114898463A (en) * 2022-05-09 2022-08-12 河海大学 Sitting posture identification method based on improved depth residual error network
CN114898463B (en) * 2022-05-09 2024-05-14 河海大学 Sitting posture identification method based on improved depth residual error network

Also Published As

Publication number Publication date
CN111325166B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN111325166A (en) Sitting posture identification method based on projection reconstruction and multi-input multi-output neural network
Rao et al. Deep convolutional neural networks for sign language recognition
CN111652827B (en) Front face synthesis method and system based on generative adversarial network
CN107145889B (en) Target identification method based on double CNN network with RoI pooling
US6185337B1 (en) System and method for image recognition
CN110738161A (en) Face image correction method based on improved generative adversarial network
CN107871101A (en) Face detection method and device
CN111507334B (en) Instance segmentation method based on key points
CN115439857B (en) Inclined character recognition method based on complex background image
CN109766898A (en) Image character recognition method, device, computer equipment and storage medium
CN110728183A (en) Human body action recognition method based on attention mechanism neural network
CN111695633A (en) Low-illumination target detection method based on RPF-CAM
CN105046202B (en) Adaptive illumination processing method for face recognition
CN115457568B (en) Historical document image noise reduction method and system based on generative adversarial network
CN111783693A (en) Intelligent identification method of fruit and vegetable picking robot
Kishore et al. Selfie sign language recognition with convolutional neural networks
CN111414875A (en) Three-dimensional point cloud head pose estimation system based on deep regression forest
CN112149521A (en) Palm print ROI extraction and enhancement method based on multitask convolutional neural network
CN110334631B (en) Sitting posture detection method based on face detection and binary operation
CN108052932A (en) Occlusion-adaptive face recognition method
CN111179294A (en) Bionic contour detection method based on X and Y parallel visual channel responses
CN111898560B (en) Classification regression feature decoupling method in target detection
CN114596605A (en) Expression recognition method with multi-feature fusion
CN112926430A (en) Multi-angle facial expression recognition method based on deep learning
CN111611963A (en) Face recognition method based on neighbor preserving canonical correlation analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant