CN115984384A

CN115984384A - Desktop lifting control method based on facial posture image estimation

Info

Publication number: CN115984384A
Application number: CN202310265134.5A
Authority: CN
Inventors: 项乐宏; 夏银水; 李裕麒; 王翀; 蓝艇
Original assignee: Loctek Ergonomic Technology Co Ltd
Current assignee: Loctek Ergonomic Technology Co Ltd
Priority date: 2023-03-20
Filing date: 2023-03-20
Publication date: 2023-04-18
Anticipated expiration: 2043-03-20
Also published as: CN115984384B

Abstract

The invention provides a desktop lifting control method based on facial posture image estimation, comprising the following steps: acquiring an occluded head color image and a head depth image with holes from a user image; inputting the occluded head color image and the head depth image with holes into a face image fusion denoising module, and denoising them to obtain a de-occluded head color image and a hole-filled head depth image; obtaining a two-dimensional heat map through a detection module; converting the two-dimensional heat map into two-dimensional coordinates of key points through a pose estimation module; combining the two-dimensional coordinates of the key points with the hole-filled head depth image to obtain depth information for the two-dimensional coordinates of the key points, and inputting the result into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points; comparing these with the data, built into a control end, relating pre-stored three-dimensional key point coordinates to desktop states, to obtain correction data; and controlling the desktop state by the control end according to the correction data. Embodiments of the invention enable intelligent control of the desktop state.

Description

Desktop lifting control method based on facial posture image estimation

Technical Field

The invention relates to the technical field of image data processing, in particular to a desktop lifting control method based on facial posture image estimation.

Background

Historically, physical labor was itself a form of exercise that kept people fit. With the rapid development of modern information technology and the spread of the internet across regions and industries, however, more and more people work sitting down, often for the whole day. The traditional work platform is a desk whose height generally cannot be adjusted. Many desks are purchased at a standard, uniform height, so they adapt poorly to users of different heights. If the desk is too high or too low, the user will feel some discomfort, such as soreness in the back, waist, or arms, and long-term use may harm the user's health. In most cases, however, the user has to put up with a mismatched desk for a long time and can at most adjust the height of the chair; if the chair cannot be adjusted, the only option is to add a cushion. This helps to some extent, but the improvement is limited. In recent years, as living standards and expectations for quality of life have risen, manually controlled height-adjustable office desks have been developed from fixed desks. An existing manually controlled desk generally has a control panel with up and down keys. When a key is pressed, an electric push rod raises or lowers the desktop within a certain range to better suit the user. The principle is simple and the applicability is broad: such a desk can meet different users' height requirements and is easy for anyone to use.
However, a manually controlled desk can only be set to the height that the user, out of habit, perceives as suitable. It cannot determine whether that height is ergonomically sound and good for the user's health, nor can it ensure that the viewing distance is within a reasonable range, so improvement is needed.

According to ergonomics research, the desktop height and viewing distance needed to maintain a good sitting posture differ with the user's activity, and task-adaptive intelligent lifting based on face pose estimation addresses this problem. By building a neural network model and applying face pose estimation and behavior recognition algorithms, the system captures the user's behavior in real time through a camera, prompts the user to raise, lower, or tilt the desktop when the user's head pose changes, and moves the desktop to an ergonomically healthy height, which makes it more convenient, more intelligent, and better for the user's health and comfort. At the same time, because the user may wear a mask or another occluding object at work, most of the face and its key points may be occluded, and accurate detection cannot be completed.

Disclosure of Invention

Therefore, the embodiment of the invention provides a desktop lifting control method based on facial posture image estimation, which realizes intelligent control on the desktop state.

In order to solve the above problems, the present invention provides a desktop lifting control method based on facial pose image estimation, comprising: step S100: acquiring a user image in real time through a front-end image acquisition device; step S200: acquiring an occluded head color image and a head depth image with holes from the user image; step S300: inputting the occluded head color image and the head depth image with holes into a face image fusion denoising module, and denoising them to obtain a de-occluded head color image and a hole-filled head depth image; step S400: sending the de-occluded head color image and the hole-filled head depth image to a detection module, and obtaining a two-dimensional heat map through the detection module; step S500: sending the two-dimensional heat map to a pose estimation module, and converting the two-dimensional heat map into two-dimensional coordinates of key points through the pose estimation module; step S600: combining the two-dimensional coordinates of the key points with the hole-filled head depth image to obtain depth information for the two-dimensional coordinates of the key points, and inputting the result into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points; step S700: sending the head pose image and the three-dimensional coordinates of the key points to a control end, and comparing them with the data, built into the control end, relating pre-stored three-dimensional key point coordinates to desktop states, to obtain correction data; step S800: controlling, by the control end, the desktop state according to the correction data.

Compared with the prior art, this technical solution has the following technical effects. With the development of the internet, sedentary indoor office work has become increasingly common, and people often sit for the whole day. A traditional office desk either has a fixed height or can be adjusted by raising or lowering the desk legs, but only to the height that the user perceives as suitable out of habit; it cannot be determined whether that height is ergonomically sound and good for the user's health, nor whether the viewing distance is within a reasonable range. In this method, a front-end image acquisition device acquires a user image in real time, and the desktop state is adjusted according to the user image so that the desktop height is ergonomically suited to the user. Specifically, an occluded head color image and a head depth image with holes are acquired from the user image and input into the face image fusion denoising module, which denoises them to obtain a de-occluded head color image and a hole-filled head depth image. These two images are sent to the detection module, which produces a two-dimensional heat map. The heat map is sent to the pose estimation module, which converts it into two-dimensional coordinates of key points. The two-dimensional coordinates are then combined with the hole-filled head depth image to obtain their depth information, which is input into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points. Finally, the head pose image and the three-dimensional coordinates of the key points are sent to the control end and compared with the data, built into the control end, relating pre-stored three-dimensional key point coordinates to desktop states, to obtain correction data; the control end controls the desktop state according to the correction data, so that the desktop state is adjusted automatically.

In one example of the present invention, step S200 specifically includes: step S210: the main body of the face image fusion denoising module uses a transformer, which splits the occluded head color image and the head depth image with holes into patches and applies separate linear mapping layers to obtain a first color image feature vector and a first depth image feature vector; step S220: fusing the image features of the occluded head color image and the head depth image with holes through a local self-attention mechanism, inputting the fused features, together with the position encoding information of the two images, into an encoder module, and separating out a second color image feature vector and a second depth image feature vector; step S230: sending the second color image feature vector to a color image decoder module, and sending the second depth image feature vector to a depth image decoder module; step S240: combining the first color image feature vector and the second color image feature vector to obtain the de-occluded head color image, and combining the first depth image feature vector and the second depth image feature vector to obtain the hole-filled head depth image.

Compared with the prior art, this technical solution achieves the following technical effect: the main body of the face image fusion denoising module uses a transformer; the occluded head color image and the head depth image with holes are each split into patches and passed through separate linear mapping layers to obtain a first color image feature vector and a first depth image feature vector; a second color image feature vector and a second depth image feature vector are separated out after fusion through a local self-attention mechanism; the first and second color image feature vectors are combined to obtain the de-occluded head color image, and the first and second depth image feature vectors are combined to obtain the hole-filled head depth image, which improves the accuracy of the resulting de-occluded head color image and hole-filled head depth image.

In one example of the present invention, the method further comprises a training method for the face image fusion denoising module, which specifically comprises: occluding images with a random mask to generate the occluded head color image and the head depth image with holes, and performing steps S210 to S240 on the generated images, thereby training the module.

Compared with the prior art, this technical solution has the following technical effects: because the training method applies random mask occlusion to the images, the face image fusion denoising module can handle user images occluded in various ways, which prevents the case where the desktop state cannot be adjusted because the user image is occluded and improves the accuracy of the module in processing user images.

In one example of the present invention, the main body of the pose estimation module is an asymmetric encoder-decoder structure that uses a masked autoencoder architecture, wherein the encoder module and the decoder module both use Swin Transformers and the decoder module uses fully connected layers.

Compared with the prior art, this technical solution has the following technical effects: the main body of the pose estimation module is an asymmetric encoder-decoder structure in which the encoder module and the decoder module both use Swin Transformers and the decoder module uses fully connected layers, which improves the accuracy of the pose estimation module.

In one example of the present invention, the method further comprises a training method for the pose estimation module, which specifically comprises: inputting a heat map of two-dimensional coordinate points, deleting some heat map regions by random masking, inputting the retained heat map into the masked autoencoder, obtaining a restored heat map image through encoding and decoding by the masked autoencoder, and obtaining the two-dimensional coordinates of the key points from the restored heat map image.

Compared with the prior art, this technical solution has the following technical effects: training with some heat map regions deleted by random masking lets the pose estimation module cope with incomplete heat maps of two-dimensional coordinate points, which prevents the case where the desktop state cannot be adjusted because the user image is occluded and improves the accuracy of the pose estimation module in processing such heat maps.

In one example of the invention, the desktop state comprises: the height of the table top and the inclination angle of the table top.

Compared with the prior art, the technical scheme has the following technical effects: the comfort degree of the desktop to the user can be further improved by adjusting the height of the desktop and the inclination angle of the desktop.

In one embodiment of the invention, a motor control driving circuit and an electric push rod are arranged below the tabletop, and the motor control driving circuit and the electric push rod are connected with the control end; the control end controls the motor control drive circuit and the electric push rod to control the height of the table top and the inclination angle of the table top.

Compared with the prior art, the technical scheme has the following technical effects: the motor control drive circuit and the electric push rod are arranged below the tabletop, so that the tabletop height and the tabletop inclination angle can be controlled.

After the technical scheme of the invention is adopted, the following technical effects can be achieved:

(1) A front-end image acquisition device acquires a user image in real time, and the desktop state is adjusted according to the user image so that the desktop height is ergonomically suited to the user. Specifically, an occluded head color image and a head depth image with holes are acquired from the user image and input into the face image fusion denoising module, which denoises them to obtain a de-occluded head color image and a hole-filled head depth image; these are sent to the detection module, which produces a two-dimensional heat map; the heat map is sent to the pose estimation module, which converts it into two-dimensional coordinates of key points; the two-dimensional coordinates are combined with the hole-filled head depth image to obtain their depth information, which is input into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points; the head pose image and the three-dimensional coordinates are sent to the control end and compared with the data, built into the control end, relating pre-stored three-dimensional key point coordinates to desktop states, to obtain correction data; and the control end controls the desktop state according to the correction data, thereby adjusting the desktop state automatically;

(2) The main body of the face image fusion denoising module uses a transformer; the occluded head color image and the head depth image with holes are each split into patches and passed through separate linear mapping layers to obtain a first color image feature vector and a first depth image feature vector; a second color image feature vector and a second depth image feature vector are separated out after fusion through a local self-attention mechanism; the first and second color image feature vectors are combined to obtain the de-occluded head color image, and the first and second depth image feature vectors are combined to obtain the hole-filled head depth image, which improves the accuracy of the resulting de-occluded head color image and hole-filled head depth image;

(3) Because the training method applies random mask occlusion to the images, the face image fusion denoising module can handle user images occluded in various ways, which prevents the case where the desktop state cannot be adjusted because the user image is occluded and improves the accuracy of the module in processing user images.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts;

fig. 1 is a flowchart of a desktop lifting control method based on facial pose image estimation according to an embodiment of the present invention;

fig. 2 is a detailed flowchart of S200 in fig. 1.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments of the present invention are described in detail clearly and completely, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, it is a flowchart of a desktop lifting control method based on facial pose image estimation according to an embodiment of the present invention. The desktop lifting control method based on the facial pose image estimation specifically comprises the following steps:

Step S100: acquiring a user image in real time through a front-end image acquisition device;

It should be noted that the front-end image acquisition device may be a camera arranged on the desktop.

Step S200: acquiring an occluded head color image and a head depth image with holes from the user image;

Step S300: inputting the occluded head color image and the head depth image with holes into a face image fusion denoising module, and denoising them to obtain a de-occluded head color image and a hole-filled head depth image;

Step S400: sending the de-occluded head color image and the hole-filled head depth image to a detection module, and obtaining a two-dimensional heat map through the detection module;

Step S500: sending the two-dimensional heat map to a pose estimation module, and converting the two-dimensional heat map into two-dimensional coordinates of key points through the pose estimation module;

Step S600: combining the two-dimensional coordinates of the key points with the hole-filled head depth image to obtain depth information for the two-dimensional coordinates of the key points, and inputting the result into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points;

Step S700: sending the head pose image and the three-dimensional coordinates of the key points to a control end, and comparing them with the data, built into the control end, relating pre-stored three-dimensional key point coordinates to desktop states, to obtain correction data;

Step S800: controlling, by the control end, the desktop state according to the correction data.

For example, with the development of the internet, sedentary indoor office work has become increasingly common, and people often sit for the whole day. A traditional office desk either has a fixed height or can be adjusted by raising or lowering the desk legs, but only to the height that the user perceives as suitable out of habit; it cannot be determined whether that height is ergonomically sound and good for the user's health, nor whether the viewing distance is within a reasonable range. In this method, a front-end image acquisition device acquires a user image in real time, and the desktop state is adjusted according to the user image so that the desktop height is ergonomically suited to the user. Specifically, an occluded head color image and a head depth image with holes are acquired from the user image and input into the face image fusion denoising module, which denoises them to obtain a de-occluded head color image and a hole-filled head depth image. These two images are sent to the detection module, which produces a two-dimensional heat map. The heat map is sent to the pose estimation module, which converts it into two-dimensional coordinates of key points. The two-dimensional coordinates are then combined with the hole-filled head depth image to obtain their depth information, which is input into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points. Finally, the head pose image and the three-dimensional coordinates of the key points are sent to the control end and compared with the data, built into the control end, relating pre-stored three-dimensional key point coordinates to desktop states, to obtain correction data; the control end controls the desktop state according to the correction data, so that the desktop state is adjusted automatically.
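The first half of step S600, attaching a depth value to each two-dimensional key point, can be illustrated with a minimal sketch. The function name `attach_depth`, the `(u, v)` pixel convention, and the nested-list image representation are assumptions made for illustration; the multi-layer perceptron that then produces the three-dimensional coordinates is not modelled here.

```python
from typing import List, Optional, Tuple

Keypoint2D = Tuple[int, int]            # (column u, row v) in pixels; assumed convention
KeypointDepth = Tuple[int, int, float]  # (u, v, depth value)


def attach_depth(
    keypoints: List[Keypoint2D],
    depth_image: List[List[float]],
) -> List[Optional[KeypointDepth]]:
    """Look up the depth value under each 2D key point.

    Returns None for a key point that falls outside the depth image.
    """
    height = len(depth_image)
    width = len(depth_image[0]) if height else 0
    result: List[Optional[KeypointDepth]] = []
    for u, v in keypoints:
        if 0 <= v < height and 0 <= u < width:
            result.append((u, v, depth_image[v][u]))
        else:
            result.append(None)
    return result


# Toy 3x3 depth image (values in metres, chosen arbitrarily).
depth = [
    [0.80, 0.81, 0.82],
    [0.79, 0.60, 0.83],
    [0.78, 0.77, 0.84],
]
print(attach_depth([(1, 1), (2, 0), (5, 5)], depth))
# [(1, 1, 0.6), (2, 0, 0.82), None]
```

Because the lookup reads the depth image directly, any hole left at a key point location would feed an invalid depth to the later network, which is presumably why the depth image is hole-filled before this step.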

Further, referring to fig. 2, acquiring the occluded head color image and the head depth image with holes from the user image specifically includes:

Step S210: the main body of the face image fusion denoising module uses a transformer, which splits the occluded head color image and the head depth image with holes into patches and applies separate linear mapping layers to obtain a first color image feature vector and a first depth image feature vector;

Step S220: fusing the image features of the occluded head color image and the head depth image with holes through a local self-attention mechanism, inputting the fused features, together with the position encoding information of the two images, into an encoder module, and separating out a second color image feature vector and a second depth image feature vector;

Step S230: sending the second color image feature vector to a color image decoder module, and sending the second depth image feature vector to a depth image decoder module;

Step S240: combining the first color image feature vector and the second color image feature vector to obtain the de-occluded head color image, and combining the first depth image feature vector and the second depth image feature vector to obtain the hole-filled head depth image.

For example, the main body of the face image fusion denoising module uses a transformer; the occluded head color image and the head depth image with holes are each split into patches and passed through separate linear mapping layers to obtain a first color image feature vector and a first depth image feature vector; a second color image feature vector and a second depth image feature vector are separated out after fusion through a local self-attention mechanism; the first and second color image feature vectors are combined to obtain the de-occluded head color image, and the first and second depth image feature vectors are combined to obtain the hole-filled head depth image, which improves the accuracy of the resulting de-occluded head color image and hole-filled head depth image.
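The patch splitting and separate linear mappings of step S210 can be sketched as follows. This is only an illustration under stated assumptions: single-channel toy images, a patch size of 2, and hand-picked weight matrices are all hypothetical, and the self-attention fusion, encoder, and decoders of steps S220 to S240 are not modelled.

```python
from typing import List

Matrix = List[List[float]]


def split_into_patches(image: Matrix, patch: int) -> List[List[float]]:
    """Cut a single-channel image into non-overlapping patch x patch blocks,
    each flattened row by row. Image sides are assumed divisible by `patch`."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            block = [image[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            patches.append(block)
    return patches


def linear_map(vectors: List[List[float]], weights: Matrix) -> List[List[float]]:
    """Project each flattened patch with a weight matrix of shape (in_dim, out_dim)."""
    out_dim = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(vec)) for j in range(out_dim)]
        for vec in vectors
    ]


# Toy inputs: a 4x4 "color" image (one channel for brevity) and a 4x4 depth image.
gray = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]
depth = [[2.0] * 4 for _ in range(4)]

# Each modality gets its own mapping, mirroring the "different linear mapping layers".
color_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
depth_weights = [[0.5, 0.5] for _ in range(4)]

color_tokens = linear_map(split_into_patches(gray, 2), color_weights)
depth_tokens = linear_map(split_into_patches(depth, 2), depth_weights)
print(color_tokens[0], depth_tokens[0])  # [6.0, 8.0] [4.0, 4.0]
```

Keeping separate weights per modality lets color and depth patches be embedded in a common feature dimension even though their value ranges differ, which is one plausible reason for using different mapping layers before fusion.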

Preferably, the desktop lifting control method based on facial pose image estimation further includes a training method for the face image fusion denoising module, which specifically comprises: occluding images with a random mask to generate an occluded head color image and a head depth image with holes, and performing steps S210 to S240 on the generated images, thereby training the module.

For example, because the training method applies random mask occlusion to the images, the face image fusion denoising module can handle user images occluded in various ways, which prevents the case where the desktop state cannot be adjusted because the user image is occluded and improves the accuracy of the module in processing user images.
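Random mask occlusion for generating training inputs might look like the sketch below. The patch-wise masking scheme, the mask ratio, the fill value of 0.0, and the function name `random_patch_mask` are all assumptions; the text only states that a random mask is used.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def random_patch_mask(
    image: Matrix,
    patch: int,
    ratio: float,
    rng: random.Random,
    fill: float = 0.0,
) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Hide a random fraction of patch x patch blocks.

    Returns the masked copy and the (top, left) corners of the hidden blocks.
    The input image is left unchanged so it can serve as the training target.
    """
    rows, cols = len(image), len(image[0])
    corners = [(top, left)
               for top in range(0, rows, patch)
               for left in range(0, cols, patch)]
    hidden = rng.sample(corners, k=int(len(corners) * ratio))
    masked = [row[:] for row in image]
    for top, left in hidden:
        for r in range(top, min(top + patch, rows)):
            for c in range(left, min(left + patch, cols)):
                masked[r][c] = fill
    return masked, hidden


rng = random.Random(0)                       # fixed seed for repeatability
clean = [[1.0] * 4 for _ in range(4)]        # toy 4x4 image
masked, hidden = random_patch_mask(clean, patch=2, ratio=0.5, rng=rng)
print(len(hidden))  # 2
```

Applied to a color image, the masked blocks play the role of occlusions; applied to a depth image, they play the role of holes. The unmasked original is the reconstruction target.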

Specifically, the main body of the pose estimation module is an asymmetric encoder-decoder structure that uses a masked autoencoder architecture, wherein the encoder module and the decoder module both use Swin Transformers and the decoder module uses fully connected layers.

For example, the main body of the pose estimation module is an asymmetric encoder-decoder structure in which the encoder module and the decoder module both use Swin Transformers and the decoder module uses fully connected layers, which improves the accuracy of the pose estimation module.

Further, the desktop lifting control method based on facial pose image estimation further comprises a training method for the pose estimation module, which specifically comprises: inputting a heat map of two-dimensional coordinate points, deleting some heat map regions by random masking, inputting the retained heat map into the masked autoencoder, obtaining a restored heat map image through encoding and decoding by the masked autoencoder, and obtaining the two-dimensional coordinates of the key points from the restored heat map image.

For example, training with some heat map regions deleted by random masking lets the pose estimation module cope with incomplete heat maps of two-dimensional coordinate points, which prevents the case where the desktop state cannot be adjusted because the user image is occluded and improves the accuracy of the pose estimation module in processing such heat maps.
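Reading a key point's two-dimensional coordinates out of a restored heat map is commonly done by taking the location of the strongest response. The sketch below assumes one heat map per key point and a plain argmax; the text does not state which decoding rule is actually used, and the name `decode_heatmap` is hypothetical.

```python
from typing import List, Tuple


def decode_heatmap(heatmap: List[List[float]]) -> Tuple[int, int, float]:
    """Return (u, v, score) of the strongest response in one key point heat map,
    where u is the column index and v is the row index."""
    best_u, best_v, best_score = 0, 0, heatmap[0][0]
    for v, row in enumerate(heatmap):
        for u, score in enumerate(row):
            if score > best_score:
                best_u, best_v, best_score = u, v, score
    return best_u, best_v, best_score


# Toy restored heat map for a single key point.
restored = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.3, 0.2, 0.0],
    [0.0, 0.2, 0.9, 0.1],
]
print(decode_heatmap(restored))  # (2, 2, 0.9)
```

The returned score can serve as a rough confidence value, for example to skip a key point whose restored heat map has no clear peak.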

Preferably, the desktop state comprises: the height of the table top and the inclination angle of the table top. For example, the comfort of the desktop for the user can be further improved by adjusting the height of the desktop and the inclination angle of the desktop.

Further, a motor control driving circuit and an electric push rod are arranged below the desktop and are connected with the control end; the control end controls the motor control driving circuit and the electric push rod to adjust the height of the desktop and the inclination angle of the desktop.
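The text does not specify how the control end turns the comparison into correction data. Purely as an illustration, the sketch below assumes that the correction is the mean vertical offset between the measured key points and a stored reference set, applied to the desktop height and clamped to an actuator range. The coordinate convention (y upward, in millimetres), the height limits, and the names `DesktopState` and `height_correction` are all hypothetical; inclination correction is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in mm, y pointing up (assumed)


@dataclass
class DesktopState:
    height_mm: float
    tilt_deg: float


def height_correction(
    measured: List[Point3D],
    reference: List[Point3D],
    state: DesktopState,
    min_height_mm: float = 700.0,    # hypothetical actuator limits
    max_height_mm: float = 1200.0,
) -> float:
    """Return the height change (mm) that moves the desktop by the mean
    vertical offset between measured and reference key points, clamped so
    the resulting height stays inside the actuator range."""
    offsets = [m[1] - r[1] for m, r in zip(measured, reference)]
    mean_offset = sum(offsets) / len(offsets)
    target = min(max(state.height_mm + mean_offset, min_height_mm), max_height_mm)
    return target - state.height_mm


reference = [(0.0, 500.0, 600.0), (10.0, 480.0, 600.0)]
measured = [(0.0, 530.0, 600.0), (10.0, 510.0, 600.0)]   # head is 30 mm higher
print(height_correction(measured, reference, DesktopState(750.0, 0.0)))   # 30.0
print(height_correction(measured, reference, DesktopState(1190.0, 0.0)))  # 10.0
```

In a real controller the returned value would be translated into a drive command for the motor control driving circuit and electric push rod; that interface is outside what the text describes.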

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A desktop lifting control method based on facial pose image estimation, characterized by comprising the following steps:

step S100: acquiring a user image in real time through a front-end image acquisition device;

step S200: acquiring an occluded head color image and a head depth image with holes from the user image;

step S300: inputting the occluded head color image and the head depth image with holes into a face image fusion denoising module, and denoising them to obtain a de-occluded head color image and a hole-filled head depth image;

step S400: sending the de-occluded head color image and the hole-filled head depth image to a detection module, and obtaining a two-dimensional heat map through the detection module;

step S500: sending the two-dimensional heat map to a pose estimation module, and converting the two-dimensional heat map into two-dimensional coordinates of key points through the pose estimation module;

step S600: combining the two-dimensional coordinates of the key points with the hole-filled head depth image to obtain depth information for the two-dimensional coordinates of the key points, and inputting the result into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points;

step S700: sending the head pose image and the three-dimensional coordinates of the key points to a control end, and comparing them with the data, built into the control end, relating pre-stored three-dimensional key point coordinates to desktop states, to obtain correction data;

step S800: controlling, by the control end, the desktop state according to the correction data.

2. The desktop lifting control method based on facial pose image estimation according to claim 1, characterized in that step S200 specifically comprises:

step S210: the main body of the face image fusion denoising module uses a transformer; the occluded head color image and the voided head depth image are each divided into patches and passed through different linear mapping layers to obtain a first color image feature vector and a first depth image feature vector;

step S220: fusing the image features of the occluded head color image and the voided head depth image through a local self-attention mechanism, inputting the fused image features, together with the position coding information of the occluded head color image and the voided head depth image, into an encoder module, and separating out a second color image feature vector and a second depth image feature vector;

step S240: combining the first color image feature vector and the second color image feature vector to obtain the de-occluded head color image, and combining the first depth image feature vector and the second depth image feature vector to obtain the de-voided head depth image.

3. The desktop lifting control method based on facial pose image estimation according to claim 2, characterized in that the training method of the face image fusion denoising module specifically comprises:

occluding images with a random mask to generate the occluded head color image and the voided head depth image, and performing steps S210-S240 on the generated occluded head color image and voided head depth image, thereby implementing the training.

4. The desktop lifting control method based on facial pose image estimation according to claim 1, characterized in that

the main body of the posture estimation module is composed of an asymmetric encoder module and decoder module, and uses a masked autoencoder framework;

wherein the encoder module and the decoder module both use Swin Transformers, and the decoder module uses fully connected layers.

5. The desktop lifting control method based on facial pose image estimation according to claim 4, characterized in that the training method of the posture estimation module specifically comprises:

inputting a thermodynamic diagram of two-dimensional coordinate points, deleting some regions of the thermodynamic diagram by random masking, inputting the retained regions of the thermodynamic diagram into the masked autoencoder, obtaining a restored thermodynamic diagram through encoding and decoding by the masked autoencoder, and obtaining the two-dimensional coordinates of the key points from the restored thermodynamic diagram.

6. The desktop lifting control method based on facial pose image estimation according to claim 1, characterized in that

the desktop state comprises: the desktop height and the desktop inclination angle.

7. The desktop lifting control method based on facial pose image estimation according to claim 6, characterized in that

a motor control driving circuit and an electric push rod are arranged below the desktop, and the motor control driving circuit and the electric push rod are connected to the control end;

the control end controls the motor control driving circuit and the electric push rod so as to adjust the height and the inclination angle of the desktop.