CN115984384B - Desktop lifting control method based on facial pose image estimation - Google Patents

Desktop lifting control method based on facial pose image estimation

Info

Publication number
CN115984384B
CN115984384B (application CN202310265134.5A)
Authority
CN
China
Prior art keywords
image
head
color image
module
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310265134.5A
Other languages
Chinese (zh)
Other versions
CN115984384A
Inventor
项乐宏
夏银水
李裕麒
王翀
蓝艇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Loctek Ergonomic Technology Co Ltd
Original Assignee
Loctek Ergonomic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Loctek Ergonomic Technology Co Ltd filed Critical Loctek Ergonomic Technology Co Ltd
Priority to CN202310265134.5A
Publication of CN115984384A
Application granted
Publication of CN115984384B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Processing (AREA)

Abstract

The invention provides a desktop lifting control method based on facial pose image estimation, comprising the following steps: acquiring an occluded-head color image and a head depth image with holes from a user image; inputting the occluded-head color image and the head depth image with holes into a face image fusion denoising module, which denoises them to produce a de-occluded head color image and a hole-filled head depth image; obtaining a two-dimensional heatmap through a detection module; converting the two-dimensional heatmap into two-dimensional keypoint coordinates through a pose estimation module; combining the two-dimensional keypoint coordinates with the hole-filled head depth image to obtain depth information for each keypoint, and feeding the result into a multi-layer perceptron network to obtain a head pose image and three-dimensional keypoint coordinates; comparing the three-dimensional keypoint coordinates against keypoint and desktop-state data pre-stored in the control end to obtain correction data; and controlling the desktop state at the control end according to the correction data. The embodiment of the invention enables intelligent control of the desktop state.

Description

Desktop lifting control method based on facial pose image estimation
Technical Field
The invention relates to the technical field of image data processing, and in particular to a desktop lifting control method based on facial pose image estimation.
Background
Physical labor once provided everyday exercise. With the rapid development of modern information technology, however, the internet has spread worldwide into every field, and more and more people now sit at a desk for entire days. The traditional working platform is a desk whose height is generally not adjustable; desks are purchased to a standard, and their uniform height gives them poor universality, so they cannot meet the specific requirements of users of different heights. If a desk is too high or too low, the user will feel some degree of discomfort, such as soreness of the waist and back or arm pain, and long-term use adversely affects health. In most cases the user simply endures a desk that does not match his or her own height; the only remedy is to adjust the height of the chair, or, if the chair is not adjustable, to add a cushion to it. This approach brings some improvement, but a limited one. In recent years, as living standards and quality-of-life expectations have risen, office desks have developed from fixed desks into desks that can be lifted manually. Current manually controlled lifting desks are typically equipped with a control panel with up/down keys: pressing a key drives an electric push rod that raises or lowers the desktop within a certain range to better meet the user's needs. Such a desk is simple in principle and broadly applicable, and because it can be lifted, it can meet the height requirements of different groups of users.
However, such a manually controlled lifting desk can only be adjusted to a height that feels right to the user; it cannot be determined whether that height is a scientifically healthy one, nor can it be ensured that the viewing distance of the eyes stays within a reasonable range, so improvement is needed.
According to ergonomic research, the desktop height and eye distance required to maintain a good sitting posture differ across a user's different activities, and task-adaptive intelligent lifting based on facial pose estimation addresses this problem well. The system builds a neural network model and applies facial pose estimation and behavior recognition algorithms: a camera captures the user's behavior in real time, and when the user's head pose changes, the system prompts the user in real time about lifting or tilting the desktop and raises the desktop to an ergonomically healthy height, making it convenient to use, more intelligent, and beneficial to the user's health and comfort. At the same time, because a user may wear a mask or other covering at work, most of the face and its keypoints may be occluded, so accurate detection cannot be completed by conventional means.
Disclosure of Invention
Therefore, an embodiment of the invention provides a desktop lifting control method based on facial pose image estimation, which realizes intelligent control of the desktop state.
To solve the above problems, the present invention provides a desktop lifting control method based on facial pose image estimation, comprising: step S100: acquiring a user image in real time through a front-end image acquisition device; step S200: acquiring an occluded-head color image and a head depth image with holes from the user image; step S300: inputting the occluded-head color image and the head depth image with holes into a face image fusion denoising module, which denoises them to produce a de-occluded head color image and a hole-filled head depth image; step S400: sending the de-occluded head color image and the hole-filled head depth image to a detection module, which produces a two-dimensional heatmap; step S500: sending the two-dimensional heatmap to a pose estimation module, which converts it into two-dimensional keypoint coordinates; step S600: combining the two-dimensional keypoint coordinates with the hole-filled head depth image to obtain depth information for each keypoint, and feeding the result into a multi-layer perceptron network to obtain a head pose image and three-dimensional keypoint coordinates; step S700: sending the head pose image and the three-dimensional keypoint coordinates to a control end, and comparing them against keypoint and desktop-state data pre-stored in the control end to obtain correction data; step S800: the control end controls the desktop state according to the correction data.
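The flow of steps S100 to S800 can be sketched end to end. The following is a minimal illustrative sketch in which trivial stubs stand in for the learned modules; the real detection, denoising, and pose networks are neural models, and all function names, the keypoint count, and the dummy heatmap peak are assumptions, not the patent's implementation:

```python
import numpy as np

def fusion_denoise(color_img, depth_img):
    """Stand-in for the face image fusion denoising module (S300)."""
    return color_img, depth_img  # a real module would inpaint occlusions and holes

def detect_heatmaps(color_img, depth_img, n_keypoints=5):
    """Stand-in for the detection module (S400): one 2-D heatmap per keypoint."""
    h, w = color_img.shape[:2]
    maps = np.zeros((n_keypoints, h, w))
    maps[:, h // 2, w // 2] = 1.0  # dummy peak at the image centre
    return maps

def heatmaps_to_2d(heatmaps):
    """Stand-in for the pose estimation module (S500): heatmap -> (x, y)."""
    flat = heatmaps.reshape(heatmaps.shape[0], -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat, heatmaps.shape[1:])
    return np.stack([xs, ys], axis=1)

def lift_to_3d(coords_2d, depth_img):
    """S600 (first half): append the depth value sampled at each 2-D keypoint."""
    z = depth_img[coords_2d[:, 1], coords_2d[:, 0]]
    return np.column_stack([coords_2d, z])

def run_pipeline(color_img, depth_img):
    color_img, depth_img = fusion_denoise(color_img, depth_img)
    kp2d = heatmaps_to_2d(detect_heatmaps(color_img, depth_img))
    return lift_to_3d(kp2d, depth_img)

color = np.zeros((64, 64, 3))          # placeholder user color image
depth = np.full((64, 64), 500.0)       # placeholder depth map, mm
kp3d = run_pipeline(color, depth)      # (n_keypoints, 3) array of (x, y, z)
```

In the full method the resulting three-dimensional keypoints would then pass through the multi-layer perceptron and on to the control end (S700, S800).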
Compared with the prior art, this technical scheme achieves the following effect. With the development of the internet, more and more sedentary indoor office occupations have appeared, often involving sitting for the whole day. A traditional desk has a fixed height, or its desktop height can be adjusted by lifting the desk legs, but it can only be set to a height that feels right to the user; whether that height is a scientifically healthy one cannot be determined, and a reasonable eye distance cannot be ensured. The method therefore acquires the user image in real time through a front-end image acquisition device and adjusts the desktop state accordingly, bringing the desktop to a scientifically appropriate height for the user. Specifically, an occluded-head color image and a head depth image with holes are acquired from the user image and input into the face image fusion denoising module, which denoises them into a de-occluded head color image and a hole-filled head depth image; these are sent to the detection module to obtain a two-dimensional heatmap; the heatmap is sent to the pose estimation module and converted into two-dimensional keypoint coordinates; the two-dimensional keypoint coordinates are combined with the hole-filled head depth image to obtain per-keypoint depth information, which is fed into the multi-layer perceptron network to obtain a head pose image and three-dimensional keypoint coordinates; finally, these are sent to the control end and compared against the keypoint and desktop-state data pre-stored there to obtain correction data, and the control end controls the desktop state according to the correction data, realizing intelligent control of the desktop state.
In one embodiment of the present invention, step S200 specifically includes: step S210: the main body of the face image fusion denoising module is a Transformer; the occluded-head color image and the head depth image with holes are each divided into patches, and different linear mapping layers are applied to obtain a first color image feature vector and a first depth image feature vector; step S220: the image features of the occluded-head color image and the head depth image with holes are fused through a local self-attention mechanism and, together with the position coding information of both images, input into an encoder module, from which a second color image feature vector and a second depth image feature vector are separated; step S230: the second color image feature vector is sent to a color image decoder module, and the second depth image feature vector is sent to a depth image decoder module; step S240: the first and second color image feature vectors are combined to obtain the de-occluded head color image, and the first and second depth image feature vectors are combined to obtain the hole-filled head depth image.
Compared with the prior art, this achieves the following effect: the main body of the face image fusion denoising module is a Transformer; the occluded-head color image and the head depth image with holes are each divided into patches, and different linear mapping layers yield a first color image feature vector and a first depth image feature vector; a second color image feature vector and a second depth image feature vector are separated through a local self-attention mechanism; combining the first and second color image feature vectors yields the de-occluded head color image, and combining the first and second depth image feature vectors yields the hole-filled head depth image, thereby improving the accuracy of the resulting images.
One example of the present invention further comprises a training method for the face image fusion denoising module, which specifically includes: occluding images with a random mask to generate the occluded-head color image and the head depth image with holes, and performing steps S210 to S240 on the generated images, thereby realizing training.
Compared with the prior art, this achieves the following effect: because the training method applies random-mask occlusion to the images, the face image fusion denoising module can cope with user images occluded in various ways, preventing the situation where the desktop state cannot be adjusted because the user image is occluded, and improving the module's accuracy in processing user images.
In one example of the invention, the body of the pose estimation module is an asymmetric composition of an encoder module and a decoder module, using a masked autoencoder architecture; both the encoder module and the decoder module use a Swin Transformer, and the decoder module uses a fully connected layer.
Compared with the prior art, this achieves the following effect: setting the body of the pose estimation module to an asymmetric composition of an encoder module and a decoder module, with both modules using a Swin Transformer and the decoder module using a fully connected layer, improves the accuracy of the pose estimation module's analysis.
One example of the present invention further comprises a training method for the pose estimation module, which specifically includes: inputting a two-dimensional keypoint heatmap, deleting some heatmap regions through random masking, inputting the retained heatmap into the masked autoencoder, encoding and decoding it through the masked autoencoder to restore the heatmap image, and obtaining the two-dimensional keypoint coordinates from the restored heatmap image.
Compared with the prior art, this achieves the following effect: deleting some heatmap regions through random masking during training allows the pose estimation module to adapt when the acquired two-dimensional keypoint heatmap is incomplete, preventing the situation where the desktop state cannot be adjusted because the user image is occluded, and improving the module's accuracy in processing the keypoint heatmap.
In one example of the present invention, the desktop state includes tabletop height and tabletop tilt angle.
Compared with the prior art, this achieves the following effect: adjusting the tabletop height and the tabletop tilt angle further improves the user's comfort at the tabletop.
In one example of the invention, a motor control driving circuit and electric push rods are arranged below the desktop and connected to the control end; the control end drives the motor control driving circuit and the electric push rods to control the tabletop height and the tabletop tilt angle.
Compared with the prior art, this achieves the following effect: arranging the motor control driving circuit and the electric push rods below the tabletop enables control of the tabletop height and the tabletop tilt angle.
After the technical scheme of the invention is adopted, the following technical effects can be achieved:
(1) A user image is acquired in real time through the front-end image acquisition device and the desktop state is adjusted accordingly, bringing the desktop to a scientifically appropriate height for the user. Specifically, an occluded-head color image and a head depth image with holes are acquired from the user image and input into the face image fusion denoising module, which denoises them into a de-occluded head color image and a hole-filled head depth image; these are sent to the detection module to obtain a two-dimensional heatmap; the heatmap is sent to the pose estimation module and converted into two-dimensional keypoint coordinates; these are combined with the hole-filled head depth image to obtain per-keypoint depth information, which is fed into the multi-layer perceptron network to obtain a head pose image and three-dimensional keypoint coordinates; finally, these are sent to the control end and compared against the keypoint and desktop-state data pre-stored there to obtain correction data, by which the control end automatically controls the desktop state, realizing intelligent control of the desktop state;
(2) The main body of the face image fusion denoising module is a Transformer; different linear mapping layers applied to the occluded-head color image and the head depth image with holes yield a first color image feature vector and a first depth image feature vector; a second color image feature vector and a second depth image feature vector are separated through a local self-attention mechanism; combining the first and second color image feature vectors yields the de-occluded head color image, and combining the first and second depth image feature vectors yields the hole-filled head depth image, improving the accuracy of both;
(3) The training method of the face image fusion denoising module applies random-mask occlusion to the images, so the module can cope with user images occluded in various ways, preventing the situation where the desktop state cannot be adjusted because the user image is occluded, and improving the module's accuracy in processing user images.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art could obtain other drawings from them without inventive effort.
fig. 1 is a flowchart of a desktop lift control method based on facial pose image estimation according to an embodiment of the present invention;
fig. 2 is a specific flowchart of S200 in fig. 1.
Detailed Description
To make the above objects, features and advantages of the present invention more comprehensible, embodiments of the invention are described in detail below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
Referring to fig. 1, a flowchart of a desktop lifting control method based on facial pose image estimation according to an embodiment of the present invention is shown. The method specifically comprises the following steps:
step S100: acquiring a user image in real time through a front-end image acquisition device;
it should be noted that, here, the front-end image capturing apparatus may be an image capturing device disposed on a desktop.
step S200: acquiring an occluded-head color image and a head depth image with holes from the user image;
step S300: inputting the occluded-head color image and the head depth image with holes into a face image fusion denoising module, which denoises them to produce a de-occluded head color image and a hole-filled head depth image;
step S400: sending the de-occluded head color image and the hole-filled head depth image to a detection module, which produces a two-dimensional heatmap;
step S500: sending the two-dimensional heatmap to a pose estimation module, which converts it into two-dimensional keypoint coordinates;
step S600: combining the two-dimensional keypoint coordinates with the hole-filled head depth image to obtain depth information for each keypoint, and feeding the result into a multi-layer perceptron network to obtain a head pose image and three-dimensional keypoint coordinates;
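Step S600 concatenates each keypoint's two-dimensional coordinates with its sampled depth value and passes the result through a multi-layer perceptron. A minimal NumPy forward-pass sketch follows; the layer sizes, random weights, and the output layout (flattened 3-D keypoints plus three head-pose angles) are illustrative assumptions, since the patent does not specify the network's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, weights, biases):
    """Plain multi-layer perceptron: ReLU on hidden layers, identity output."""
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU activation
    return x

n_keypoints = 5
in_dim = n_keypoints * 3                       # (x, y, depth) per keypoint
sizes = [in_dim, 64, 64, n_keypoints * 3 + 3]  # out: 3-D keypoints + (yaw, pitch, roll)
weights = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

# One flattened input vector: 2-D keypoint coordinates with sampled depth values.
kp_with_depth = rng.uniform(0, 1, in_dim)
out = mlp_forward(kp_with_depth, weights, biases)
kp3d = out[: n_keypoints * 3].reshape(n_keypoints, 3)  # 3-D keypoint coordinates
yaw, pitch, roll = out[-3:]                            # head-pose angles
```

In a trained system the weights would of course be learned, not random; the sketch only shows the data flow.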
step S700: sending the head pose image and the three-dimensional keypoint coordinates to a control end, and comparing them against keypoint and desktop-state data pre-stored in the control end to obtain correction data;
step S800: the control end controls the desktop state according to the correction data.
For example, with the development of the internet, more and more sedentary indoor office occupations have appeared, often involving sitting for the whole day. A conventional desk has a fixed height, or its desktop height can be adjusted by lifting the desk legs, but it can only be set to a height that feels right to the user; whether that height is a scientifically healthy one cannot be determined, and a reasonable eye distance cannot be ensured. The method therefore acquires the user image in real time through the front-end image acquisition device and adjusts the desktop state accordingly, bringing the desktop to a scientifically appropriate height for the user. Specifically, an occluded-head color image and a head depth image with holes are acquired from the user image and input into the face image fusion denoising module, which denoises them into a de-occluded head color image and a hole-filled head depth image; these are sent to the detection module to obtain a two-dimensional heatmap; the heatmap is sent to the pose estimation module and converted into two-dimensional keypoint coordinates; these are combined with the hole-filled head depth image to obtain per-keypoint depth information, which is fed into the multi-layer perceptron network to obtain a head pose image and three-dimensional keypoint coordinates; finally, these are sent to the control end and compared against the keypoint and desktop-state data pre-stored there to obtain correction data, according to which the control end controls the desktop state, realizing intelligent control of the desktop state.
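The comparison in step S700 can be sketched as follows. The patent does not specify how keypoint offsets map to correction data, so the formulas below (mean vertical offset as height correction, mean depth offset over an assumed 400 mm viewing distance as tilt correction) are illustrative assumptions only:

```python
import math
import numpy as np

def correction_data(kp3d, ref_kp3d, view_dist_mm=400.0):
    """S700 sketch: compare current 3-D keypoint coordinates with the
    pre-stored reference and derive (height, tilt) correction data.
    The mapping and units (mm, degrees) are illustrative assumptions."""
    delta = kp3d - ref_kp3d
    height_corr = float(delta[:, 1].mean())  # mean vertical drift, mm
    tilt_corr = math.degrees(math.atan2(float(delta[:, 2].mean()), view_dist_mm))
    return height_corr, tilt_corr

ref = np.zeros((5, 3))          # pre-stored reference keypoints (placeholder)
cur = ref.copy()
cur[:, 1] += 10.0               # user's head now sits 10 mm higher
h_corr, t_corr = correction_data(cur, ref)
```

Here the control end would raise the desktop by roughly `h_corr` millimetres and leave the tilt unchanged.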
Further, referring to fig. 2, acquiring the occluded-head color image and the head depth image with holes from the user image specifically includes:
step S210: the main body of the face image fusion denoising module is a Transformer; the occluded-head color image and the head depth image with holes are each divided into patches, and different linear mapping layers are applied to obtain a first color image feature vector and a first depth image feature vector;
step S220: the image features of the occluded-head color image and the head depth image with holes are fused through a local self-attention mechanism and, together with the position coding information of both images, input into an encoder module, from which a second color image feature vector and a second depth image feature vector are separated;
step S230: the second color image feature vector is sent to the color image decoder module, and the second depth image feature vector is sent to the depth image decoder module;
step S240: the first and second color image feature vectors are combined to obtain the de-occluded head color image, and the first and second depth image feature vectors are combined to obtain the hole-filled head depth image.
For example, the main body of the face image fusion denoising module is a Transformer; the occluded-head color image and the head depth image with holes are each divided into patches, and different linear mapping layers yield a first color image feature vector and a first depth image feature vector; a second color image feature vector and a second depth image feature vector are separated through a local self-attention mechanism; combining the first and second color image feature vectors yields the de-occluded head color image, and combining the first and second depth image feature vectors yields the hole-filled head depth image, thereby improving the accuracy of both.
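Steps S210 and S220 can be illustrated with a toy patch-embedding and fusion pass. For brevity this sketch uses single-head global self-attention on single-channel images, whereas the described module uses local self-attention plus positional encoding inside a full Transformer; all sizes and weights are illustrative:

```python
import numpy as np

def patchify(img, p):
    """Split an H x W single-channel image into flattened p x p patches."""
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p * p)

def self_attention(x):
    """Single-head scaled dot-product self-attention over token rows."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ x

rng = np.random.default_rng(0)
p, d = 8, 32
color = rng.uniform(size=(32, 32))  # grayscale stand-in for the color image
depth = rng.uniform(size=(32, 32))  # stand-in for the depth image

# Separate linear mapping layers for the two modalities (S210).
w_color = rng.normal(0, 0.1, (p * p, d))
w_depth = rng.normal(0, 0.1, (p * p, d))
color_tok = patchify(color, p) @ w_color  # first color image feature vectors
depth_tok = patchify(depth, p) @ w_depth  # first depth image feature vectors

# Fuse both token sets in one attention pass, then separate them again (S220).
fused = self_attention(np.concatenate([color_tok, depth_tok], axis=0))
color_feat2, depth_feat2 = fused[: len(color_tok)], fused[len(color_tok):]
```

The separated second feature vectors would then go to the respective decoder modules (S230) and be combined with the first feature vectors (S240).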
Preferably, the desktop lifting control method based on facial pose image estimation further includes a training method for the face image fusion denoising module, which specifically includes: occluding images with a random mask to generate an occluded-head color image and a head depth image with holes, and performing steps S210 to S240 on the generated images, thereby realizing training.
For example, because the training method applies random-mask occlusion to the images, the face image fusion denoising module can cope with user images occluded in various ways, preventing the situation where the desktop state cannot be adjusted because the user image is occluded, and improving the module's accuracy in processing user images.
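The random-mask occlusion used to generate training pairs might look as follows; the block size, block count, and fill value are illustrative assumptions:

```python
import numpy as np

def random_mask(img, rng, n_blocks=4, block=16, fill=0.0):
    """Occlude random square blocks of an image, as in the described
    training scheme (block size and count are illustrative)."""
    out = img.copy()
    h, w = out.shape[:2]
    for _ in range(n_blocks):
        y = rng.integers(0, h - block + 1)
        x = rng.integers(0, w - block + 1)
        out[y:y + block, x:x + block] = fill
    return out

rng = np.random.default_rng(0)
clean = np.ones((64, 64))               # stand-in for a clean head image
occluded = random_mask(clean, rng)      # training input: occluded head image
# Training pairs for the fusion denoising module: (occluded input, clean target).
```

The same masking can be applied to depth images to simulate depth holes.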
Specifically, the body of the pose estimation module is an asymmetric composition of an encoder module and a decoder module, using a masked autoencoder architecture; both the encoder and decoder modules use a Swin Transformer, and the decoder module uses a fully connected layer.
For example, setting the body of the pose estimation module to an asymmetric composition of an encoder module and a decoder module, with both modules using a Swin Transformer and the decoder module using a fully connected layer, improves the accuracy of the pose estimation module's analysis.
Further, the desktop lifting control method based on facial pose image estimation also includes a training method for the pose estimation module, which specifically includes: inputting a two-dimensional keypoint heatmap, deleting some heatmap regions through random masking, inputting the retained heatmap into the masked autoencoder, encoding and decoding it through the masked autoencoder to restore the heatmap image, and obtaining the two-dimensional keypoint coordinates from the restored heatmap image.
For example, deleting some heatmap regions through random masking during training allows the pose estimation module to adapt when the acquired two-dimensional keypoint heatmap is incomplete, preventing the situation where the desktop state cannot be adjusted because the user image is occluded, and improving the module's accuracy in processing the keypoint heatmap.
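The heatmap-to-coordinate readout and the random region deletion used in training can be sketched as follows. The soft-argmax readout and the masking parameters are illustrative assumptions; the described module recovers coordinates via a masked autoencoder that first restores the heatmap, not via a direct readout:

```python
import numpy as np

def soft_argmax(heatmap):
    """Heatmap -> (x, y): probability-weighted coordinate expectation."""
    prob = np.exp(heatmap - heatmap.max())
    prob /= prob.sum()
    ys, xs = np.indices(heatmap.shape)
    return float((prob * xs).sum()), float((prob * ys).sum())

def mask_heatmap(heatmap, rng, drop=0.5, block=8):
    """Randomly delete square regions, mimicking the described
    masked-autoencoder training input (ratio and block size illustrative)."""
    out = heatmap.copy()
    h, w = out.shape
    n = int(drop * (h // block) * (w // block))
    for _ in range(n):
        y = rng.integers(0, h - block + 1)
        x = rng.integers(0, w - block + 1)
        out[y:y + block, x:x + block] = out.min()
    return out

# Sharp synthetic heatmap peaked at (x=20, y=12).
ys, xs = np.indices((32, 32))
hm = 50.0 * np.exp(-((xs - 20.0) ** 2 + (ys - 12.0) ** 2) / 2.0)
x, y = soft_argmax(hm)                                   # approx (20, 12)
hm_masked = mask_heatmap(hm, np.random.default_rng(1))   # incomplete training input
```

During training, the autoencoder would be asked to reconstruct `hm` from `hm_masked` before the coordinates are read out.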
Preferably, the desktop state includes tabletop height and tabletop tilt angle. For example, the comfort of the desk for the user can be further enhanced by adjusting the tabletop height and the tabletop tilt angle.
Further, a motor control drive circuit and an electric push rod are arranged below the desktop and connected to the control end; the control end controls the tabletop height and the tabletop tilt angle through the motor control drive circuit and the electric push rod.
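One way the control end might turn correction data into a motor command is to clamp each correction into a bounded incremental step for the drive circuit. All names, units, and limits below are illustrative assumptions; the patent does not specify a control law:

```python
def correction_to_commands(delta_height_mm, delta_tilt_deg,
                           max_step_mm=5.0, max_step_deg=2.0):
    """Clamp a correction into one incremental command for the motor
    control drive circuit (height via the electric push rod) and the
    tilt actuator. Limits keep each adjustment step small and smooth."""
    clamp = lambda v, lim: max(-lim, min(lim, v))
    return {
        "height_step_mm": clamp(delta_height_mm, max_step_mm),
        "tilt_step_deg": clamp(delta_tilt_deg, max_step_deg),
    }
```

Issuing many small clamped steps, rather than one large jump, lets the table track the user's posture gradually as new correction data arrives.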
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A desktop lifting control method based on facial pose image estimation, characterized by comprising the following steps:
step S100: acquiring a user image in real time through front-end image acquisition equipment;
step S200: obtaining an occluded head color image and a head depth image with holes from the user image;
step S300: inputting the occluded head color image and the head depth image with holes into a face image fusion denoising module, and denoising them to obtain a de-occluded head color image and a hole-filled head depth image; the body of the face image fusion denoising module uses a Transformer to split the occluded head color image and the head depth image with holes into patches and passes them through separate linear mapping layers to obtain a first color image feature vector and a first depth image feature vector; the image features of the occluded head color image and the head depth image with holes are fused through a local self-attention mechanism and, combined with the position encoding information of the two images, input into an encoder module, from whose output a second color image feature vector and a second depth image feature vector are separated; the second color image feature vector is sent to a color image decoder module, and the second depth image feature vector is sent to a depth image decoder module; the first color image feature vector and the second color image feature vector are combined to obtain the de-occluded head color image, and the first depth image feature vector and the second depth image feature vector are combined to obtain the hole-filled head depth image;
step S400: sending the de-occluded head color image and the hole-filled head depth image to a detection module, and obtaining a two-dimensional heatmap through the detection module;
step S500: sending the two-dimensional heatmap to a pose estimation module, and converting the two-dimensional heatmap into two-dimensional coordinates of key points through the pose estimation module;
step S600: combining the two-dimensional coordinates of the key points with the hole-filled head depth image to obtain depth information for the two-dimensional coordinates of the key points, and inputting the depth information into a multi-layer perceptron network to obtain a head pose image and three-dimensional coordinates of the key points;
step S700: sending the head pose image and the three-dimensional coordinates of the key points to a control end, and obtaining correction data by comparing them with the key-point three-dimensional coordinate and desktop state data pre-stored in the control end;
step S800: controlling, by the control end, the desktop state according to the correction data.
2. The desktop lifting control method based on facial pose image estimation according to claim 1, further comprising a training method for the face image fusion denoising module, which specifically comprises:
occluding the image with a random mask to generate the occluded head color image and the head depth image with holes, and performing steps S210-S240 on the generated occluded head color image and head depth image with holes, thereby realizing the training.
3. The desktop lifting control method based on facial pose image estimation according to claim 1, wherein
the body of the pose estimation module is composed of an asymmetric encoder module and decoder module and uses a masked self-encoding architecture;
wherein both the encoder module and the decoder module use a Swin Transformer, and the decoder module uses a fully connected layer.
4. The desktop lifting control method based on facial pose image estimation according to claim 3, further comprising a training method for the pose estimation module, which specifically comprises:
inputting a two-dimensional coordinate-point heatmap, deleting some heatmap regions by random occlusion, inputting the retained heatmap into the masked self-encoder, restoring the heatmap image by encoding and decoding with the masked self-encoder, and obtaining the two-dimensional coordinates of the key points from the restored heatmap image.
5. The desktop lifting control method based on facial pose image estimation according to claim 1, wherein
the desktop state includes tabletop height and tabletop tilt angle.
6. The desktop lifting control method based on facial pose image estimation according to claim 5, wherein
a motor control drive circuit and an electric push rod are arranged below the desktop, and the motor control drive circuit and the electric push rod are connected with the control end;
wherein the control end controls the tabletop height and the tabletop tilt angle through the motor control drive circuit and the electric push rod.
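Step S600's lifting of 2D key points to 3D with the depth image is, in essence, a pinhole back-projection. The camera intrinsics `fx`, `fy`, `cx`, `cy` below are assumed parameters for the sketch, not values given in the claims:

```python
import numpy as np

def backproject(keypoints_2d, depth_image, fx, fy, cx, cy):
    """Lift (u, v) pixel key points to camera-frame 3D points using the
    depth image: x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth[v, u]."""
    pts = []
    for u, v in keypoints_2d:
        z = depth_image[int(v), int(u)]          # depth at the key point
        pts.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.array(pts)
```

This is where the hole-filled depth image matters: if the depth at a key point were missing (a hole), the third coordinate could not be recovered.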

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310265134.5A CN115984384B (en) 2023-03-20 2023-03-20 Desktop lifting control method based on facial pose image estimation


Publications (2)

Publication Number Publication Date
CN115984384A (en) 2023-04-18
CN115984384B (en) 2023-07-21

Family

ID=85958198



Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117322727B (en) * 2023-09-21 2024-04-16 江门市鑫义隆智能科技有限公司 Intelligent lifting table control method and system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN112989947A (en) * 2021-02-08 2021-06-18 上海依图网络科技有限公司 Method and device for estimating three-dimensional coordinates of human body key points
CN113762177A (en) * 2021-09-13 2021-12-07 成都市谛视科技有限公司 Real-time human body 3D posture estimation method and device, computer equipment and storage medium

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
CN103908065A (en) * 2014-04-03 2014-07-09 安徽海聚信息科技有限责任公司 Intelligent desk with sitting posture correcting function and correcting method implemented by intelligent desk
CN111027407B (en) * 2019-11-19 2023-04-07 东南大学 Color image hand posture estimation method for shielding situation
CN111626211B (en) * 2020-05-27 2023-09-26 大连成者云软件有限公司 Sitting posture identification method based on monocular video image sequence
US20220051437A1 (en) * 2020-08-17 2022-02-17 Northeastern University 3D Human Pose Estimation System
CN112215172A (en) * 2020-10-17 2021-01-12 西安交通大学 Human body prone position three-dimensional posture estimation method fusing color image and depth information
CN112487923A (en) * 2020-11-25 2021-03-12 奥比中光科技集团股份有限公司 Method and system for acquiring training data of human face head posture
US11804040B2 (en) * 2021-03-17 2023-10-31 Qualcomm Incorporated Keypoint-based sampling for pose estimation
US11854280B2 (en) * 2021-04-27 2023-12-26 Toyota Research Institute, Inc. Learning monocular 3D object detection from 2D semantic keypoint detection
CN114549927A (en) * 2022-01-26 2022-05-27 华中科技大学 Feature detection network training, virtual and actual registration tracking enhancement and occlusion processing method
CN114973407B (en) * 2022-05-10 2024-04-02 华南理工大学 Video three-dimensional human body posture estimation method based on RGB-D



Similar Documents

Publication Publication Date Title
CN115984384B (en) Desktop lifting control method based on facial pose image estimation
US7388981B2 (en) Telepresence system with automatic preservation of user head size
US7593546B2 (en) Telepresence system with simultaneous automatic preservation of user height, perspective, and vertical gaze
US6879879B2 (en) Telepresence system with automatic user-surrogate height matching
CN110349081B (en) Image generation method and device, storage medium and electronic equipment
JP6013669B1 (en) Human body model providing system, human body model transformation method, and computer program
TWI362005B (en)
EP3502934A1 (en) Methods and systems for generating a representation of a seated person using facial measurements
JP2023094549A (en) Avatar display device, avatar generation device, and program
JPH0962865A (en) Face picture processing method and face picture processor
JP6836038B2 (en) Human body model providing system, human body model transformation method, and computer program
WO2002091749A1 (en) Model switching in a communication system
CN111078005B (en) Virtual partner creation method and virtual partner system
WO2021232698A1 (en) Vision guidance-based mobile gaming system and mobile gaming response method
González et al. Vision based interface: an alternative tool for children with cerebral palsy
CN110018738A (en) A kind of emotion converting system based on real scene emotional expression
CN108762481A (en) Adaptive man-machine interaction method based on skeleton and system
CN112839162B (en) Method, device, terminal and storage medium for adjusting eye display position
Tily et al. An Intraoral Camera for Supporting Assistive Devices
KR102199809B1 (en) Method and device for controlling vehicle seats based on upper body image
KR20220120731A (en) Methods and apparatus for providing the contents of the affordance health care using mirror display
CN106249408A (en) Adjustable virtual reality glasses and prevention and the optical system of myopia correction
JP2007164617A (en) Template forming method using gaussian function, and vision input communication method using eye motion
CN118113056A (en) Control method of intelligent lifting table and intelligent lifting table
CN116617047A (en) Intelligent rehabilitation training method and system for upper limbs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant