CN111881888A - Intelligent table control method and device based on posture recognition - Google Patents

Intelligent table control method and device based on posture recognition

Info

Publication number
CN111881888A
Authority
CN
China
Prior art keywords
target user
human body
dimensional
posture
key nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010853594.6A
Other languages
Chinese (zh)
Inventor
董秀园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202010853594.6A priority Critical patent/CN111881888A/en
Publication of CN111881888A publication Critical patent/CN111881888A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47B TABLES; DESKS; OFFICE FURNITURE; CABINETS; DRAWERS; GENERAL DETAILS OF FURNITURE
    • A47B 13/00 Details of tables or desks
    • A47B 13/08 Table tops; Rims therefor
    • A47B 13/081 Movable, extending, sliding table tops
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47B TABLES; DESKS; OFFICE FURNITURE; CABINETS; DRAWERS; GENERAL DETAILS OF FURNITURE
    • A47B 9/00 Tables with tops of variable height
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47B TABLES; DESKS; OFFICE FURNITURE; CABINETS; DRAWERS; GENERAL DETAILS OF FURNITURE
    • A47B 97/00 Furniture or accessories for furniture, not provided for in other groups of this subclass
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47B TABLES; DESKS; OFFICE FURNITURE; CABINETS; DRAWERS; GENERAL DETAILS OF FURNITURE
    • A47B 2200/00 General construction of tables or desks
    • A47B 2200/0035 Tables or desks with features relating to adjustability or folding
    • A47B 2200/004 Top adjustment
    • A47B 2200/0042 Height and inclination adjustable desktop, either separately or simultaneously
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47B TABLES; DESKS; OFFICE FURNITURE; CABINETS; DRAWERS; GENERAL DETAILS OF FURNITURE
    • A47B 2200/00 General construction of tables or desks
    • A47B 2200/008 Tables or desks having means for applying electronic or electric devices
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47B TABLES; DESKS; OFFICE FURNITURE; CABINETS; DRAWERS; GENERAL DETAILS OF FURNITURE
    • A47B 2200/00 General construction of tables or desks
    • A47B 2200/0084 Accessories for tables or desks

Abstract

The disclosure relates to a method and a device for controlling a smart table based on posture recognition. The method comprises the following steps: acquiring image data of a target user through an image data acquisition device; identifying the target user from the image data through a posture recognition algorithm, and outputting a human body posture map required by the smart table; reconstructing three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map; performing bone registration between the target user and a reference person using posture key nodes in three-dimensional space; comparing the three-dimensional human body posture information of the target user and the reference person after bone registration; and adaptively adjusting the smart table based on the comparison result.

Description

Intelligent table control method and device based on posture recognition
Technical Field
The present disclosure relates generally to the field of artificial intelligence and, in particular, to a method and apparatus for controlling a smart table based on posture recognition.
Background
With the development of society, more and more people engage in mental labor. Whether sitting in a classroom to study or sitting in an office to work, people often need to sit for hours at a time, from morning to evening. The height or tilt angle of the desks and chairs purchased by a company or school is generally customized uniformly and does not necessarily suit every person's stature. Using a desk or chair whose height or tilt angle does not fit the body can affect a person's posture and even their health. For office workers, for example, prolonged incorrect posture is likely to cause cervical spondylosis and even a series of complications. For teenagers, who are at a critical stage of physical development and whose bones and organs are not yet mature, incorrect posture can cause skeletal deformation, leading to problems such as hunchback and myopia and further affecting growth and development. The height or tilt angle of a desk therefore needs to be adjusted adaptively to each person's situation, so as to meet different people's requirements for desk and chair height or tilt angle. Although conventional adjustable desks and chairs allow the height or tilt angle to be adjusted manually, users often lack the relevant knowledge and do not know how to adjust them according to their height, leg length, arm length, and body shape. Even a well-informed user cannot adjust a desk and chair accurately by visual observation alone.
To meet people's demand for good body posture and health, systems and devices related to smart desks and chairs have gradually appeared. However, most prior art merely provides an approximate adjustment range based on the user's age, sex, height, or weight; it can neither accurately capture a person's posture in three-dimensional space nor individually adjust the height or tilt angle of desks and chairs for many users in a large space. In addition, conventional smart desks and chairs lack the ability to remind users when their posture is incorrect and to give appropriate optimization suggestions.
Disclosure of Invention
In view of the above technical problems, the present disclosure provides a method and an apparatus for controlling a smart table based on posture recognition. The method and apparatus capture video and images of a target user from multiple viewing angles and reconstruct three-dimensional posture information, so that the target user's posture is captured comprehensively. In addition, the method and apparatus use a deep learning neural network combined with traditional machine learning techniques to perform posture analysis, then adaptively adjust the smart table by comparison with a reference person, and feed optimization suggestions back to the target user.
In one aspect of the present disclosure, a method for controlling a smart table based on posture recognition is provided, comprising the following steps: acquiring image data of a target user through an image data acquisition device; identifying the target user from the image data through a posture recognition algorithm, and outputting a human body posture map required by the smart table; reconstructing three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map; performing bone registration between the target user and a reference person using posture key nodes in three-dimensional space; comparing the three-dimensional human body posture information of the target user and the reference person after bone registration; and adaptively adjusting the smart table based on the comparison result.
In a preferred embodiment, the image data acquisition device comprises at least one of: a planar camera, a depth camera, an infrared camera, or a thermal imager, wherein the depth camera comprises at least one of: a time-of-flight camera, a structured light camera, or a binocular camera.
In another preferred embodiment, outputting the human body posture map required by the smart table further includes: determining the posture of the human body through key nodes of the human body, wherein the key nodes include at least one of limb joint points and facial key points, and the position information of the key nodes is represented by coordinates; determining the position coordinates of at least one of the key nodes in the image data; determining category information of at least one of the key nodes, wherein the category information includes body feature information of interest, namely the key feature points of human body parts required by the human body monitoring task and by human biomechanical model analysis for different applications; determining state information of at least one of the key nodes, wherein the state information includes: visible, or invisible and either inferable or non-inferable; and linking the key nodes into the human body posture map according to the positional relationships and confidences among the key nodes.
In a further preferred embodiment, the posture recognition algorithm comprises a deep learning neural network prediction algorithm, wherein the deep learning neural network requires training, the training comprising: preparing a human body posture image set, wherein the image data in the set is labeled according to the key nodes; and training a deep learning model on the human body posture image set, updating the parameters of the neural network through error back-propagation until convergence, to obtain a fully trained deep learning neural network.
In another preferred embodiment, reconstructing the three-dimensional human body posture information of the target user through the three-dimensional reconstruction algorithm based on the human body posture map further includes: acquiring shooting parameters of the image data acquisition device and establishing a three-dimensional spatial coordinate system according to the shooting parameters, wherein the shooting parameters include at least one of: the orientation, angle, viewing angle, and focal length of the camera.
In another preferred embodiment, reconstructing the three-dimensional human body posture information of the target user further includes: in the case of a single depth camera, reconstructing the information by converting the human body posture map generated from the depth image acquired by the depth camera into a three-dimensional point cloud; in the case of a combination of a planar camera and a depth camera, jointly processing the human body posture map generated from the planar image acquired by the planar camera and the three-dimensional point cloud converted from the depth image acquired by the depth camera; or, in the case of a multi-view combination of image data acquisition devices, projecting the human body posture map generated by the device at each viewing angle into the three-dimensional spatial coordinate system.
In another preferred embodiment, the bone registration of the target user with the reference person using the posture key nodes in three-dimensional space further includes global bone scaling and local bone scaling, wherein global bone scaling refers to registering the coordinate set of key nodes of the whole human body, and local bone scaling refers to registering the coordinates of local key nodes among the key nodes of the human body, including: calculating the bone lengths of the target user and the reference person, wherein a bone length is the distance between the position coordinates of linked key nodes and the distance comprises at least one of: Euclidean distance, standardized Euclidean distance, Mahalanobis distance, or cosine distance; registering the bone lengths of the target user to the corresponding bone lengths of the reference person; or registering the bone lengths of the reference person to the corresponding bone lengths of the target user.
In yet another preferred embodiment, comparing the three-dimensional human body posture information of the target user and the reference person after bone registration further comprises at least one of: comparing, one by one, the three-dimensional distances between the key nodes of the target user and the corresponding key nodes of the reference person, where a larger distance indicates a larger posture gap; computing the distances between multiple key nodes of the target user and the corresponding key nodes of the reference person and averaging them for comparison, where a larger average indicates a larger posture gap; and comparing the included angle between a line segment formed by linked key nodes of the target user and the corresponding line segment of the reference person, where a larger angle indicates a larger posture gap.
In a further preferred embodiment, adaptively adjusting the smart table based on the comparison result further comprises: adjusting the height or tilt angle of the smart table based on a score obtained by weighting the results of one or more of the above comparison methods; memorizing the height or tilt angle of the smart table for the target user, so that upon identifying the target user the smart table is adjusted directly to the memorized height or tilt angle; and feeding a posture optimization suggestion back to the target user.
In another aspect of the present disclosure, a smart table control apparatus based on posture recognition is provided, comprising: an image acquisition module configured to acquire image data of a target user through an image data acquisition device; a posture recognition module configured to identify the target user from the image data through a posture recognition algorithm and output the human body posture map required by the smart table; a three-dimensional reconstruction module configured to reconstruct three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map; a bone registration module configured to perform bone registration between the target user and a reference person using posture key nodes in three-dimensional space; a posture comparison module configured to compare the three-dimensional human body posture information of the target user and the reference person after bone registration; and an adaptive adjustment module configured to adaptively adjust the smart table based on the comparison result.
In addition, the posture-recognition-based smart table control apparatus can implement the posture-recognition-based smart table control method described above.
Compared with the prior art, the beneficial effects of the present disclosure are: by reconstructing three-dimensional posture information, performing posture analysis with a deep learning neural network combined with traditional machine learning techniques, adaptively adjusting the smart table, and feeding optimization suggestions back to the target user, the disclosure promotes the development of smart desks and chairs for health and meets people's needs for good body posture and physical health.
Drawings
The novel features believed characteristic of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following drawings and detailed description that set forth illustrative embodiments, in which the principles of the invention are utilized. The drawings are only for purposes of illustrating embodiments and are not to be construed as limiting the invention. Also, in the drawings, wherein like reference numerals refer to like elements throughout:
FIG. 1 illustrates a flow chart of a method for smart table control based on gesture recognition according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates one arrangement of cameras according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a model application flow diagram in accordance with an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a training flow diagram of a deep learning network model according to an exemplary embodiment of the present disclosure;
FIG. 5 illustrates an example of a body key feature point marker template according to an exemplary embodiment of the present disclosure;
FIGS. 6A and 6B show schematic diagrams of body key feature points for standing and sitting posture analysis, respectively, according to an exemplary embodiment of the present disclosure;
FIG. 7 illustrates an example of a human pose three-dimensional reconstruction flow according to an exemplary embodiment of the present disclosure;
FIG. 8 illustrates a schematic diagram of a gesture recognition based smart table control apparatus according to an exemplary embodiment of the present disclosure; and
FIG. 9 shows a schematic diagram of a smart table based on posture recognition according to an exemplary embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. Nothing in the following detailed description is intended to indicate that any particular component, feature, or step is essential to the invention. Those skilled in the art will appreciate that various features or steps may be substituted for or combined with one another without departing from the scope of the present disclosure.
FIG. 1 shows a flowchart of a posture-recognition-based smart table control method according to an exemplary embodiment of the present disclosure. Referring to FIG. 1, the method may include the following steps: in S101, acquiring image data of a target user through an image data acquisition device; in S102, identifying the target user from the image data through a posture recognition algorithm and outputting the human body posture map required by the smart table; in S103, reconstructing three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map; in S104, performing bone registration between the target user and a reference person using posture key nodes in three-dimensional space; in S105, comparing the three-dimensional human body posture information of the target user and the reference person after bone registration; and in S106, adaptively adjusting the smart table based on the comparison result.
In some embodiments, the camera may include at least one of: a planar camera, a depth camera, an infrared camera, or a thermal imager, wherein the depth camera may include at least one of: a time-of-flight camera, a structured light camera, or a binocular camera. A time-of-flight (TOF) camera obtains the distance from an object to the camera by continuously transmitting light pulses to the object, receiving the light returned from the object with a sensor, and measuring the flight time of the light pulses. A structured light camera scans the measured object with emitted laser light to obtain the distance from the object's surface to the camera. A binocular camera determines the distance from the shooting target to the camera through disparity calculation on the images collected by its two cameras.
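By way of non-limiting illustration, the binocular depth relationship just described may be sketched in Python as follows; the focal length, baseline, and disparity values are hypothetical:

    import numpy as np

    def disparity_to_depth(disparity_px, focal_px, baseline_m):
        """Binocular stereo: depth = f * B / d under the pinhole model."""
        d = np.asarray(disparity_px, dtype=float)
        with np.errstate(divide="ignore"):
            depth = focal_px * baseline_m / d
        return np.where(d > 0, depth, np.inf)  # zero disparity -> point at infinity

    # Hypothetical values: 700 px focal length, 6 cm baseline, 35 px disparity
    print(disparity_to_depth(35.0, 700.0, 0.06))  # ~1.2 m from camera to subject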
When a planar camera is combined with a depth camera, the planar camera provides the user's appearance information while the depth camera provides the user's distance along the shooting direction, so that the body joint positions of the person being photographed can be analyzed. This configuration allows the user to be observed even when only a single viewing angle is available. Its core is the pixel-level superposition and fusion of the planar color image and the depth point cloud; fusion at the pixel level allows the model of the present invention to explicitly reason about local appearance and geometric information. In addition, a binocular camera, which is reasonably priced and can output both a planar color image and a depth image, may be used as a preferred option.
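The pixel-level fusion described above could take, for instance, the form of stacking a registered depth map onto the color channels; this sketch assumes the depth image has already been registered to the color image:

    import numpy as np

    def fuse_rgb_depth(rgb, depth):
        """Pixel-level fusion: stack a registered depth map onto the color
        channels to form a 4-channel RGB-D input for the pose network."""
        assert rgb.shape[:2] == depth.shape[:2], "depth must be registered to RGB"
        depth_norm = depth / max(float(depth.max()), 1e-6)  # normalize to [0, 1]
        return np.concatenate([rgb / 255.0, depth_norm[..., None]], axis=-1)

    # Hypothetical 480x640 frames
    rgbd = fuse_rgb_depth(np.zeros((480, 640, 3)), np.ones((480, 640)))
    print(rgbd.shape)  # (480, 640, 4)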
FIG. 2 shows an arrangement of cameras according to an embodiment of the present invention. As shown in FIG. 2, cameras 202 and 203 may acquire moving images and/or video data of at least one target user 201 from multiple angles. Based on the camera intrinsic parameters and the relative poses between the shooting views, feature points are extracted from the images using image feature information (edges, lines, contours, interest points, corner points, geometric primitives, and the like). Disparity estimation is performed on the extracted feature points, and the three-dimensional scene is reconstructed from the resulting disparity information to obtain the positions of the target user's skeletal joints in three-dimensional space.
In this embodiment, outputting the human body posture map required by the smart table may further include: determining the posture of the human body through key nodes of the human body, wherein the key nodes include at least one of limb joint points and facial key points, and the position information of the key nodes is represented by coordinates; determining the position coordinates of at least one of the key nodes in the image data; determining category information of at least one of the key nodes, wherein the category information includes body feature information of interest, namely the key feature points of human body parts required by the human body monitoring task and by human biomechanical model analysis for different applications; determining state information of at least one of the key nodes, wherein the state information includes: visible, or invisible and either inferable or non-inferable; and linking the key nodes into the human body posture map according to the positional relationships and confidences among the key nodes.
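One possible in-memory representation of such key nodes is sketched below; the field and link names are hypothetical and illustrative only:

    from dataclasses import dataclass
    from enum import Enum

    class NodeState(Enum):
        VISIBLE = "visible"
        INVISIBLE_INFERABLE = "invisible, inferable"
        INVISIBLE_NOT_INFERABLE = "invisible, not inferable"

    @dataclass
    class KeyNode:
        name: str          # category, e.g. a limb joint point or facial key point
        x: float           # position coordinates in the image
        y: float
        state: NodeState
        confidence: float  # used when linking nodes into the posture map

    # Hypothetical skeleton links (pairs of node names) used to draw the posture map
    SKELETON_LINKS = [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]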
FIG. 3 shows a model application flowchart of an embodiment of the invention. As shown in FIG. 3, the process may include the following steps: S301, training a posture recognition deep learning network model; and S302, performing inference with the fully trained posture recognition deep learning neural network.
In this embodiment, the posture recognition algorithm comprises a deep learning neural network prediction algorithm, wherein the deep learning neural network needs to be trained. As shown in FIG. 3, at S301 the training may include: preparing a human body posture image set, wherein the image data in the set is labeled according to the key nodes; and training a deep learning model on the human body posture image set, updating the parameters of the neural network through error back-propagation until convergence, to obtain a fully trained deep learning neural network.
FIG. 4 is a flowchart of training a deep learning network model according to an embodiment of the present invention. As shown in FIG. 4, the flow may include: S401, preparing an image set containing human bodies and labeling the human bodies in the images with the key body feature points required for biomechanical (stress) analysis, the labeled feature points distinguishing different human body parts; S402, constructing a neural network model; S403, defining a loss function; and S404, model training: taking image data as input, outputting predicted human posture feature points through the constructed neural network model, calculating the error between the predicted and labeled feature points, and updating the network parameters by back-propagating the error until convergence, yielding a well-trained neural network. The training is preferably verified by cross-validation to enhance the generalization ability of the deep learning model and avoid overfitting. A minimal training-loop sketch follows.
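The following PyTorch-style sketch illustrates S402-S404 only; the model and data loader are placeholders, and the convergence criterion is a simplified assumption:

    import torch
    import torch.nn as nn

    def train(model, loader, epochs=50, lr=1e-4, tol=1e-4):
        """S402-S404 sketch: predict keypoint coordinates, compute the error
        against the labeled points, and back-propagate until the loss converges."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()                 # error between predicted and labeled points
        prev = float("inf")
        for epoch in range(epochs):
            total = 0.0
            for images, keypoints in loader:   # keypoints: (B, K, 2) labeled coordinates
                opt.zero_grad()
                loss = loss_fn(model(images), keypoints)
                loss.backward()                # error back-propagation
                opt.step()
                total += loss.item()
            if abs(prev - total) < tol:        # simplified convergence criterion
                break
            prev = total
        return model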
In S401, the posture images in the human body posture image set are labeled manually with feature point annotations. To obtain more posture images for training, the images can be preprocessed; preprocessing includes, but is not limited to, at least one of: rotation, cropping, brightness adjustment, and down-sampling. Preferably, when preparing the posture image database, the human postures in the images may be labeled according to the feature points required for biomechanical analysis of the region of interest, by adapting existing large-scale labeled human body datasets.
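The preprocessing listed above might be assembled, for example, with torchvision; the parameter values here are hypothetical, and geometric transforms must of course also be applied to the labeled feature points:

    import torchvision.transforms as T

    augment = T.Compose([
        T.RandomRotation(degrees=15),   # rotation processing
        T.RandomCrop(size=(224, 224)),  # cropping processing
        T.ColorJitter(brightness=0.3),  # brightness adjustment
        T.Resize((112, 112)),           # down-sampling
    ])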
FIG. 5 shows an example of a body key feature point marking template according to an exemplary embodiment of the present disclosure. In some embodiments, popular public datasets may be used, including but not limited to: the Human3.6M three-dimensional human posture dataset, the COCO human posture dataset, and the MPII human pose database. In the MPII dataset, each person has 14 numbered body key feature point markers, such as right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, head top, and neck; each marker records the coordinate information of the corresponding body key feature point. The marking template is shown in FIG. 5.
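For reference, the 14 markers enumerated above can be written as a simple list; the ordering here follows the text, not necessarily the canonical MPII index order:

    MPII_KEYPOINTS = [
        "right_shoulder", "right_elbow", "right_wrist",
        "left_shoulder", "left_elbow", "left_wrist",
        "right_hip", "right_knee", "right_ankle",
        "left_hip", "left_knee", "left_ankle",
        "head_top", "neck",
    ]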
FIGS. 6A and 6B show schematic diagrams of body key feature points for standing and sitting posture analysis, respectively, according to an exemplary embodiment of the present disclosure. In some embodiments, in order to analyze biomechanically how motion influences muscles, joints, and bones, a human posture feature marking method better adapted to ergonomic requirements is provided: important feature points of the human body are additionally marked in the image according to the different biomechanical analysis requirements. As shown in FIG. 6A, the feature points used for standing posture analysis include, but are not limited to: head, neck center, shoulder center, back center, waist, hip joint, knee joint, ankle joint, and sole; as shown in FIG. 6B, the feature points used for sitting posture analysis of the target user include, but are not limited to: head, neck center, shoulder center, back center, waist, sacrum, hip joint, femur, knee joint, and ischial contact points. Predicting exactly the key points required for biomechanical analysis in this way can improve the accuracy of the posture recognition task and further reduce the computation and prediction time of the posture recognition model.
In some embodiments, in order to further improve the accuracy and stability of body key feature point position estimation, a confidence region is generated around the pixel position of each labeled body key feature point. Within this region, the estimated body key feature point has a relatively high probability of containing the true value. Those skilled in the art will appreciate that other confidence region generation methods may also be used.
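One common way to realize such a confidence region is a two-dimensional Gaussian centered on the labeled pixel; in this illustrative sketch, sigma is a hypothetical spread parameter:

    import numpy as np

    def keypoint_heatmap(h, w, cx, cy, sigma=3.0):
        """Confidence region as a 2-D Gaussian centered on the labeled pixel,
        so positions near the label receive a high probability of truth."""
        ys, xs = np.mgrid[0:h, 0:w]
        return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    hm = keypoint_heatmap(64, 64, cx=20, cy=30)   # hypothetical keypoint at (20, 30)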
In some embodiments, the deep learning neural network for body key feature point detection uses either a top-down or a bottom-up architecture. The top-down approach first runs a human detector on the image to find all human instances, splits the image into per-instance sub-images, and then detects the key feature points within each sub-image; the bottom-up approach first detects all body key feature points in the image and then assigns them to the different human instances. The top-down approach identifies a single person's key feature points quickly and accurately and achieves high performance, but the time to recognize the postures in a whole image grows with the number of people. The bottom-up approach first detects the key points of human body parts across the whole image and then assigns the part key points of the multiple people in the image to different human instances. Although it cannot benefit from global human structure information, and its accuracy on complex body postures tends to be somewhat lower because each person's key points cannot be modeled as carefully, it detects and classifies each person's key feature points accurately when an image contains many human instances, and its recognition time does not increase significantly with the number of people detected.
In other words, for multi-person posture extraction, a bottom-up deep learning body key feature point detection network architecture is preferably used; for single-person posture extraction, a top-down architecture is preferred in order to improve the accuracy of single-person posture detection.
At S302, performing inference with the fully trained posture recognition deep learning neural network may include: taking image or video data as input and outputting each estimated body posture key feature point through the trained network, where the feature point information includes the coordinates on the image, the state, and the confidence.
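If the network emits one heatmap per feature point, the inference output described above could be decoded as in the following sketch; the visibility threshold is a hypothetical value:

    import numpy as np

    def decode_heatmaps(heatmaps, threshold=0.5):
        """Take each keypoint's peak as its coordinate and the peak value as
        its confidence; below the threshold, mark the point invisible."""
        results = []
        for hm in heatmaps:                      # one heatmap per key feature point
            y, x = np.unravel_index(np.argmax(hm), hm.shape)
            conf = float(hm[y, x])
            results.append({"xy": (int(x), int(y)),
                            "confidence": conf,
                            "state": "visible" if conf >= threshold else "invisible"})
        return results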
FIG. 7 illustrates an example of a human posture three-dimensional reconstruction flow according to an exemplary embodiment of the present disclosure. As shown in FIG. 7, reconstructing the three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map may further include: acquiring the shooting parameters of the image data acquisition device and establishing a three-dimensional spatial coordinate system according to those parameters, wherein the shooting parameters include at least one of: the orientation, angle, viewing angle, and focal length of the camera.
In S701, the image acquisition device parameters, the captured images, and the human posture feature points predicted in step S302 are obtained. The acquired device parameters include the position and shooting direction of each image acquisition device. Before scanning, the shooting parameters of each camera are obtained, and the camera's spatial coordinate system is calibrated and adjusted. Cameras are placed at one or more positions in the activity space to collect the user's motion information from several selectable viewing angles, after which three-dimensional reconstruction is performed. The camera positions are then calibrated: each camera can establish a three-dimensional rectangular spatial coordinate system with its own position as the origin. The preferred definition in this example is: the coordinate origin is fixed at the camera lens, the x and y axes are parallel to the two sides of the camera's imaging plane, and the z axis lies along the optical axis of the lens, perpendicular to the image plane.
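Under the coordinate convention just defined, projecting a 3-D point into a camera follows the standard pinhole model; the intrinsic values below are hypothetical:

    import numpy as np

    def project(K, R, t, X_world):
        """Pinhole projection: origin at the lens, x/y parallel to the imaging
        plane, z along the optical axis, perpendicular to the image plane."""
        X_cam = R @ X_world + t          # world -> camera coordinates
        uvw = K @ X_cam                  # camera -> homogeneous pixel coordinates
        return uvw[:2] / uvw[2]

    # Hypothetical intrinsics: 700 px focal length, principal point at image center
    K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
    print(project(K, np.eye(3), np.zeros(3), np.array([0.1, 0.2, 2.0])))  # [355. 310.]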
At S702, the same person is matched across the views. Methods for matching the same person across views include, but are not limited to: determining the target user's position through position tracking in a wearable device; when depth image information is available, identifying the person from the position in three-dimensional space at which the body is located in the depth view; and, when planar image information is available, computing cross-view human feature similarity from appearance similarity and geometric compatibility. Preferably, the posture recognition deep learning network is used to compute the output features of a specific convolutional layer from the planar image, and these features are matched; alternative methods include machine learning or optical flow.
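A greedy appearance-based matcher of the kind described above might look as follows; the features are assumed to come from a chosen convolutional layer, one vector per detected person per view:

    import numpy as np

    def match_people(feats_view_a, feats_view_b):
        """Pair each person in view A with the unused person in view B whose
        appearance feature has the highest cosine similarity."""
        def cos(u, v):
            return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))
        pairs, used = [], set()
        for i, fa in enumerate(feats_view_a):
            j = max((j for j in range(len(feats_view_b)) if j not in used),
                    key=lambda j: cos(fa, feats_view_b[j]), default=None)
            if j is not None:
                used.add(j)
                pairs.append((i, j))
        return pairs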
In S703, three-dimensional pose information of the target is reconstructed from the matched image data and the body key feature point information.
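For the multi-view case, S703 is commonly realized by direct linear transform (DLT) triangulation; the sketch below is illustrative and assumes two calibrated views with known 3x4 projection matrices:

    import numpy as np

    def triangulate(P1, P2, uv1, uv2):
        """Recover a 3-D joint position from its pixel coordinates in two
        calibrated views via the DLT formulation."""
        A = np.vstack([
            uv1[0] * P1[2] - P1[0],
            uv1[1] * P1[2] - P1[1],
            uv2[0] * P2[2] - P2[0],
            uv2[1] * P2[2] - P2[1],
        ])
        _, _, Vt = np.linalg.svd(A)      # least-squares solution of A X = 0
        X = Vt[-1]
        return X[:3] / X[3]              # homogeneous -> Euclidean coordinates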
In some embodiments, reconstructing the three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map further includes: in the case of a single depth camera, reconstructing the information by converting the human body posture image generated from the acquired depth image into a three-dimensional point cloud; in the case of a combination of a planar camera and a depth camera, jointly processing the human body posture image generated from the planar image and the three-dimensional point cloud converted from the depth image; or, in the case of a multi-planar-camera or multi-depth-camera combination, obtaining the positions of the target user's skeletal joints in three-dimensional space by projecting the human body posture images into the three-dimensional spatial coordinate system via triangulation, according to the positions and relative poses of the cameras placed at multiple angles and based on the labeled human posture data output in S302, thereby computing the three-dimensional coordinates of the feature points. Additionally or alternatively, the matching result can be constrained by temporal consistency across video frames to improve the accuracy of three-dimensional posture prediction. In this embodiment, the bone registration of the target user with the reference person using the posture key nodes in three-dimensional space may further include global bone scaling and local bone scaling. Global bone scaling refers to registering the coordinate set of key nodes of the whole human body, and local bone scaling refers to registering the coordinates of local key nodes among them, including: calculating the bone lengths of the target user and the reference person, where a bone length is the distance between the position coordinates of linked key nodes and the distance may include at least one of: Euclidean distance, standardized Euclidean distance, Mahalanobis distance, or cosine distance; registering the bone lengths of the target user to the corresponding bone lengths of the reference person; or registering the bone lengths of the reference person to the corresponding bone lengths of the target user.
In some embodiments, the bone registration of the target user and the reference person using the posture key nodes in three-dimensional space comprises a mathematical process of converting the coordinate set of the target user's three-dimensional posture data points obtained in step S703 into the coordinate system of the reference person's posture feature coordinate set in three-dimensional space. For example, the posture feature coordinates of the target user and the reference person are treated as two point sets, each composed of at least one feature joint coordinate. Registration targets can be classified as global or local: global bone registration registers the entire human posture feature coordinate set, while local bone registration registers local key feature coordinates within it.
In a preferred embodiment, the registration method finds a spatial transformation that aligns the two posture point sets, thereby determining the current target posture information. The transformation process includes: merging multiple posture feature node sets into a globally unified model; and mapping the current posture feature node set onto the reference node set to identify features or estimate the posture.
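A minimal global-bone-scaling sketch, using Euclidean bone lengths, is given below; the link set is a hypothetical example:

    import numpy as np

    def scale_bones(target_pts, reference_pts, links):
        """Global bone scaling: rescale the target skeleton about its centroid
        so its mean bone length matches the reference skeleton's."""
        def mean_bone_len(pts):
            return np.mean([np.linalg.norm(pts[a] - pts[b]) for a, b in links])
        s = mean_bone_len(reference_pts) / mean_bone_len(target_pts)
        centroid = target_pts.mean(axis=0)
        return (target_pts - centroid) * s + centroid

    # links: index pairs of linked key nodes, e.g. [(shoulder, elbow), (elbow, wrist)]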
In some embodiments, comparing the three-dimensional human body posture information of the bone-registered target user and reference person, at a given moment or over a continuous period, may further comprise at least one of: comparing, one by one, the three-dimensional distances between key nodes of the target user and the corresponding key nodes of the reference person, where a larger distance indicates a larger posture gap; computing and averaging the distances between multiple key nodes of the target user and the corresponding key nodes of the reference person, where a larger average indicates a larger posture gap; and comparing the included angle between a line segment formed by linked key nodes of the target user and the corresponding line segment of the reference person, where a larger angle indicates a larger posture gap.
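The three comparison modes may be expressed, illustratively, as follows, with poses as (K, 3) arrays of registered key-node coordinates:

    import numpy as np

    def per_joint_distance(p, q):
        """Node-by-node 3-D distance; a larger distance means a larger posture gap."""
        return np.linalg.norm(p - q, axis=1)

    def mean_joint_distance(p, q):
        """Average distance over several nodes; a larger mean means a larger gap."""
        return float(per_joint_distance(p, q).mean())

    def segment_angle(a1, a2, b1, b2):
        """Angle between segment a1->a2 (target) and b1->b2 (reference), in degrees."""
        u, v = a2 - a1, b2 - b1
        c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))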
In this embodiment, adaptively adjusting the smart table based on the comparison result may further include: obtaining a distance score by weighting the results of one or more of the comparison methods, the score indicating whether the current smart table satisfies the ergonomics of the current user's working posture. For example, with a two-class classification, the results may comprise two quality factors, reasonable and unreasonable: if the quality factor is reasonable, no adjustment is needed; if it is unreasonable, the height or tilt angle of the smart table needs to be adjusted adaptively to meet the user's working requirements. The adjustment may also form a memory of the smart table's height or tilt angle for the target user, so that upon identifying the target user the table is adjusted directly to the memorized height or tilt angle.
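A possible scoring-and-decision sketch follows; the weights, threshold, and memory structure are hypothetical:

    def posture_score(metrics, weights):
        """Weighted combination of the comparison results; drives the
        reasonable/unreasonable classification described above."""
        return sum(w * m for m, w in zip(metrics, weights))

    def adjust_table(score, threshold=1.0, memory=None, user_id=None):
        if memory is not None and user_id in memory:
            return memory[user_id]             # recall the remembered height/tilt
        if score > threshold:                  # "unreasonable" working posture
            return {"action": "adjust", "reason": "posture gap exceeds threshold"}
        return {"action": "keep"}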
In some embodiments, a posture optimization suggestion may be fed back to the target user based on the posture predicted from the images. The optimization suggestions include, but are not limited to: straighten the back, lean the waist forward, hold the head upright, and the like.
FIG. 8 shows a schematic diagram of a posture-recognition-based smart table control apparatus according to an exemplary embodiment of the present disclosure. As shown in FIG. 8, the apparatus includes: an image acquisition module 801 configured to acquire image data of a target user through an image data acquisition device; a posture recognition module 802 configured to identify the target user from the image data through a posture recognition algorithm and output the human body posture map required by the smart table; a three-dimensional reconstruction module 803 configured to reconstruct three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map; a bone registration module 804 configured to perform bone registration between the target user and a reference person using posture key nodes in three-dimensional space; a posture comparison module 805 configured to compare the three-dimensional human body posture information of the target user and the reference person after bone registration; and an adaptive adjustment module 806 configured to adaptively adjust the smart table based on the comparison result.
FIG. 9 shows a schematic diagram of a posture-recognition-based smart table according to an exemplary embodiment of the present disclosure, from which the various embodiments of the disclosure can be understood more intuitively. As shown in FIG. 9, the sensor 901 receives the image data of the target user acquired by the image data acquisition device. The controller 902 provides control buttons for fine manual adjustment of the smart table's height or tilt angle. Based on the image data received by the sensor 901, the control box 903 adaptively adjusts the smart table by reconstructing three-dimensional posture information and performing posture analysis with a deep learning neural network combined with traditional machine learning techniques, and feeds optimization suggestions back to the target user. The lifter 904 raises and lowers the tabletop through an electric push rod to adjust the height or tilt angle of the smart table. The smart tables to which the present disclosure relates include, but are not limited to: desks, podiums, office tables, and test benches.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the disclosure may be practiced without these specific details. In some embodiments, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
While exemplary embodiments of the present invention have been shown and described herein, it will be readily understood by those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims (10)

1. A smart table control method based on posture recognition, characterized by comprising the following steps:
acquiring image data of a target user through an image data acquisition device;
identifying the target user from the image data through a posture recognition algorithm, and outputting a human body posture map required by the smart table;
reconstructing three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map;
performing bone registration between the target user and a reference person using posture key nodes in three-dimensional space;
comparing the three-dimensional human body posture information of the target user and the reference person after bone registration; and
adaptively adjusting the smart table based on the comparison result.
2. The method of claim 1, wherein the image data acquisition device comprises at least one of: a planar camera, a depth camera, an infrared camera, or a thermal imager, wherein the depth camera comprises at least one of: a time-of-flight camera, a structured light camera, or a binocular camera.
3. The method of claim 1, wherein outputting the human body posture map required by the smart table further comprises:
determining the posture of the human body through key nodes of the human body, wherein the key nodes comprise at least one of limb joint points and facial key points, and the position information of the key nodes is represented by coordinates;
determining the position coordinates of at least one of the key nodes in the image data;
determining category information of at least one of the key nodes, wherein the category information comprises body feature information of interest, namely the key feature points of human body parts required by the human body monitoring task and by human biomechanical model analysis for different applications;
determining state information of at least one of the key nodes, wherein the state information comprises: visible, or invisible and either inferable or non-inferable; and
linking the key nodes into the human body posture map according to the positional relationships and confidences among the key nodes.
4. The method of claim 3, wherein the posture recognition algorithm comprises a deep learning neural network prediction algorithm, wherein the deep learning neural network requires training, the training comprising:
preparing a human body posture image set, wherein the human body posture image data in the set is labeled according to the key nodes; and
training a deep learning model on the human body posture image set, and updating the parameters of the deep learning neural network through error back-propagation until convergence, to obtain a fully trained deep learning neural network.
5. The method of claim 1, wherein reconstructing the three-dimensional human body posture information of the target user through the three-dimensional reconstruction algorithm based on the human body posture map further comprises:
acquiring shooting parameters of the image data acquisition device, and establishing a three-dimensional spatial coordinate system according to the shooting parameters, wherein the shooting parameters comprise at least one of: the orientation, angle, viewing angle, and focal length of the camera.
6. The method of claim 5, wherein reconstructing the three-dimensional human body posture information of the target user through the three-dimensional reconstruction algorithm based on the human body posture map further comprises:
in the case of a single depth camera, reconstructing the three-dimensional human body posture information of the target user by converting the human body posture image generated from the depth image acquired by the depth camera into a three-dimensional point cloud;
in the case of a combination of a planar camera and a depth camera, jointly processing the human body posture image generated from the planar image acquired by the planar camera and the three-dimensional point cloud converted from the depth image acquired by the depth camera to reconstruct the three-dimensional human body posture information of the target user; or
in the case of a multi-view combination of image data acquisition devices, reconstructing the three-dimensional human body posture information of the target user by projecting the human body posture image generated by the image data acquisition device at each viewing angle into the three-dimensional spatial coordinate system.
7. The method of claim 3, wherein the bone registration of the target user with the reference person using the posture key nodes in three-dimensional space further comprises global bone scaling and local bone scaling, wherein global bone scaling refers to registering the coordinate set of the key nodes of the whole human body, and local bone scaling refers to registering the coordinates of local key nodes among the key nodes of the human body, comprising:
calculating the bone lengths of the target user and the reference person, wherein a bone length is the distance between the position coordinates of linked key nodes, and the distance comprises at least one of: Euclidean distance, standardized Euclidean distance, Mahalanobis distance, or cosine distance;
performing bone registration on the bone lengths of the target user according to the corresponding bone lengths of the reference person; or
performing bone registration on the bone lengths of the reference person according to the corresponding bone lengths of the target user.
8. The method of claim 7, wherein comparing the three-dimensional human body posture information of the bone-registered target user and reference person further comprises at least one of:
comparing, one by one, the three-dimensional distances between the key nodes of the target user and the corresponding key nodes of the reference person, wherein a larger distance indicates a larger posture gap;
calculating and averaging the three-dimensional distances between a plurality of key nodes of the target user and the corresponding key nodes of the reference person for comparison, wherein a larger average indicates a larger posture gap; and
comparing the included angle between a line segment formed by linked key nodes of the target user and the corresponding line segment formed by linked key nodes of the reference person, wherein a larger included angle indicates a larger posture gap.
9. The method of claim 8, wherein adaptively adjusting the smart table based on the comparison result further comprises:
adaptively adjusting the height or tilt angle of the smart table based on a score obtained by weighting the comparison results of one or more of the comparison methods of claim 8;
forming a memory of the height or tilt angle of the smart table for the target user, so as to directly adjust the smart table to the memorized height or tilt angle in response to identifying the target user; and
feeding a posture optimization suggestion back to the target user.
10. An apparatus for using the method of any of claims 1-9, comprising:
an image acquisition module configured to acquire image data of a target user through an image data acquisition device;
a posture recognition module configured to identify the target user from the image data through a posture recognition algorithm and output the human body posture map required by the smart table;
a three-dimensional reconstruction module configured to reconstruct three-dimensional human body posture information of the target user through a three-dimensional reconstruction algorithm based on the human body posture map;
a bone registration module configured to perform bone registration between the target user and a reference person using posture key nodes in three-dimensional space;
a posture comparison module configured to compare the three-dimensional human body posture information of the target user and the reference person after bone registration; and
an adaptive adjustment module configured to adaptively adjust the smart table based on the comparison result.
CN202010853594.6A 2020-08-21 2020-08-21 Intelligent table control method and device based on posture recognition Pending CN111881888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010853594.6A CN111881888A (en) 2020-08-21 2020-08-21 Intelligent table control method and device based on posture recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010853594.6A CN111881888A (en) 2020-08-21 2020-08-21 Intelligent table control method and device based on posture recognition

Publications (1)

Publication Number Publication Date
CN111881888A true CN111881888A (en) 2020-11-03

Family

ID=73203553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010853594.6A Pending CN111881888A (en) Intelligent table control method and device based on posture recognition

Country Status (1)

Country Link
CN (1) CN111881888A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114537939A (en) * 2020-11-25 2022-05-27 丰田自动车株式会社 Cargo conveyance device, cargo conveyance system, and cargo conveyance method
CN113080610A (en) * 2021-04-21 2021-07-09 付翠仙 Table and chair adjusting method and table and chair
CN113065532A (en) * 2021-05-19 2021-07-02 南京大学 Sitting posture geometric parameter detection method and system based on RGBD image
CN113065532B (en) * 2021-05-19 2024-02-09 南京大学 Sitting posture geometric parameter detection method and system based on RGBD image
CN114098284A (en) * 2021-11-23 2022-03-01 苏州爱果乐智能家居有限公司 Height adjusting method for infrared induction height and learning table

Similar Documents

Publication Publication Date Title
CN111881888A (en) Intelligent table control method and device based on posture recognition
CN111881887A (en) Multi-camera-based motion attitude monitoring and guiding method and device
CN112069933A (en) Skeletal muscle stress estimation method based on posture recognition and human body biomechanics
Alberto Funes Mora et al. Geometric generative gaze estimation (g3e) for remote rgb-d cameras
JP3512992B2 (en) Image processing apparatus and image processing method
WO2017133009A1 (en) Method for positioning human joint using depth image of convolutional neural network
Uddin et al. Human activity recognition using body joint‐angle features and hidden Markov model
CN109919141A (en) A kind of recognition methods again of the pedestrian based on skeleton pose
CN109377513B (en) Global three-dimensional human body posture credible estimation method for two views
CN213423969U (en) Intelligent seat control device based on posture recognition
WO2012077286A1 (en) Object detection device and object detection method
CN110555408B (en) Single-camera real-time three-dimensional human body posture detection method based on self-adaptive mapping relation
CN109344694B (en) Human body basic action real-time identification method based on three-dimensional human body skeleton
WO2018075053A1 (en) Object pose based on matching 2.5d depth information to 3d information
CN110544302A (en) Human body action reconstruction system and method based on multi-view vision and action training system
CN111881886A (en) Intelligent seat control method and device based on posture recognition
CN110032940A (en) A kind of method and system that video pedestrian identifies again
Chen et al. Camera networks for healthcare, teleimmersion, and surveillance
CN114387679A (en) System and method for realizing sight line estimation and attention analysis based on recursive convolutional neural network
CN113989936A (en) Desk lamp capable of recognizing sitting posture of child and automatically correcting voice
CN110796699B (en) Optimal view angle selection method and three-dimensional human skeleton detection method for multi-view camera system
CN113111743A (en) Personnel distance detection method and device
CN112329723A (en) Binocular camera-based multi-person human body 3D skeleton key point positioning method
CN117742388B (en) Intelligent control method and system based on reading furniture
CN117671738B (en) Human body posture recognition system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination