CN112288798A - Posture recognition and training method, device and system - Google Patents


Info

Publication number: CN112288798A
Authority: CN (China)
Prior art keywords: dimensional image, gesture recognition, acquiring, target object, foreground
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201910673695.2A
Other languages: Chinese (zh)
Inventor: 黄海安
Current Assignee: Gaoyida Technology Shenzhen Co ltd; Robotics Robotics Shenzhen Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Gaoyida Technology Shenzhen Co ltd; Robotics Robotics Shenzhen Ltd
Application filed by Gaoyida Technology Shenzhen Co ltd and Robotics Robotics Shenzhen Ltd
Priority application: CN201910673695.2A
Publication: CN112288798A (legal status: pending)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G06T2207/10012 - Stereo images


Abstract

The application relates to a posture recognition and training method, device and system. The gesture recognition method comprises the following steps: acquiring a three-dimensional image of a target object; acquiring a gesture recognition model; and inputting the three-dimensional image into the gesture recognition model and outputting a gesture recognition result of the target object. By combining three-dimensional images with an artificial-intelligence-based recognition method, the technical scheme of the invention can recognize the gesture of the target object in a short time from a simple capture of the object.

Description

Posture recognition and training method, device and system
Technical Field
The present application relates to the field of gesture recognition technologies, and in particular to a gesture recognition method, a gesture recognition training method, a gesture recognition device, a gesture recognition training device, and a gesture recognition system.
Background
As the level of technology improves, society as a whole is developing toward intelligence and automation.
Artificial intelligence opens broad possibilities for the future development of robots: by training a neural network model, a robot controlled by the network model can learn to recognize the pose of an object on its own.
Pose estimation is key to the fields of augmented reality, virtual reality, and robotics. Pose estimation based on a three-dimensional image (such as a point cloud image or a depth image) can directly yield the gesture recognition result of a target object, but traditional three-dimensional-image-based gesture recognition methods are time-consuming, offer poor real-time performance, and generally require shooting the object from multiple angles.
Disclosure of Invention
Based on the above, the invention provides a posture recognition and training method, device and system.
A first aspect of the present invention provides a gesture recognition method, including:
acquiring a three-dimensional image of a target object;
acquiring a gesture recognition model; and
inputting the three-dimensional image into the gesture recognition model, and outputting a gesture recognition result of the target object.
Preferably, after outputting the gesture recognition result of the target object, the method further includes:
acquiring the gesture recognition result;
and optimizing the gesture recognition result to obtain an optimized result.
Preferably, the acquiring of the three-dimensional image of the target object further includes:
acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
extracting the foreground in the initial three-dimensional image to obtain a three-dimensional image of the target object only comprising the foreground or a single background; or
Acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
acquiring auxiliary data of a target object;
and extracting the foreground in the initial three-dimensional image by combining the auxiliary data to obtain a three-dimensional image of the target object only comprising the foreground or a single background.
A second aspect of the present invention provides a gesture recognition method, including:
acquiring a three-dimensional image of a target object;
acquiring auxiliary data of a target object;
acquiring a gesture recognition model; and
inputting the three-dimensional image and the auxiliary data into the gesture recognition model, and outputting a recognition result.
Preferably, after outputting the gesture recognition result of the target object, the method further includes:
acquiring the gesture recognition result;
and optimizing the gesture recognition result to obtain an optimized result.
Preferably, the acquiring of the three-dimensional image of the target object further includes:
acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
extracting the foreground in the initial three-dimensional image to obtain a three-dimensional image of the target object only comprising the foreground or a single background; or
Acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
acquiring auxiliary data of a target object;
and extracting the foreground in the initial three-dimensional image by combining the auxiliary data to obtain a three-dimensional image of the target object only comprising the foreground or a single background.
A third aspect of the present invention provides a gesture recognition training method, comprising:
acquiring a first training sample set;
acquiring an initial model of the gesture recognition model;
training the initial model based on the training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image; or
acquiring a second training sample set;
acquiring an initial model of the gesture recognition model;
training the initial model based on the training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image and auxiliary data.
A fourth aspect of the present invention provides an attitude recognition apparatus, comprising:
the three-dimensional image acquisition module is used for acquiring a three-dimensional image of a target object;
the recognition model acquisition module is used for acquiring a gesture recognition model; and
a recognition result output module, used for inputting the three-dimensional image into the gesture recognition model and outputting the gesture recognition result of the target object.
Preferably, the gesture recognition apparatus further includes:
the recognition result acquisition module is used for acquiring the gesture recognition result;
and the recognition result optimization module is used for optimizing the gesture recognition result to obtain an optimization result.
Preferably, before the acquiring of the three-dimensional image of the target object, the gesture recognition apparatus further includes:
an initial image acquisition module, used for acquiring an initial three-dimensional image comprising a foreground and a background, wherein the foreground represents the target object;
a three-dimensional image extraction module, used for extracting the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object comprising only the foreground or a single background; or
an initial image acquisition module, used for acquiring an initial three-dimensional image comprising a foreground and a background, wherein the foreground represents the target object;
an auxiliary data acquisition module, used for acquiring auxiliary data of the target object;
a three-dimensional image extraction module, used for extracting the foreground from the initial three-dimensional image by combining the auxiliary data, to obtain a three-dimensional image of the target object comprising only the foreground or a single background.
A fifth aspect of the present invention provides a posture identifying apparatus comprising:
the three-dimensional image acquisition module is used for acquiring a three-dimensional image of a target object;
the auxiliary data acquisition module is used for acquiring auxiliary data of the target object;
the recognition model acquisition module is used for acquiring a gesture recognition model; and
a recognition result output module, used for inputting the three-dimensional image and the auxiliary data into the gesture recognition model and outputting a recognition result.
Preferably, the gesture recognition apparatus further includes:
the recognition result acquisition module is used for acquiring the gesture recognition result;
and the recognition result optimization module is used for optimizing the gesture recognition result to obtain an optimization result.
Preferably, the gesture recognition apparatus further includes:
an initial image acquisition module, used for acquiring an initial three-dimensional image comprising a foreground and a background, wherein the foreground represents the target object;
a three-dimensional image extraction module, used for extracting the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object comprising only the foreground or a single background; or
an initial image acquisition module, used for acquiring an initial three-dimensional image comprising a foreground and a background, wherein the foreground represents the target object;
an auxiliary data acquisition module, used for acquiring auxiliary data of the target object;
a three-dimensional image extraction module, used for extracting the foreground from the initial three-dimensional image by combining the auxiliary data, to obtain a three-dimensional image of the target object comprising only the foreground or a single background.
A sixth aspect of the present invention provides a gesture recognition training device, including:
a sample acquisition module, used for acquiring a first training sample set;
a model acquisition module, used for acquiring an initial model of the gesture recognition model;
a model training module, used for inputting the training sample set into the initial model for training to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image; or
a sample acquisition module, used for acquiring a second training sample set;
a model acquisition module, used for acquiring an initial model of the gesture recognition model;
a model training module, used for inputting the training sample set into the initial model for training to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image and auxiliary data.
A seventh aspect of the present invention provides a gesture recognition system including an image sensor and a control device;
the image sensor is used for acquiring a three-dimensional image of a target object and sending the three-dimensional image to the control device;
the control device is used for acquiring the three-dimensional image of the target object; acquiring a gesture recognition model; and inputting the three-dimensional image into the gesture recognition model and outputting a gesture recognition result of the target object; or
the control device is used for acquiring the three-dimensional image of the target object; acquiring auxiliary data of the target object; acquiring a gesture recognition model; and inputting the three-dimensional image and the auxiliary data into the gesture recognition model and outputting a recognition result; or
the image sensor is used for acquiring a two-dimensional image of the target object and sending the two-dimensional image to the control device;
the control device is used for acquiring the two-dimensional image of the target object; generating a three-dimensional image of the target object according to the two-dimensional image; acquiring the three-dimensional image of the target object; acquiring a gesture recognition model; and inputting the three-dimensional image into the gesture recognition model and outputting a gesture recognition result of the target object; or
the control device is used for acquiring the two-dimensional image of the target object; generating a three-dimensional image of the target object according to the two-dimensional image; acquiring the three-dimensional image of the target object; acquiring auxiliary data of the target object; acquiring a gesture recognition model; and inputting the three-dimensional image and the auxiliary data into the gesture recognition model and outputting a recognition result.
An eighth aspect of the present invention provides a computer apparatus, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the gesture recognition method according to any one of the above items when executing the computer program; and/or the gesture recognition training method described above.
A ninth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the gesture recognition method of any one of the above; and/or the gesture recognition training method described above.
By combining three-dimensional images with an artificial-intelligence-based gesture recognition method, the gesture of the target object can be recognized in a short time from a simple capture of the object.
Drawings
FIG. 1 is a first flowchart of a gesture recognition method in one embodiment;
FIG. 2 is a diagram illustrating a second process of a gesture recognition method according to an embodiment;
FIG. 3 is a diagram illustrating a third process of a gesture recognition method according to an embodiment;
FIG. 4 is a fourth flowchart illustrating a gesture recognition method according to an embodiment;
FIG. 5 is a fifth flowchart illustrating a method for gesture recognition according to an embodiment;
FIG. 6 is a sixth flowchart illustrating a gesture recognition method according to an embodiment;
FIG. 7 is a seventh flowchart illustrating a gesture recognition method according to an embodiment;
FIG. 8 is an eighth flowchart illustrating a gesture recognition method according to an embodiment;
FIG. 9 is a ninth flowchart illustrating a gesture recognition method according to an embodiment;
FIG. 10 is a first flowchart of a method for training a gesture recognition model according to an embodiment;
FIG. 11 is a second flowchart of a gesture recognition training method in one embodiment;
FIG. 12 is a first block diagram of a gesture recognition apparatus in one embodiment;
FIG. 13 is a block diagram showing a second configuration of a posture identifying apparatus in one embodiment;
FIG. 14 is a block diagram showing a third configuration of a posture identifying apparatus in one embodiment;
FIG. 15 is a fourth structural block diagram of a posture identifying apparatus in one embodiment;
FIG. 16 is a block diagram showing a fifth configuration of a posture identifying apparatus in one embodiment;
FIG. 17 is a block diagram showing a sixth configuration of a posture identifying apparatus in one embodiment;
FIG. 18 is a block diagram showing a seventh configuration of a posture identifying apparatus in one embodiment;
FIG. 19 is a block diagram showing an eighth configuration of the posture identifying apparatus in one embodiment;
FIG. 20 is a first block diagram of an identification optimization module in one embodiment;
FIG. 21 is a first block diagram of a gesture recognition training apparatus in accordance with one embodiment;
FIG. 22 is a block diagram showing a second configuration of a posture-recognition training apparatus according to an embodiment;
FIG. 23 is a first structural block diagram of a gesture recognition system in one embodiment;
FIG. 24 is a first block diagram showing an application environment of the gesture recognition method in one embodiment;
FIG. 25 is a diagram of a first structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The gesture recognition method provided by the present application may be applied to an application environment as shown in fig. 24, where the application environment may include a terminal 700 and/or a server 800, and the terminal 700 communicates with the server 800 through a network. The method can be applied to both the terminal 700 and the server 800. The terminal 700 may be, but is not limited to, various industrial computers, personal computers, notebook computers, smart phones, and tablet computers. The server 800 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
In one embodiment, as shown in fig. 1, a gesture recognition method is provided, which may include the following steps, for example, when the method is applied to the terminal in fig. 24:
step S101, acquiring a three-dimensional image of a target object;
specifically, the three-dimensional image may be acquired by an image sensor (such as a 3D image sensor) and transmitted directly; obtained by post-processing a two-dimensional image acquired by an image sensor into a three-dimensional image (such as a point cloud image or a depth image); or retrieved from a memory, a server, or the like.
Specific image sensors may include, but are not limited to: cameras, video cameras, scanners or other devices with associated functions (cell phones, computers), etc.
A point cloud image is a massive set of points that expresses the spatial distribution and surface characteristics of the target under a common spatial reference system.
In one embodiment, as described in patent CN107241592, a light emitter may project a light image carrying unique characteristic information onto the target object; one or more image sensors then acquire two-dimensional images of the target object under the projected light image, and a point cloud image is obtained directly by matching those two-dimensional images; or
In one embodiment, the point cloud image may instead be computed from the depth information of a depth image.
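As a minimal illustration of converting depth information into a point cloud, the sketch below back-projects a depth image through a pinhole camera model; the intrinsics (fx, fy, cx, cy) and the synthetic depth map are illustrative assumptions, not values from the patent.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an N x 3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx  # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy  # pinhole model: Y = (v - cy) * Z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth

# Illustrative intrinsics and a synthetic depth map stand in for real sensor data.
depth = np.random.uniform(0.5, 2.0, size=(480, 640))
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```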
The depth image is an image having, as a pixel value, a distance (depth) value from an image sensor to each point in a scene.
Specifically, a depth image can be obtained by capturing two images of the same scene with two cameras set a certain distance apart, finding the corresponding pixel points in the two images with a stereo matching algorithm, computing disparity information according to the triangulation principle, and converting the disparity into the depth of the objects in the scene, from which the depth image is generated. Alternatively, a group of images of the same scene taken from different angles can be processed by a stereo matching algorithm to obtain a depth image of the scene. Scene depth information can also be estimated indirectly by analyzing image characteristics such as luminosity and brightness.
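As one hedged sketch of the stereo route just described, the snippet below uses OpenCV semi-global block matching to estimate disparity from a rectified stereo pair and converts disparity to depth by triangulation (Z = f x B / d); the file names, focal length, and baseline are placeholder assumptions.

```python
import cv2
import numpy as np

# Rectified stereo pair; the file names are placeholders.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching finds corresponding pixels between the two views.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=96, blockSize=7)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point (x16)

# Triangulation: depth Z = focal_length * baseline / disparity.
focal_px, baseline_m = 700.0, 0.12  # assumed calibration values
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_px * baseline_m / disparity[valid]
```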
Step S102, acquiring a posture recognition model;
the previously trained gesture recognition model is retrieved from memory or a server.
Step S103 inputs the three-dimensional image into the posture recognition model, and outputs a posture recognition result of the target object.
In one embodiment, the gesture recognition result may be described by 6D coordinates (6 degrees of freedom), split into a rotational pose and a translational position with 3 degrees of freedom each. The translation is an ordinary linear offset that can be described by a 3x1 vector, while the rotation is commonly described by, but not limited to: a rotation matrix, a rotation vector, a quaternion, Euler angles, or a Lie algebra element.
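The interchangeable rotation descriptions listed above can be demonstrated with SciPy's Rotation class; a small sketch follows (the pose values are arbitrary examples, not from the patent):

```python
import numpy as np
from scipy.spatial.transform import Rotation

# A 6-DoF pose: 3 degrees of freedom for translation, 3 for rotation.
t = np.array([0.10, -0.05, 0.80])                       # translation as a 3x1 vector
R = Rotation.from_euler("xyz", [10, 20, 30], degrees=True)

print(R.as_matrix())                    # 3x3 rotation matrix
print(R.as_quat())                      # quaternion [x, y, z, w]
print(R.as_rotvec())                    # rotation vector (axis * angle), the Lie-algebra form
print(R.as_euler("zyx", degrees=True))  # Euler angles in another convention

# The full pose as a homogeneous 4x4 transform.
T = np.eye(4)
T[:3, :3] = R.as_matrix()
T[:3, 3] = t
```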
With this technical solution, gesture recognition is performed by combining three-dimensional images with an artificial-intelligence-based method, and the gesture of the target object can be recognized in a short time from a simple capture of the object (in some cases a single shot suffices).
In addition, since the gesture of the target object is recognized in a short time, the recognition offers good real-time performance.
Moreover, the artificial-intelligence-based method improves the generalization ability of three-dimensional-image-based gesture recognition across a variety of conditions.
In one embodiment, the three-dimensional image of the target object may include the target object together with a complex background; in another embodiment, it may include only the target object, or the target object against a single background. In the latter case, as shown in fig. 2, the following method steps may precede step S101:
step S104, acquiring an initial three-dimensional image comprising a foreground and a complex background; wherein the foreground represents the target;
step S105 extracts the target object in the initial three-dimensional image, and generates a three-dimensional image including only the target object or the target object with a single background.
Specifically, the single background means that the background adopts a single pattern or color.
Specifically, the extraction may include, but is not limited to, the following methods:
in one embodiment, the foreground can be cut out of the initial image along its outer contour; for example, the foreground region is identified with various traditional image processing methods (such as binarization, edge detection, or connected-component analysis) and then extracted; or
Further, in an embodiment, the extracted foreground may be mapped onto a rectangular image of preset size with a single background, so as to generate a target image with a single background; or
In one embodiment, the initial image is cropped so that the cropped image is the smallest image enclosing the bounding box of the target object (for example, if the initial image is 100 x 200 and the cropped image is 50 x 80, the cropped image can be regarded as a target image including only the foreground); or
In one embodiment, the initial image is processed such that the processed image includes only a foreground and a single background, and so on.
Specifically, the foreground extraction may rely on a foreground extraction model obtained by an artificial intelligence method; or
on conventional image processing methods; for example, in a binarized image the foreground may be set to white and the background uniformly adjusted to black.
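A minimal sketch of that classical route, assuming OpenCV and an illustrative image path: Otsu binarization separates foreground from background, the largest contour is taken as the target, and the result is either the smallest bounding crop or the foreground pasted onto a single-color canvas.

```python
import cv2
import numpy as np

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Keep the largest connected contour as the foreground object.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
target = max(contours, key=cv2.contourArea)

# Option 1: crop to the smallest axis-aligned box around the object.
x, y, w, h = cv2.boundingRect(target)
crop = img[y:y + h, x:x + w]

# Option 2: paste the foreground onto a single-color canvas of preset size.
obj_mask = np.zeros_like(img)
cv2.drawContours(obj_mask, [target], -1, 255, thickness=cv2.FILLED)
fg = cv2.bitwise_and(img, img, mask=obj_mask)[y:y + h, x:x + w]
canvas = np.zeros((256, 256), dtype=np.uint8)  # single (black) background
ch, cw = min(h, canvas.shape[0]), min(w, canvas.shape[1])
canvas[:ch, :cw] = fg[:ch, :cw]
```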
In one embodiment, as shown in fig. 3, the following method steps may be further included before step S101:
step S104' obtaining an initial three-dimensional image comprising a foreground and a complex background; wherein the foreground represents the target;
step S106' obtaining auxiliary data of the target object;
specifically, the auxiliary data may include, but is not limited to: two-dimensional image data of the target object (such as an RGB image, a black-and-white image, or a grayscale image); or 3D model data of the target object.
Step S105' combines the auxiliary data to extract the target object in the initial three-dimensional image, and generates a three-dimensional image including only the target object or the target object with a single background.
Combining the auxiliary data when extracting the target object from the three-dimensional image allows the target object to be extracted more accurately.
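One possible form of this, sketched under the assumption of an organized point cloud registered pixel-for-pixel to an auxiliary RGB image: a 2-D foreground mask computed on the RGB image selects the corresponding 3-D points. Both input names are hypothetical.

```python
import numpy as np

def extract_with_mask(organized_cloud, rgb_mask):
    """Select foreground points from an H x W x 3 organized point cloud.

    rgb_mask is an H x W boolean mask produced by any 2-D segmenter
    run on the registered auxiliary RGB image (hypothetical input).
    """
    points = organized_cloud[rgb_mask]               # keep foreground pixels only
    return points[np.isfinite(points).all(axis=1)]   # drop invalid (NaN/inf) returns
```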
In one embodiment, as shown in fig. 4, after outputting the gesture recognition result of the target object, the method further includes:
step S107, acquiring a gesture recognition result;
and S108, optimizing the posture recognition result to obtain an optimized result.
Specifically, the optimization method may include, but is not limited to: point-cloud-based Iterative Closest Point (ICP).
In one embodiment, as shown in fig. 5, taking the three-dimensional image as the point cloud image as an example, and taking the ICP optimization method as an example, step S108 may include the following method steps:
step S1081, acquiring a reference point cloud and a point cloud of a target object;
in one embodiment, the reference point cloud may be formed by randomly sampling points from a 3D model (e.g., a CAD model) of the target object.
Step S1082, obtaining an initial posture of the target object;
specifically, the gesture recognition result of the target object obtained in step S103 may be used as the initial pose; if step S103 provides no initial pose, one may be generated randomly;
step S1083, determining, according to the initial pose, the point-to-point correspondence between the reference point cloud and the point cloud of the target object;
In one embodiment, the reference point cloud may be rotated and the centers of the two point clouds aligned, and a nearest-neighbor algorithm is then used to find the point-to-point correspondences between the two point clouds. (Note: these correspondences are not necessarily correct.)
Step S1084 calculates the distances between corresponding points.
Step S1085 optimizes the initial pose according to those distances, with the objective of minimizing the global distance, to obtain a more accurate updated pose.
Step S1086 repeats steps S1083-S1085 to continuously update the pose until a preset condition is satisfied (e.g., the update difference falls below a preset threshold, or a preset number of iterations is reached).
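A minimal point-to-point ICP sketch of steps S1083-S1086, using nearest-neighbor correspondences and the closed-form SVD (Kabsch) update; this is an illustrative implementation, not the patent's code:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=50, tol=1e-6):
    """Align `source` (N x 3) onto `target` (M x 3); returns rotation R and translation t."""
    R, t = np.eye(3), np.zeros(3)  # initial pose; in practice the model's output (step S1082)
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(iters):
        moved = source @ R.T + t
        dists, idx = tree.query(moved)     # step S1083: nearest-neighbor correspondences
        err = dists.mean()                 # step S1084: distances between corresponding points
        if abs(prev_err - err) < tol:      # step S1086: stop when the update difference is small
            break
        prev_err = err
        matched = target[idx]
        # Step S1085: closed-form least-squares pose update (Kabsch / SVD).
        mu_s, mu_t = moved.mean(axis=0), matched.mean(axis=0)
        H = (moved - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T            # guard against reflections
        t_step = mu_t - R_step @ mu_s
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```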
In one embodiment, the gesture recognition method may further include step S150 of presenting the gesture recognition result or the optimization result, i.e., sending the gesture recognition result or the optimization result to a display to be shown.
Specifically, the display may be various types of displays or other devices (mobile phones, computers, etc.) with related functions.
In one embodiment, as shown in fig. 10, a training method for gesture recognition is provided, which also takes the application of the method to the terminal in fig. 24 as an example, and the method includes the following method steps:
step S301, a first training sample set is obtained;
specifically, the first training sample set is a set of three-dimensional images of a plurality of target objects;
step S302, an initial model of the gesture recognition model is obtained;
step S303, training the initial model based on the training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image.
Specifically, the gesture recognition model may include, but is not limited to: PointNet models and various other network model structures, now known or later developed, that can produce an output for a three-dimensional image input.
Specifically, the training sample set may be generated by various methods. In one embodiment, taking point cloud images as an example, point clouds of an object may be captured from many perspectives to form a training sample set of point cloud images; conventional algorithms such as ICP or LINEMOD are then used to calculate the pose of the object corresponding to each point cloud image, and that pose serves as the label of the point cloud image. The model may then be trained on the point cloud image training sample set and the corresponding labels (i.e., a supervised learning method).
Specifically, the training method of the gesture recognition model may include, but is not limited to: supervised, semi-supervised or unsupervised learning methods, and other now known or later developed model training methods.
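As a hedged sketch of such supervised training, the snippet below regresses a 6-DoF pose (translation plus unit quaternion) with a tiny PointNet-style network in PyTorch; the architecture, loss, and random stand-in data are illustrative assumptions, not the patent's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPointNet(nn.Module):
    """Stand-in PointNet-style pose regressor (illustrative, not the patent's model)."""
    def __init__(self):
        super().__init__()
        self.feat = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU(),
                                  nn.Conv1d(64, 256, 1), nn.ReLU())
        self.head = nn.Linear(256, 7)  # 3 translation + 4 quaternion components

    def forward(self, pts):                    # pts: (B, 3, N) point clouds
        f = self.feat(pts).max(dim=2).values   # order-invariant global max-pool
        out = self.head(f)
        t, q = out[:, :3], out[:, 3:]
        return t, F.normalize(q, dim=1)        # force a unit quaternion

model = TinyPointNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Pose labels would come from offline ICP/LINEMOD as described above;
# random tensors stand in for real data here.
clouds = torch.randn(8, 3, 1024)
t_gt, q_gt = torch.randn(8, 3), F.normalize(torch.randn(8, 4), dim=1)

for _ in range(10):  # a few supervised steps
    t_pred, q_pred = model(clouds)
    loss = F.mse_loss(t_pred, t_gt) + F.mse_loss(q_pred, q_gt)
    opt.zero_grad()
    loss.backward()
    opt.step()
```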
In one embodiment, as shown in fig. 6, a gesture recognition method is provided, which also takes the application of the method to the terminal in fig. 24 as an example, and the gesture recognition method includes:
step S201, acquiring a three-dimensional image of a target object;
step S202, acquiring auxiliary data of a target object;
step S203, acquiring a posture recognition model;
step S204, inputting the three-dimensional image and the auxiliary data into the gesture recognition model, and outputting the gesture recognition result of the target object.
Specifically, the auxiliary data may be, but is not limited to: various two-dimensional image data (e.g., an RGB image, a black-and-white image, or a grayscale image); or 3D model data of the target object.
Because the auxiliary data is input into the model together with the three-dimensional image, it can help improve the accuracy of gesture recognition.
In one embodiment, the three-dimensional image of the target object may include the target object together with a complex background; in another embodiment, it may include only the target object, or the target object and a single background. In the latter case, as shown in fig. 7, the following method steps may precede step S201:
step S205, acquiring an initial three-dimensional image comprising a foreground and a complex background; wherein the foreground represents the target;
step S206 extracts the target object in the initial three-dimensional image, and generates a three-dimensional image including only the target object or the target object with a single background.
In one embodiment, as shown in fig. 8, the following method steps may precede step S201:
step S205' obtains an initial three-dimensional image comprising a foreground and a complex background; wherein the foreground represents the target;
step S207' obtains auxiliary data of the target object;
specifically, the auxiliary data may be, but is not limited to: two-dimensional image data of the target object (such as an RGB image, a black-and-white image, or a grayscale image); or 3D model data of the target object.
Step S206' combines the auxiliary data to extract the target object in the initial three-dimensional image, and generates a three-dimensional image including only the target object or the target object with a single background.
Combining the auxiliary data when extracting the target object from the three-dimensional image allows the target object to be extracted more accurately.
In one embodiment, as shown in fig. 9, after the step S204 outputs the gesture recognition result of the target object, the method further includes:
step S208, acquiring a gesture recognition result;
step S209 optimizes the gesture recognition result to obtain an optimized result.
In one embodiment, the gesture recognition method may further include step S210 of presenting the gesture recognition result or the optimization result, i.e., sending the gesture recognition result or the optimization result to a display to be shown.
In one embodiment, as shown in fig. 11, a training method for gesture recognition is provided, which includes the following steps, taking the method as an example for being applied to the terminal in fig. 24:
step S401, a second training sample set is obtained;
specifically, the second training sample set includes sets of three-dimensional images of a plurality of targets and corresponding auxiliary data.
Step S402, obtaining an initial model of the gesture recognition model;
step S403, training the initial model based on the training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image and auxiliary data.
For other relevant descriptions of the gesture recognition method, the gesture recognition training method, and the like, reference is made to the above embodiments, which are not repeated herein.
It should be understood that although the steps in the flowcharts of figs. 1-11 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, there is no strict restriction on their order, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 1-11 may include multiple sub-steps or stages that are not necessarily completed at the same moment but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 12, there is provided a gesture recognition apparatus including:
a three-dimensional image acquisition module 101, configured to acquire a three-dimensional image of a target object;
a recognition model obtaining module 102, configured to obtain a gesture recognition model; and
and a recognition result output module 103, configured to input the three-dimensional image into the gesture recognition model, and output a gesture recognition result of the target object.
In one embodiment, as shown in fig. 13, the gesture recognition apparatus further includes:
an initial image acquisition module 104, configured to acquire an initial three-dimensional image including a foreground and a background, wherein the foreground represents the target object;
a three-dimensional image extraction module 105, configured to extract the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object including only the foreground or a single background.
In one embodiment, as shown in fig. 14, the gesture recognition apparatus further includes:
an initial image acquisition module 104', configured to acquire an initial three-dimensional image including a foreground and a background, wherein the foreground represents the target object;
an auxiliary data acquisition module 106', configured to acquire auxiliary data of the target object;
a three-dimensional image extraction module 105', configured to extract the foreground from the initial three-dimensional image by combining the auxiliary data, to obtain a three-dimensional image of the target object including only the foreground or a single background.
In one embodiment, as shown in fig. 15, the gesture recognition apparatus further includes:
a recognition result obtaining module 107, configured to obtain the gesture recognition result;
and the recognition result optimization module 108 is configured to optimize the gesture recognition result to obtain an optimization result.
Further, in one embodiment, as shown in fig. 20, the recognition result optimizing module includes:
a point cloud obtaining unit 1081, configured to obtain a reference point cloud and a point cloud of a target object;
an initial posture acquiring unit 1082 for acquiring an initial posture of the target object;
a correspondence determining unit 1083, configured to determine, according to the initial pose, the point-to-point correspondence between the reference point cloud and the point cloud of the target object;
a distance calculation unit 1084, configured to calculate the Euclidean distances between corresponding points;
a pose updating unit 1085, configured to optimize the initial pose according to those Euclidean distances, with the objective of minimizing the global Euclidean distance, to obtain a more accurate updated pose;
a pose determination unit 1086, configured to repeat the functions of units 1083-1085, continuously updating the pose until the update satisfies a preset condition (e.g., falls below a certain threshold).
In one embodiment, the gesture recognition apparatus further comprises:
a display module 109 (not shown in the drawings), configured to display the initial image data, the image data of the target object, the auxiliary data, the gesture recognition result of the target object, or the optimization result.
In one embodiment, as shown in fig. 16, there is provided a gesture recognition apparatus including:
a three-dimensional image acquisition module 201, configured to acquire a three-dimensional image of a target object;
an auxiliary data acquisition module 202, configured to acquire auxiliary data of a target object;
a recognition model obtaining module 203, configured to obtain a gesture recognition model; and
and the recognition result output module 204 is configured to input the three-dimensional image and the auxiliary data into the gesture recognition model, and output a recognition result.
In one embodiment, as shown in fig. 17, the gesture recognition apparatus further includes:
an initial image acquisition module 205, configured to acquire an initial three-dimensional image including a foreground and a background, wherein the foreground represents the target object;
a three-dimensional image extraction module 206, configured to extract the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object including only the foreground or a single background.
In one embodiment, as shown in fig. 18, the gesture recognition apparatus further includes:
an initial image acquisition module 205', configured to acquire an initial three-dimensional image including a foreground and a background, wherein the foreground represents the target object;
an auxiliary data acquisition module 207', configured to acquire auxiliary data of the target object;
a three-dimensional image extraction module 206', configured to extract the foreground from the initial three-dimensional image by combining the auxiliary data, to obtain a three-dimensional image of the target object including only the foreground or a single background.
In one embodiment, as shown in fig. 19, the gesture recognition apparatus further includes:
a recognition result obtaining module 208, configured to obtain the gesture recognition result;
and the recognition result optimizing module 209 is used for optimizing the gesture recognition result to obtain an optimized result.
Further, in one embodiment, the recognition result optimization module 209 includes:
a point cloud obtaining unit 2091, configured to obtain a reference point cloud and a point cloud of a target object;
an initial posture acquiring unit 2092, configured to acquire an initial posture of the target object;
a correspondence determining unit 2093, configured to determine, according to the initial pose, the point-to-point correspondence between the reference point cloud and the point cloud of the target object;
a distance calculating unit 2094, configured to calculate the Euclidean distances between corresponding points;
a pose updating unit 2095, configured to optimize the initial pose according to those Euclidean distances, with the objective of minimizing the global Euclidean distance, to obtain a more accurate updated pose;
a pose determination unit 2096, configured to repeat the functions of units 2093-2095, continuously updating the pose until the update satisfies a preset condition (e.g., falls below a certain threshold).
In one embodiment, the gesture recognition apparatus further comprises:
a display module (not shown in the drawings), used for displaying the initial image data, the image data of the target object, the auxiliary data, the gesture recognition result of the target object, or the optimization result.
In one embodiment, as shown in fig. 21, there is provided a gesture recognition training apparatus, including:
a sample obtaining module 301, configured to obtain a first training sample set;
a model obtaining module 302, configured to obtain an initial model of the gesture recognition model;
a model training module 303, configured to train the initial model based on the training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image.
In one embodiment, as shown in fig. 22, there is provided another gesture recognition training apparatus, including:
a sample obtaining module 301, configured to obtain a second training sample set;
a model obtaining module 302, configured to obtain an initial model of the gesture recognition model;
a model training module 303, configured to train the initial model based on the training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for an input three-dimensional image and auxiliary data.
For the specific limitations of the above gesture recognition devices and gesture recognition training devices, reference may be made to the limitations of the gesture recognition methods and gesture recognition training methods above, which are not repeated here. All or part of the modules in the gesture recognition devices and gesture recognition training devices may be implemented in software, hardware, or a combination thereof. Each module may be embedded in hardware in, or independent of, a processor in the computer device, or stored in software in a memory in the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, as shown in fig. 23, a gesture recognition system is provided that includes an image sensor 400 and a control device 500.
An image sensor 400 for acquiring a three-dimensional image of a target object;
a control device 500 for acquiring a three-dimensional image of a target object; acquiring a gesture recognition model; inputting the three-dimensional image into the gesture recognition model, and outputting a gesture recognition result of the target object; or
Or the control device 500 is used for acquiring a three-dimensional image of the target object; acquiring auxiliary data of the target object; acquiring a gesture recognition model; and inputting the three-dimensional image and the auxiliary data into the gesture recognition model and outputting a recognition result; or
An image sensor 400 for acquiring a two-dimensional image of a target object;
a control device 500 for acquiring a two-dimensional image of the target object and generating a three-dimensional image of the target object from the two-dimensional image; acquiring a three-dimensional image of a target object; acquiring a gesture recognition model; inputting the three-dimensional image into the gesture recognition model, and outputting a gesture recognition result of the target object; or
Or the control device 500 is used for acquiring the two-dimensional image of the target object and generating a three-dimensional image of the target object according to the two-dimensional image; acquiring the three-dimensional image of the target object; acquiring auxiliary data of the target object; acquiring a gesture recognition model; and inputting the three-dimensional image and the auxiliary data into the gesture recognition model and outputting a recognition result.
The control device 500 may be a Programmable Logic Controller (PLC), a Field-Programmable Gate Array (FPGA), a computer (Personal Computer, PC), an Industrial Personal Computer (IPC), a server, or the like. The control device generates program instructions according to a preset program, in combination with manually input information and parameters, data collected by an external image sensor, and the like.
For the specific limitations of the above control device, reference may be made to the limitations of the gesture recognition method above, and details are not repeated here.
It should be noted that the above-mentioned control devices, projectors and/or sensors may be real devices in a real environment, or virtual devices in a simulation platform, with the effect of connecting real devices achieved through the simulated environment. A control device that completes behavior training in the virtual environment can then be transplanted to the real environment to control, or be retrained with, the real control device, projector and/or sensor, which saves resources and time during training.
In one embodiment, as shown in fig. 25, there is provided a computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the gesture recognition method described above and/or of the gesture recognition training method described above.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; the computer program, when executed by a processor, implements the steps of the gesture recognition method described above and/or of the gesture recognition training method described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The terms "first," "second," "third," "S101," "S102," "S103," and the like in the claims and in the description and drawings above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and any variations thereof, are intended to cover non-exclusive inclusions. For example: a process, method, system, article, or robot that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but includes other steps or modules not explicitly listed or inherent to such process, method, system, article, or robot.
It should be noted that the embodiments described in the specification are preferred embodiments, and the structures and modules involved are not necessarily essential to the invention, as will be understood by those skilled in the art.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (17)

1. A gesture recognition method, characterized in that the gesture recognition method comprises:
acquiring a three-dimensional image of a target object;
acquiring a gesture recognition model; and
inputting the three-dimensional image into the gesture recognition model, and outputting a gesture recognition result of the target object.
2. The gesture recognition method according to claim 1, further comprising, after outputting the gesture recognition result of the target object:
acquiring the gesture recognition result;
and optimizing the gesture recognition result to obtain an optimized result.
3. The gesture recognition method according to claim 1 or 2, wherein the acquiring of the three-dimensional image of the target object further comprises:
acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
extracting the foreground in the initial three-dimensional image to obtain a three-dimensional image of the target object only comprising the foreground or a single background; or
Acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
acquiring auxiliary data of a target object;
and extracting the foreground in the initial three-dimensional image by combining the auxiliary data to obtain a three-dimensional image of the target object only comprising the foreground or a single background.
4. A gesture recognition method, characterized in that the gesture recognition method comprises:
acquiring a three-dimensional image of a target object;
acquiring auxiliary data of a target object;
acquiring a gesture recognition model; and
inputting the three-dimensional image and the auxiliary data into the gesture recognition model, and outputting a recognition result.
5. The gesture recognition method according to claim 4, further comprising, after outputting the gesture recognition result of the target object:
acquiring the gesture recognition result;
and optimizing the gesture recognition result to obtain an optimized result.
6. The gesture recognition method according to claim 4 or 5, wherein the acquiring of the three-dimensional image of the target object further comprises:
acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
extracting the foreground in the initial three-dimensional image to obtain a three-dimensional image of the target object only comprising the foreground or a single background; or
Acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the object;
acquiring auxiliary data of a target object;
and extracting the foreground in the initial three-dimensional image by combining the auxiliary data to obtain a three-dimensional image of the target object only comprising the foreground or a single background.
7. A gesture recognition training method, characterized in that the method comprises:
Acquiring a first training sample set;
acquiring an initial model of the gesture recognition model;
training the initial model based on the first training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for the input three-dimensional image; or
Acquiring a second training sample set;
acquiring an initial model of the gesture recognition model;
training the initial model based on the second training sample set to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for the input three-dimensional image and auxiliary data.
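Claim 7 fixes neither the sample format, the loss, nor the optimizer. A minimal training sketch, assuming supervised pose regression in which each sample pairs a three-dimensional image with a ground-truth pose, and using Adam with an MSE loss as placeholder choices:

```python
import torch
from torch.utils.data import DataLoader, Dataset

def train_pose_model(initial_model: torch.nn.Module, samples: Dataset,
                     epochs: int = 50, lr: float = 1e-3) -> torch.nn.Module:
    """Train the initial model into a gesture recognition model.

    Each sample is assumed to be a (cloud, pose) pair: a 3D image of the
    target object and its labelled pose. Loss and optimizer are placeholders.
    """
    loader = DataLoader(samples, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(initial_model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()               # pose-regression loss (assumed)
    initial_model.train()
    for _ in range(epochs):
        for clouds, poses in loader:
            optimizer.zero_grad()
            loss = loss_fn(initial_model(clouds), poses)
            loss.backward()                    # backpropagate the pose error
            optimizer.step()                   # update the model weights
    return initial_model                       # the trained recognition model
```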
8. A gesture recognition apparatus, characterized in that the apparatus comprises:
the three-dimensional image acquisition module is used for acquiring a three-dimensional image of a target object;
the recognition model acquisition module is used for acquiring a gesture recognition model; and
the recognition result output module is used for inputting the three-dimensional image into the gesture recognition model and outputting a gesture recognition result of the target object.
9. The gesture recognition apparatus according to claim 8, further comprising:
the recognition result acquisition module is used for acquiring the gesture recognition result;
and the recognition result optimization module is used for optimizing the gesture recognition result to obtain an optimization result.
10. The gesture recognition apparatus according to claim 8 or 9, characterized by further comprising:
the initial image acquisition module is used for acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the target object;
the three-dimensional image extraction module is used for extracting the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object comprising only the foreground, or the foreground against a single background; or
The initial image acquisition module is used for acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the target object;
the auxiliary data acquisition module is used for acquiring auxiliary data of the target object;
and the three-dimensional image extraction module is used for extracting, in combination with the auxiliary data, the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object comprising only the foreground, or the foreground against a single background.
11. A gesture recognition apparatus, characterized in that the gesture recognition apparatus comprises:
the three-dimensional image acquisition module is used for acquiring a three-dimensional image of a target object;
the auxiliary data acquisition module is used for acquiring auxiliary data of the target object;
the recognition model acquisition module is used for acquiring a gesture recognition model; and
the recognition result output module is used for inputting the three-dimensional image and the auxiliary data into the gesture recognition model and outputting a gesture recognition result of the target object.
12. The gesture recognition apparatus according to claim 11, further comprising:
the recognition result acquisition module is used for acquiring the gesture recognition result;
and the recognition result optimization module is used for optimizing the gesture recognition result to obtain an optimization result.
13. The gesture recognition apparatus according to claim 11 or 12, characterized by further comprising:
the initial image acquisition module is used for acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the target object;
the three-dimensional image extraction module is used for extracting the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object comprising only the foreground, or the foreground against a single background; or
The initial image acquisition module is used for acquiring an initial three-dimensional image comprising a foreground and a background; wherein the foreground represents the target object;
the auxiliary data acquisition module is used for acquiring auxiliary data of the target object;
and the three-dimensional image extraction module is used for extracting, in combination with the auxiliary data, the foreground from the initial three-dimensional image to obtain a three-dimensional image of the target object comprising only the foreground, or the foreground against a single background.
14. A gesture recognition training apparatus, characterized by comprising:
the sample acquisition module is used for acquiring a first training sample set;
the model acquisition module is used for acquiring an initial model of the gesture recognition model;
the model training module is used for inputting the first training sample set into the initial model for training to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for the input three-dimensional image; or
The sample acquisition module is used for acquiring a second training sample set;
the model acquisition module is used for acquiring an initial model of the gesture recognition model;
the model training module is used for inputting the second training sample set into the initial model for training to obtain a gesture recognition model; the gesture recognition model is used for outputting a gesture recognition result of the target object for the input three-dimensional image and auxiliary data.
15. A gesture recognition system, characterized in that the gesture recognition system comprises an image sensor and a control device;
the image sensor is used for acquiring a three-dimensional image of a target object and sending the three-dimensional image to the control device;
the control device is used for acquiring a three-dimensional image of the target object; acquiring a gesture recognition model; inputting the three-dimensional image into the gesture recognition model, and outputting a gesture recognition result of the target object; or
The control device is used for acquiring a three-dimensional image of the target object; acquiring auxiliary data of the target object; acquiring a gesture recognition model; inputting the three-dimensional image and the auxiliary data into the gesture recognition model, and outputting a gesture recognition result of the target object; or
The image sensor is used for acquiring a two-dimensional image of a target object and sending the two-dimensional image to the control device;
the control device is used for acquiring a two-dimensional image of the target object; generating a three-dimensional image of the target object according to the two-dimensional image; acquiring a three-dimensional image of the target object; acquiring a gesture recognition model; inputting the three-dimensional image into the gesture recognition model, and outputting a gesture recognition result of the target object; or
The control device is used for acquiring a three-dimensional image of the target object; acquiring auxiliary data of the target object; acquiring a gesture recognition model; and inputting the three-dimensional image and the auxiliary data into the gesture recognition model, and outputting a gesture recognition result of the target object.
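One plausible wiring of the system in claim 15 keeps the image sensor as a pure capture-and-forward component while the control device owns the model and runs the pipeline; the class names and the sensor's capture() interface below are hypothetical.

```python
import torch

class ControlDevice:
    """Runs the recognition pipeline on images received from the sensor."""

    def __init__(self, model: torch.nn.Module):
        self.model = model.eval()

    def recognize(self, cloud: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return self.model(cloud.unsqueeze(0)).squeeze(0)

class GestureRecognitionSystem:
    """Image sensor plus control device (hypothetical names)."""

    def __init__(self, sensor, control: ControlDevice):
        self.sensor = sensor      # must expose capture() -> (N, 3) tensor
        self.control = control

    def step(self) -> torch.Tensor:
        cloud = self.sensor.capture()         # sensor acquires and sends image
        return self.control.recognize(cloud)  # control device outputs the pose
```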
16. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the gesture recognition method of any one of claims 1 to 6 and/or the gesture recognition training method of claim 7.
17. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the gesture recognition method of any one of claims 1 to 6 and/or the gesture recognition training method of claim 7.
CN201910673695.2A 2019-07-24 2019-07-24 Posture recognition and training method, device and system Pending CN112288798A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910673695.2A CN112288798A (en) 2019-07-24 2019-07-24 Posture recognition and training method, device and system


Publications (1)

Publication Number Publication Date
CN112288798A (en) 2021-01-29

Family

ID=74419601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910673695.2A Pending CN112288798A (en) 2019-07-24 2019-07-24 Posture recognition and training method, device and system

Country Status (1)

Country Link
CN (1) CN112288798A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989326A (en) * 2009-07-31 2011-03-23 三星电子株式会社 Human posture recognition method and device
CN107066935A (en) * 2017-01-25 2017-08-18 网易(杭州)网络有限公司 Hand gestures method of estimation and device based on deep learning
CN108089196A (en) * 2017-12-14 2018-05-29 中国科学院光电技术研究所 The noncooperative target pose measuring apparatus that a kind of optics master is passively merged
CN108154104A (en) * 2017-12-21 2018-06-12 北京工业大学 A kind of estimation method of human posture based on depth image super-pixel union feature
CN109919077A (en) * 2019-03-04 2019-06-21 网易(杭州)网络有限公司 Gesture recognition method, device, medium and calculating equipment


Similar Documents

Publication Title
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
CN107742311B (en) Visual positioning method and device
WO2022213879A1 (en) Target object detection method and apparatus, and computer device and storage medium
CN111797650B (en) Obstacle identification method, obstacle identification device, computer equipment and storage medium
US11222471B2 (en) Implementing three-dimensional augmented reality in smart glasses based on two-dimensional data
US11274922B2 (en) Method and apparatus for binocular ranging
JP6902122B2 (en) Double viewing angle Image calibration and image processing methods, equipment, storage media and electronics
CN109934065B (en) Method and device for gesture recognition
JP6011102B2 (en) Object posture estimation method
US9460349B2 (en) Background understanding in video data
EP3905194A1 (en) Pose estimation method and apparatus
CN109033989B (en) Target identification method and device based on three-dimensional point cloud and storage medium
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
JP5833507B2 (en) Image processing device
CN112287730A (en) Gesture recognition method, device, system, storage medium and equipment
CN109740659B (en) Image matching method and device, electronic equipment and storage medium
CN113689578A (en) Human body data set generation method and device
US20230401799A1 (en) Augmented reality method and related device
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN114881841A (en) Image generation method and device
CN114202554A (en) Mark generation method, model training method, mark generation device, model training device, mark method, mark device, storage medium and equipment
CN112307799A (en) Gesture recognition method, device, system, storage medium and equipment
JP6016242B2 (en) Viewpoint estimation apparatus and classifier learning method thereof
CN110514140B (en) Three-dimensional imaging method, device, equipment and storage medium
CN113065521B (en) Object identification method, device, equipment and medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination