CN114972618A

CN114972618A - Three-dimensional key point acquisition system, method and device and storage medium

Info

Publication number: CN114972618A
Application number: CN202110197744.7A
Authority: CN
Inventors: 胡慧
Original assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd; Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Current assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd; Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co Ltd
Priority date: 2021-02-22
Filing date: 2021-02-22
Publication date: 2022-08-30

Abstract

The embodiment of the application provides a three-dimensional key point acquisition system, method, device and storage medium. The system comprises an arithmetic device and at least one shooting device, wherein the shooting device comprises a grayscale camera and a binocular camera composed of two infrared cameras. The shooting device is used for shooting a target object to obtain a grayscale image and light spot images, wherein a reflective material is arranged on the surface of each key point of the target object, the grayscale image is captured by the grayscale camera, each infrared camera in the binocular camera corresponds to one light spot image, and each light spot in a light spot image corresponds to one key point. The arithmetic device is used for determining a first three-dimensional coordinate of each key point according to the light spot images, the first three-dimensional coordinate being located in a first coordinate system, and for determining, according to the first three-dimensional coordinate, a second three-dimensional coordinate corresponding to each key point in the grayscale image, the second three-dimensional coordinate being located in a second coordinate system. The system can solve the technical problems in the related art that three-dimensional key point acquisition is inefficient and consumes a large amount of labor cost.

Description

Three-dimensional key point acquisition system, method and device and storage medium

Technical Field

The embodiment of the application relates to the technical field of three-dimensional measurement, in particular to a three-dimensional key point acquisition system, a method, a device and a storage medium.

Background

Deep learning refers to learning the intrinsic rules and representation levels of sample data, with the final goal of giving machines a human-like ability to analyze and learn. A deep learning model is built with a deep learning algorithm, and its training is inseparable from a large amount of training data. High-quality training data gives the deep learning model a better training effect.

Collecting the key points in training data is a technique often involved in constructing training data, in particular the collection of three-dimensional key points. In some technologies, each key point in the training data is marked manually, a computer or other equipment then calculates the two-dimensional coordinates of each key point, and a series of transformations is applied to the two-dimensional coordinates to obtain the three-dimensional coordinates of the key points, thereby realizing the collection of three-dimensional key points. In addition, to ensure annotation quality, more than two persons are required to label the key points in the same training data so that the accuracy of the labeling can be verified. As a result, a large amount of labor cost is consumed and the acquisition efficiency of three-dimensional key points is reduced.

Disclosure of Invention

The embodiment of the application provides a three-dimensional key point acquisition system, a method, a device and a storage medium, which aim to solve the technical problems of low three-dimensional key point acquisition efficiency and large labor cost consumption in the related technology.

In a first aspect, an embodiment of the present application provides a three-dimensional key point acquisition system, comprising: an arithmetic device and at least one shooting device, wherein the shooting device comprises a binocular camera and a grayscale camera, and the binocular camera consists of two infrared cameras;

the shooting device is used for shooting a target object and obtaining a gray level image and a light spot image, a reflective material is arranged on the surface of a key point of the target object, the gray level image is obtained by shooting through the gray level camera, the light spot image is obtained by shooting through the binocular camera, each infrared camera in the binocular camera corresponds to one light spot image, the light spot image comprises a plurality of light spots, and each light spot corresponds to one key point;

the arithmetic device is used for acquiring the gray level image and the light spot image shot by the shooting device; determining a first three-dimensional coordinate of each key point according to the light spot image, wherein the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to a binocular camera in the shooting device; and determining a second three-dimensional coordinate corresponding to each key point in the gray-scale image according to the first three-dimensional coordinate, wherein the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to a gray-scale camera in the shooting device.

In a second aspect, an embodiment of the present application further provides a three-dimensional keypoint acquisition method, including:

acquiring a grayscale image and light spot images obtained after a shooting device shoots a target object, wherein the shooting device comprises at least one binocular camera and at least one grayscale camera, the binocular camera consists of two infrared cameras, a reflective material is arranged on the surface of each key point of the target object, the grayscale image is captured by the grayscale camera, the light spot images are captured by the binocular camera, each infrared camera in the binocular camera corresponds to one light spot image, each light spot image comprises a plurality of light spots, and each light spot corresponds to one three-dimensional key point;

determining a first three-dimensional coordinate of each key point according to the light spot image, wherein the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a coordinate system corresponding to a binocular camera in the shooting device;

and determining a second three-dimensional coordinate corresponding to each key point in the gray level image according to the first three-dimensional coordinate, wherein the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to a gray level camera in the shooting device.

In a third aspect, an embodiment of the present application further provides a three-dimensional keypoint acquisition apparatus, including:

the image acquisition module is used for acquiring a grayscale image and light spot images obtained after a shooting device shoots a target object, wherein the shooting device comprises a binocular camera and a grayscale camera, the binocular camera consists of two infrared cameras, a reflective material is arranged on the surface of each key point of the target object, the grayscale image is captured by the grayscale camera, the light spot images are captured by the binocular camera, each infrared camera in the binocular camera corresponds to one light spot image, each light spot image comprises a plurality of light spots, and each light spot corresponds to one key point;

the first coordinate determination module is used for determining a first three-dimensional coordinate of each key point according to the light spot image, the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to a binocular camera in the shooting device;

and the second coordinate determination module is used for determining a second three-dimensional coordinate corresponding to each key point in the gray level image according to the first three-dimensional coordinate, the second three-dimensional coordinate is positioned in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to a gray level camera in the shooting device.

In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the three-dimensional key point acquisition method according to the second aspect.

According to the three-dimensional key point acquisition system, method, device and storage medium, a grayscale image of the target object is captured by the grayscale camera in the shooting device, and light spot images of the target object are captured by the binocular camera in the shooting device, wherein a reflective material is arranged on the surface of each key point of the target object. The arithmetic device then processes the light spot images to obtain the first three-dimensional coordinate of each key point, and determines the second three-dimensional coordinate corresponding to each key point in the grayscale image according to the first three-dimensional coordinate. In this way the three-dimensional key points are acquired automatically, which solves the technical problems in some technologies that three-dimensional key point acquisition is inefficient and consumes a large amount of labor cost. The light spot images are obtained by means of the reflective material and the infrared cameras, and the key points are then calibrated automatically through the light spots. Using the grayscale image annotated with the three-dimensional coordinates of the key points as training data avoids the problem of overfitting to the light spots that would arise if the light spot images were used as training data, and thus ensures the quality of the training data. Furthermore, the three-dimensional coordinates are calculated in a simple manner that is easy to apply and popularize, and the shooting device has a simple structure that does not require a large number of cameras.

Drawings

Fig. 1 is an RGB image provided in the related art;

Fig. 2 is a depth image provided in the related art;

Fig. 3 is a high-definition image provided in the related art;

Fig. 4 is a schematic structural diagram of a three-dimensional key point acquisition system provided in an embodiment of the present application;

Fig. 5 is a light spot image provided in an embodiment of the present application;

Fig. 6 is a grayscale image provided in an embodiment of the present application;

Fig. 7 is another grayscale image provided in an embodiment of the present application;

Fig. 8 is a schematic structural diagram of another three-dimensional key point acquisition system provided in an embodiment of the present application;

Fig. 9 is a flowchart of a three-dimensional key point acquisition method provided in an embodiment of the present application;

Fig. 10 is a schematic structural diagram of a three-dimensional key point acquisition device provided in an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of explanation, not limitation, of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.

It is noted that, in this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action or object from another entity or action or object without necessarily requiring or implying any actual such relationship or order between such entities or actions or objects. For example, "first" and "second" of the first three-dimensional coordinate and the second three-dimensional coordinate are used to distinguish two different three-dimensional coordinates.

Taking the key points of a human hand as an example, in some related technologies, magnetic sensors are attached at the key point positions of the hand, an RGB camera and a TOF camera are used to shoot the hand to obtain an RGB image and a depth image, and key point matching is then performed on the two images to determine the three-dimensional coordinates of the key points. The RGB camera obtains images containing various colors by varying and superimposing the three color channels red (R), green (G) and blue (B). The TOF camera measures depth, so that the captured depth image carries three-dimensional information. For example, Fig. 1 is an RGB image provided by the related art, and Fig. 2 is a depth image provided by the related art. Fig. 1 and Fig. 2 are both images of a human hand, and the posture of the hand is the same in both images. As can be seen from Fig. 1 and Fig. 2, the magnetic sensors are visible in both images. The three-dimensional coordinates of the key points can be determined from the depth image and then marked at the corresponding key points in the RGB image to obtain training data. However, the conspicuous magnetic sensors then appear in the training data, which reduces its quality.

In some related technologies, a spherical laboratory is built instead. The laboratory is formed by splicing together a plurality of pentagons or hexagons, and a high-definition camera is placed at the center of each pentagon or hexagon so that shooting covers all directions. When key points are collected from an image shot by one of the high-definition cameras, a coarse key point detection algorithm must first be trained to roughly label the key points in the image, and the rough labels are then iterated step by step to obtain fine labels. For example, Fig. 3 is a high-definition image provided by the related art. It is obtained by applying the above processing to an image shot by a high-definition camera in the spherical laboratory, and it marks each key point of a hand. However, this approach requires a large number of high-definition cameras and a complicated calculation process.

In view of the above, the embodiment of the present application provides a three-dimensional key point acquisition system that realizes automatic acquisition of key points without requiring a large number of cameras, while ensuring the quality of the training data.

The embodiment of the application provides a three-dimensional key point acquisition system, which is used for acquiring key points of a target object, wherein the target object is a hand of a human being. In this case, the hand key points are skeletal key points of the hand. For example, the key points of the hand include 26 skeletal key points such as each fingertip and each phalanx joint. It should be noted that, in practical applications, the target object may be of another type, and the key points included in the target object may be set according to practical situations.

Furthermore, three-dimensional key points are acquired through a three-dimensional key point acquisition system, wherein the three-dimensional key points refer to key points for determining three-dimensional coordinates. In an embodiment, the acquisition process specifically identifies key points and three-dimensional coordinates of the key points in an image containing the target object.

In one embodiment, a three-dimensional key point acquisition system comprises: an arithmetic device and at least one shooting device, wherein the shooting device comprises a binocular camera and a grayscale camera, and the binocular camera consists of two infrared cameras; the shooting device is used for shooting the target object and obtaining a grayscale image and light spot images; the arithmetic device is used for acquiring the grayscale image and the light spot images shot by the shooting device; determining a first three-dimensional coordinate of each key point according to the light spot images, wherein the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to the binocular camera in the shooting device; and determining a second three-dimensional coordinate corresponding to each key point in the grayscale image according to the first three-dimensional coordinate, wherein the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to the grayscale camera in the shooting device.

Fig. 4 is a schematic structural diagram of a three-dimensional key point acquisition system according to an embodiment of the present application. Referring to Fig. 4, the three-dimensional key point acquisition system includes a shooting device 11 and an arithmetic device 12. It can be understood that Fig. 4 takes one shooting device 11 as an example, and in practical applications a plurality of shooting devices may be provided.

Specifically, the shooting device 11 includes a binocular camera 111 and a grayscale camera 112, wherein the binocular camera 111 uses two cameras to simulate the two eyes of a human and thereby achieve a binocular shooting effect. In one embodiment, the binocular camera is composed of two infrared cameras (the two infrared cameras in Fig. 4 constitute the binocular camera 111), wherein the infrared cameras are sensitive to infrared light, and the two infrared cameras can correspond to the left and right eyes of a human respectively. Optionally, the two infrared cameras use the same focal length, and the distance between them can be set according to actual conditions. In one embodiment, the grayscale camera 112 refers to a camera that captures grayscale images and is not sensitive to infrared light. The layout of the grayscale camera 112 relative to the binocular camera 111 can be set according to actual conditions. For example, the grayscale camera 112 and the binocular camera 111 are located on the same plane (the lenses of the three cameras are located on the same horizontal line during shooting), in which case the grayscale camera 112 can be located on the left side of, on the right side of, or between the two infrared cameras of the binocular camera 111. Fig. 4 illustrates an example in which the grayscale camera 112 is located between the two infrared cameras of the binocular camera 111.

Further, the grayscale camera 112 and the binocular camera 111 shoot the target object. A reflective material is arranged on the surface of each key point of the target object. The reflective material appears gray under normal conditions and can reflect infrared light, that is, it is an infrared reflective material; the specific material used is not limited in the embodiment. For example, the reflective material may be pasted onto the surface of the key points, or may be arranged on the surface of the key points in other ways. It can be understood that the grayscale camera and the binocular camera may shoot simultaneously or non-simultaneously (e.g., sequentially), which is not limited in the embodiment. The target object remains static during one shooting process, that is, the posture of the target object does not change while each camera shoots. Between shooting processes, the posture of the target object can be changed.

When the binocular camera 111 shoots the target object, both infrared cameras shoot the target object. In the embodiment, the images captured by the binocular camera 111 are recorded as light spot images, and each infrared camera corresponds to one light spot image. Taking the shooting process of one infrared camera as an example, the infrared light reflected by the reflective material at a key point is captured by the infrared camera and forms a light spot in the light spot image, so that each light spot in the light spot image corresponds to one key point of the target object. For example, Fig. 5 is a light spot image provided in an embodiment of the present application. Referring to Fig. 5, the target object is a human hand, and the light spot image includes light spots corresponding to 13 key points. The light spots make the position of each key point in the light spot image clear. In one embodiment, the reflective materials arranged on the surfaces of different key points of the target object have different shapes. That is, to distinguish the different key points of the target object, reflective materials of different shapes are pasted on different key points, so that the light spots in the light spot image differ in shape and the position of each key point in the light spot image can be found accurately according to the shape of its light spot.

When the grayscale camera 112 shoots the target object, the obtained image is a grayscale image, that is, each pixel in the image has only one sampled color, and different intensities are distinguished by different gray levels. In an embodiment, since the reflective material appears gray under normal conditions, it is not conspicuous in the grayscale image. For example, Fig. 6 is a grayscale image provided in an embodiment of the present application. Referring to Fig. 6, which is a grayscale image obtained by shooting a human hand with the grayscale camera, the light spots shown in Fig. 5 do not appear in Fig. 6.

In one embodiment, the arithmetic device 12 obtains the light spot images and the grayscale image captured by the shooting device 11 and processes them to acquire the key points. The arithmetic device 12 is a device with data processing capability, for example a computer or a mobile phone. The arithmetic device 12 may include a memory and at least one processor. The memory, as a computer-readable storage medium, may mainly include a program storage area and a data storage area, and is used for storing software programs, computer-executable programs and modules, such as the program instructions/modules used by the arithmetic device in the embodiments of the present application when acquiring three-dimensional key points. The processor executes the various functional applications and data processing of the arithmetic device by running the software programs, instructions and modules stored in the memory, so as to realize the acquisition of three-dimensional key points. In addition, the arithmetic device may further include a communication module for data communication, an input module for receiving input signals, an output module for display, and the like.

Specifically, the arithmetic device 12 may be connected to the shooting device 11 through a data line to obtain the corresponding images. Alternatively, the corresponding images are obtained in other manners, such as from the shooting device 11 through a wireless connection (for example Bluetooth or WiFi), or by reading the memory card of each camera in the shooting device 11. Optionally, in one shooting process, the cameras of the shooting device 11 may shoot a plurality of grayscale images and a plurality of light spot images, and the arithmetic device 12 then selects an image of better quality from the images shot by each camera as the image for subsequent processing. Alternatively, in one shooting process, each camera of the shooting device 11 shoots only one image for processing by the arithmetic device 12. When the arithmetic device 12 processes the images of one shooting device 11, it specifically processes one grayscale image captured by the grayscale camera and two light spot images captured by the two infrared cameras of that shooting device 11.

In the embodiment, the description takes as an example the case in which the arithmetic device 12 processes the images of one shooting device 11. Specifically, the grayscale camera and the binocular camera in the shooting device 11 are each calibrated to determine the intrinsic parameter matrix and the extrinsic parameter matrix of each camera and the relative position relationship between the grayscale camera and the binocular camera. The specific implementation of the camera calibration is not limited. The intrinsic parameters are parameters related to the characteristics of the camera itself, such as the focal length and the pixel size of the camera, and the intrinsic parameter matrix represents the intrinsic parameters in matrix form. The extrinsic parameters are parameters of the camera in the world coordinate system, such as the position and the rotation direction of the camera, and the extrinsic parameter matrix represents the extrinsic parameters in matrix form. The relative position relationship between the binocular camera and the grayscale camera may be determined from the extrinsic parameter matrices. After calibration is completed, the arithmetic device 12 can obtain the intrinsic parameter matrix and the extrinsic parameter matrix of each camera and the relative position relationship between the grayscale camera and the binocular camera.
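As a minimal sketch of how the relative position relationship could be derived from two extrinsic parameter matrices, assume that the extrinsic parameters of each camera are given as a rotation matrix R and a translation vector t that map a world point into that camera's coordinate system (p_cam = R * p_world + t). This convention and the helper names below (`relative_pose`, `mat_mul`, `mat_vec`, `transpose`) are illustrative assumptions; the embodiment does not prescribe this or any other calibration procedure.

```python
# Sketch only: relative pose between two calibrated cameras.
# Assumed convention (not from the source): p_cam = R @ p_world + t.

def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]

def relative_pose(r_src, t_src, r_dst, t_dst):
    """Return (R, t) such that p_dst = R @ p_src + t.

    Derivation: p_world = R_src^T (p_src - t_src), then
    p_dst = R_dst p_world + t_dst.
    """
    r = mat_mul(r_dst, transpose(r_src))
    rotated_t_src = mat_vec(r, t_src)
    t = [t_dst[i] - rotated_t_src[i] for i in range(3)]
    return r, t

# Hypothetical example: the left infrared camera sits at the world origin and
# the grayscale camera is shifted 30 mm along the x axis with no rotation.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
R_first_to_second, t_first_to_second = relative_pose(
    IDENTITY, [0, 0, 0], IDENTITY, [-30, 0, 0])
```

The returned pair plays the role of a rotation and translation between the binocular camera frame and the grayscale camera frame under the stated convention.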

Illustratively, the arithmetic device 12 segments each light spot in the light spot image by an image segmentation algorithm. The specific image segmentation algorithm is not limited herein. It can be understood that after each light spot is segmented from the light spot image, the position of the corresponding key point in the light spot image can be determined, that is, the two-dimensional coordinates of the key point in the light spot image are determined. The two-dimensional coordinates are pixel coordinates and reflect the positions of the pixels that display the key point in the light spot image. Optionally, the pixel area occupied by the light spot in the light spot image is used as the two-dimensional coordinates of the key point, in which case the two-dimensional coordinates of the key point consist of the two-dimensional coordinates of each pixel in the pixel area. Optionally, the central point of the pixel area occupied by the light spot in the light spot image is used as the two-dimensional coordinates of the key point, in which case the two-dimensional coordinates of the key point are the two-dimensional coordinates of the central point.
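The central-point option can be sketched as follows. Since the embodiment leaves the segmentation algorithm open, this sketch swaps in a simple fixed threshold followed by a 4-connected flood fill, and takes the mean pixel position of each region as the central point. The function name `spot_centroids`, the threshold value and the connectivity are all assumptions made for illustration.

```python
from collections import deque

def spot_centroids(image, threshold=128):
    """Return the (u, v) central point of each bright connected region.

    `image` is a list of rows of gray values; u is the column index (x axis)
    and v is the row index (y axis). Threshold and 4-connectivity are
    illustrative assumptions, not requirements of the embodiment.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for v0 in range(height):
        for u0 in range(width):
            if seen[v0][u0] or image[v0][u0] < threshold:
                continue
            # Breadth-first flood fill collects the pixels of one light spot.
            queue = deque([(u0, v0)])
            seen[v0][u0] = True
            pixels = []
            while queue:
                u, v = queue.popleft()
                pixels.append((u, v))
                for du, dv in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nu, nv = u + du, v + dv
                    if (0 <= nu < width and 0 <= nv < height
                            and not seen[nv][nu]
                            and image[nv][nu] >= threshold):
                        seen[nv][nu] = True
                        queue.append((nu, nv))
            count = len(pixels)
            centroids.append((sum(p[0] for p in pixels) / count,
                              sum(p[1] for p in pixels) / count))
    return centroids
```

The list `pixels` collected for each region corresponds to the pixel-area option, while the returned mean position corresponds to the central-point option.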

Further, after the arithmetic device 12 segments the two light spot images, the two-dimensional coordinates of each key point in the two light spot images are obtained. The three-dimensional coordinates of each key point are then determined by binocular positioning from the two-dimensional coordinates of the same key point in the two light spot images. In the embodiment, the three-dimensional coordinate system in which these coordinates lie is recorded as the first coordinate system, which is a three-dimensional coordinate system established with one infrared camera of the binocular camera as the origin.

Specifically, the arithmetic device 12 determines the two-dimensional coordinates of the same key point in the two light spot images. Optionally, when the light spots of different key points have different shapes, the light spots belonging to the same key point can be identified in the two light spot images according to their shapes, and the two two-dimensional coordinates of the key point are thereby obtained. Alternatively, the two-dimensional coordinates of the key point are obtained according to the arrangement relationship: when the binocular camera shoots the light spot images, the key points in the two light spot images have the same arrangement relationship, so the light spots of the same key point can be identified through the arrangement relationship and the two-dimensional coordinates obtained. It can be understood that different key points are assigned different codes, so that each key point is recorded by its code.
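One possible reading of the arrangement-based option is sketched below: the spots of each image are sorted by approximate row and then by column, and spots with the same rank are treated as the same key point, with the rank serving as the key point code. The embodiment does not specify how the arrangement relationship is evaluated, so the sorting rule, the `row_tolerance` parameter and the name `match_by_arrangement` are assumptions. The sketch further assumes that every key point is visible in both images and that the ordering is identical in both.

```python
def match_by_arrangement(left_spots, right_spots, row_tolerance=5.0):
    """Pair spots of two light spot images by their rank in a common ordering.

    Each spot is a (u, v) pixel coordinate. Returns a dict that maps a
    key point code (here simply the rank) to ((u_l, v_l), (u_r, v_r)).
    """
    if len(left_spots) != len(right_spots):
        raise ValueError("both images must contain the same number of spots")

    def arrangement_key(spot):
        u, v = spot
        # Spots whose rows differ by less than the tolerance are treated as
        # one row, then ordered from left to right within that row.
        return (round(v / row_tolerance), u)

    left_sorted = sorted(left_spots, key=arrangement_key)
    right_sorted = sorted(right_spots, key=arrangement_key)
    return dict(enumerate(zip(left_sorted, right_sorted)))
```

Rank-based pairing is fragile when spots are occluded or lie near a row boundary, which is one reason the shape-based option may be preferable in practice.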

After the two-dimensional coordinates of the same key point in the two light spot images are obtained, since the arithmetic device 12 has already determined the intrinsic parameter matrix and the extrinsic parameter matrix of the binocular camera, the arithmetic device 12 can obtain the first three-dimensional coordinate of the key point in the first coordinate system geometrically from the two-dimensional coordinates of the same key point in the two light spot images. When the first three-dimensional coordinate of the key point is determined geometrically, the calculation can be performed through the following formula:

$$x = \frac{B \cdot u_l}{u_l - u_r}, \qquad y = \frac{B \cdot v_l}{u_l - u_r}, \qquad z = \frac{B \cdot f}{u_l - u_r}$$

wherein $(x, y, z)$ is the first three-dimensional coordinate of the key point, $B$ is the baseline distance of the binocular camera, $f$ is the focal length of the binocular camera, $(u_l, v_l)$ is the two-dimensional coordinate of the key point in one light spot image, and $u_r$ is the x-axis coordinate of the two-dimensional coordinate of the key point in the other light spot image. The infrared camera corresponding to the light spot image from which $u_l$ and $v_l$ are taken is the infrared camera used to construct the first coordinate system. For example, if the infrared camera located on the left side during shooting is the origin of the first coordinate system, then $u_l$ and $v_l$ are the x-axis coordinate and the y-axis coordinate of the key point in the light spot image of the left infrared camera, and $u_r$ is the x-axis coordinate of the same key point in the light spot image of the right infrared camera. In the embodiment, the baseline distance is the physical distance (actual distance) between the two infrared cameras. In an embodiment, the focal lengths of the two infrared cameras are the same. The first three-dimensional coordinate of each key point can be calculated through the above formula.
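The binocular positioning step can be sketched in code under the standard parallel-axis stereo relation z = f * B / (u_l - u_r), x = u_l * z / f, y = v_l * z / f. The sketch assumes that the two light spot images are rectified, that pixel coordinates are measured from the principal point, and that the focal length is expressed in pixels; these conventions and the function name `triangulate` are illustrative assumptions and not details confirmed by the embodiment.

```python
def triangulate(u_l, v_l, u_r, baseline, focal_length):
    """Return (x, y, z) of a key point in the frame of the left infrared camera.

    u_l, v_l: pixel coordinates of the key point in the left light spot image.
    u_r: x-axis pixel coordinate of the same key point in the right image.
    baseline: physical distance between the two infrared cameras.
    focal_length: shared focal length of both cameras, in pixels (assumed).
    """
    disparity = u_l - u_r
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front "
                         "of the cameras")
    z = focal_length * baseline / disparity
    x = u_l * z / focal_length
    y = v_l * z / focal_length
    return x, y, z

# Hypothetical numbers: 60 mm baseline (given in metres), 800-pixel focal length.
first_coordinate = triangulate(100.0, 50.0, 60.0, baseline=0.06,
                               focal_length=800.0)
```

The result is expressed in the same length unit as `baseline`, and depth grows as the disparity between the two images shrinks.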

Further, the arithmetic device 12 determines the three-dimensional coordinates of each key point in the grayscale image according to the first three-dimensional coordinate of each key point, and in the embodiment these three-dimensional coordinates are recorded as the second three-dimensional coordinates. The second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to the grayscale camera, that is, a three-dimensional coordinate system constructed with the grayscale camera as the origin. It can be understood that the transformation matrix used to transform the first coordinate system into the second coordinate system can be determined according to the relative position relationship between the grayscale camera and the binocular camera and the corresponding extrinsic parameters. In the embodiment, this transformation matrix is referred to as the first coordinate transformation matrix. It may include a translation matrix (for translating the three-dimensional coordinates) and/or a rotation matrix (for rotating the three-dimensional coordinates), and the calculation means used to obtain it is not limited. Coordinate transformation is then performed on the first three-dimensional coordinates of the key points according to the first coordinate transformation matrix to obtain the second three-dimensional coordinates of the key points, that is, the second three-dimensional coordinates of the key points in the grayscale image are determined, and the acquisition of the key points is thereby realized.
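Applying the first coordinate transformation matrix can be sketched as a rotation followed by a translation. The sketch assumes the transformation is supplied as a 3x3 rotation matrix and a 3-element translation vector such that p_second = R * p_first + t; the function names `to_second_frame` and `transform_keypoints` and the dictionary layout are hypothetical.

```python
def to_second_frame(point, rotation, translation):
    """Map one first three-dimensional coordinate into the second coordinate system."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )

def transform_keypoints(first_coordinates, rotation, translation):
    """Transform every key point, keyed by its key point code."""
    return {code: to_second_frame(point, rotation, translation)
            for code, point in first_coordinates.items()}

# Hypothetical example: 90 degree rotation about the z axis plus a shift.
ROTATION = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
TRANSLATION = (10, 20, 30)
second_coordinates = transform_keypoints({0: (1, 2, 3)}, ROTATION, TRANSLATION)
```

When the grayscale camera is merely offset from the binocular camera without rotation, the rotation matrix reduces to the identity and only the translation term remains.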

In summary, a grayscale image of the target object is captured by the grayscale camera in the shooting device, and light spot images of the target object are captured by the binocular camera in the shooting device, where a reflective material is arranged on the surface of each key point of the target object. The arithmetic device processes the light spot images to obtain the first three-dimensional coordinate of each key point, and determines the second three-dimensional coordinate corresponding to each key point in the grayscale image according to the first three-dimensional coordinate, thereby realizing the acquisition of the three-dimensional key points. The light spot images are obtained by means of the reflective material and the infrared cameras, and the key points are then calibrated automatically from the light spots. Using the grayscale image annotated with the three-dimensional coordinates of the key points as training data avoids the overfitting to light spots that would result from using the light spot images as training data, which ensures the quality of the training data. Furthermore, the three-dimensional coordinate calculation is simple, which facilitates application and popularization, and the structure of the shooting device is simple and does not require a large number of cameras.

It can be understood that when a plurality of shooting devices are provided, the arithmetic device completes the acquisition of the key points in each grayscale image after processing the grayscale image and the light spot images captured by each shooting device. The second three-dimensional coordinates of the key points in the different grayscale images are then integrated into one grayscale image. Accordingly, when a plurality of shooting devices are provided in the embodiment, each shooting device corresponds to one grayscale image, one of the grayscale images is the first grayscale image, and the remaining grayscale images are second grayscale images; the arithmetic device is further used for converting the second three-dimensional coordinate corresponding to each key point in a second grayscale image into the second coordinate system corresponding to the first grayscale image, so as to obtain the third three-dimensional coordinate of the key point in the first grayscale image.

Specifically, the plurality of shooting devices shoot the target object from different viewing angles, so as to avoid the situation in which a key point is occluded when the target object is shot by a single shooting device, which can be understood as self-occlusion. For example, fig. 7 is another grayscale image provided in the embodiments of the present application. Referring to fig. 7, the target object is a human hand; only some of the key points of the hand are displayed in the grayscale image, while others are occluded by the hand pose, such as some of the key points of the ring finger and little finger in fig. 7. In this case, one shooting device cannot capture all the key points of the target object, so a plurality of shooting devices can be used to shoot from different viewing angles to ensure that as many key points as possible are obtained. In the embodiment, all key points of the target object can be obtained after the target object is shot by the plurality of shooting devices. Optionally, the shooting positions of the shooting devices can be set manually according to actual requirements; for example, the shooting devices are located on the same horizontal plane and shoot around the target object.

Further, after determining the second three-dimensional coordinates of the key points in each grayscale image, the arithmetic device integrates the second three-dimensional coordinates into one grayscale image, so as to obtain the three-dimensional coordinates of all key points in a single grayscale image. In an embodiment, the arithmetic device selects one grayscale image and records it as the first grayscale image, that is, the grayscale image into which the second three-dimensional coordinates of the key points are integrated, and records the remaining grayscale images as second grayscale images. For example, the arithmetic device randomly selects one of the grayscale images as the first grayscale image, or selects the grayscale image in which the target object is seen from the front as the first grayscale image, or a user manually selects one grayscale image through the arithmetic device as the first grayscale image. After the first grayscale image is selected, the arithmetic device converts the second three-dimensional coordinate of each key point in a second grayscale image from that image's own second coordinate system into the second coordinate system corresponding to the first grayscale image. Specifically, the arithmetic device determines the transformation matrix used to transform the second coordinate system of the second grayscale image into the second coordinate system of the first grayscale image. Optionally, when there are multiple second grayscale images, the second three-dimensional coordinates of the key points in each second grayscale image are converted to obtain the third three-dimensional coordinates. It can be understood that, in practical applications, the same key point may appear in multiple grayscale images, so that after the coordinate conversion the same key point may have multiple three-dimensional coordinates in the first grayscale image. In this case, only one of these three-dimensional coordinates may be retained in the first grayscale image as the three-dimensional coordinate of the key point. For example, different key points have different codes, so whether the same key point appears more than once can be determined from the code corresponding to each three-dimensional coordinate in the first grayscale image, and only one of the three-dimensional coordinates is retained when it does. For another example, after the second three-dimensional coordinates of the key points are converted, the third three-dimensional coordinates and the second three-dimensional coordinate of the same key point in the first grayscale image lie very close to one another; therefore, if the third three-dimensional coordinates and second three-dimensional coordinates in the first grayscale image include coordinates whose mutual distance is smaller than a distance threshold, it can be determined that these coordinates belong to the same key point, and only one of them is retained.
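The two de-duplication strategies described above (by key point code, and by distance threshold) can be sketched as follows. This is an illustrative sketch only: the function names, the rule that a coordinate already present in the first grayscale image takes precedence, the greedy grouping, and the 5 mm threshold are all assumptions rather than details given in the source.

```python
import math


def merge_by_code(existing, converted):
    """Keep one coordinate per key point code.

    `existing` maps code -> (x, y, z) already in the first grayscale image;
    `converted` holds coordinates converted from a second grayscale image.
    Coordinates already present win (an arbitrary choice for this sketch).
    """
    merged = dict(existing)
    for code, xyz in converted.items():
        merged.setdefault(code, xyz)
    return merged


def merge_by_distance(points, threshold):
    """Greedily keep a point only if no kept point lies closer than `threshold`."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= threshold for q in kept):
            kept.append(p)
    return kept


# Hypothetical coordinates in metres; the thumb tip is seen by two shooting devices.
first = {"thumb_tip": (0.10, 0.02, 0.50)}
from_second = {"thumb_tip": (0.101, 0.02, 0.50), "ring_tip": (0.03, 0.05, 0.52)}

merged = merge_by_code(first, from_second)
unique = merge_by_distance(list(first.values()) + list(from_second.values()), threshold=0.005)
```

Merging by code is the simpler option when every light spot can be coded reliably; the distance-based variant needs no codes but depends on a threshold that is smaller than the spacing between neighbouring key points.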

It can be understood that even if a certain key point is not displayed in the first gray scale image, its corresponding third three-dimensional coordinate is marked in the first gray scale image.

On the basis of the above embodiment, there are three shooting devices, and the three shooting devices are located on the same horizontal plane and shoot around the target object.

Fig. 8 is a schematic structural diagram of another three-dimensional key point acquisition system according to an embodiment of the present application; the arithmetic device is not shown in fig. 8. Referring to fig. 8, the system includes three shooting devices, referred to as shooting device 21, shooting device 22 and shooting device 23. Each shooting device includes two infrared cameras constituting a binocular camera, and a grayscale camera located between the two infrared cameras. Further, the three shooting devices are located on the same plane; in fig. 8, they are all located on the plane of the triangle. That the three shooting devices shoot around the target object means that they shoot the target object from different viewing angles, so as to ensure that together they capture all key points of the target object. It should be noted that the layout shown in fig. 8 is an optional layout in which shooting device 21 captures one face of the target object; for example, if shooting device 21 captures the front of the target object, shooting device 22 and shooting device 23 capture two opposite sides of it, so that all key points can be captured by the three shooting devices. It can be understood that other layouts of the three shooting devices may be used; for example, the lines connecting the three shooting devices form an equilateral triangle, with the shooting devices located at its three vertices. Generally speaking, three shooting devices are sufficient to capture all key points.

As described above, providing a plurality of shooting devices avoids the situation in which some key points are occluded and therefore cannot be acquired, and ensures that all key points are captured. Integrating the second three-dimensional coordinates from the different grayscale images then yields a grayscale image containing the three-dimensional coordinates of all key points, which improves the quality of the grayscale image.

The embodiment of the application also provides a three-dimensional key point acquisition method. Specifically, fig. 9 is a flowchart of a three-dimensional keypoint acquisition method according to an embodiment of the present application. Referring to fig. 9, the three-dimensional keypoint acquisition method specifically includes:

Step 310, acquiring a grayscale image and light spot images obtained after a target object is shot by a shooting device, wherein the shooting device comprises at least one binocular camera and at least one grayscale camera, the binocular camera is composed of two infrared cameras, a reflective material is arranged on the surface of each key point of the target object, the grayscale image is captured by the grayscale camera, the light spot images are captured by the binocular camera, each infrared camera in the binocular camera corresponds to one light spot image, each light spot image comprises a plurality of light spots, and each light spot corresponds to one key point.

The grayscale camera is located between the two infrared cameras. The reflective materials arranged on the surfaces of different key points of the target object have different shapes.

Step 320, determining a first three-dimensional coordinate of each key point according to the light spot image, wherein the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to a binocular camera in the shooting device.

In one embodiment, the step specifically includes steps 321 to 323:

Step 321, determining two-dimensional coordinates of each light spot in the light spot image.

In the embodiment, the case in which the two infrared cameras of the binocular camera each capture one light spot image is taken as an example. Specifically, the two-dimensional coordinates of each light spot in each light spot image are identified; these two-dimensional coordinates are pixel coordinates. For example, since the pixel area occupied by a light spot is clearly distinguishable from the other areas of the light spot image, each light spot can be identified in the light spot image, and its two-dimensional coordinates can then be determined. The embodiment does not limit the technical means used to identify the light spots; for example, each light spot is segmented from the light spot image by using an image segmentation algorithm.
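One way to identify the light spots can be sketched as follows. The source leaves the identification technique open, so this sketch swaps in a simple stand-in: a fixed brightness threshold followed by 4-connected region labelling, with the centroid of each region taken as the spot's pixel coordinate. The function name, the threshold value and the tiny sample image are hypothetical.

```python
from collections import deque


def find_spot_centroids(image, threshold):
    """Return the (u, v) pixel centroid of each bright connected region.

    `image` is a list of rows of grayscale intensities; `threshold` is an
    assumed brightness cutoff separating light spots from the background.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    centroids = []
    for row in range(height):
        for col in range(width):
            if seen[row][col] or image[row][col] < threshold:
                continue
            # Breadth-first flood fill over 4-connected bright pixels.
            queue = deque([(row, col)])
            seen[row][col] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and not seen[nr][nc] and image[nr][nc] >= threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            u = sum(c for _, c in pixels) / len(pixels)  # x-axis (column) centroid
            v = sum(r for r, _ in pixels) / len(pixels)  # y-axis (row) centroid
            centroids.append((u, v))
    return centroids


# Hypothetical 4x6 light spot image with one 2x2 spot and one single-pixel spot.
spot_image = [
    [0, 0, 0, 0, 0, 0],
    [0, 255, 255, 0, 0, 0],
    [0, 255, 255, 0, 0, 0],
    [0, 0, 0, 0, 0, 200],
]
spots = find_spot_centroids(spot_image, threshold=128)
```

Because retroreflective markers appear much brighter than the rest of an infrared image, a fixed threshold is often enough for a sketch like this; a production pipeline would more likely rely on an image-processing library.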

Step 322, taking the two-dimensional coordinates of the light spots as the two-dimensional coordinates of the corresponding key points.

It can be understood that each light spot corresponds to one key point, so the two-dimensional coordinates of a light spot can be taken as the two-dimensional coordinates of the corresponding key point in the light spot image. Optionally, reflective materials of different shapes are arranged on the surfaces of different key points of the target object; the shape of the reflective material corresponding to each key point and the code corresponding to each key point are recorded in the arithmetic device in advance, and the shape of each light spot in the light spot image is then identified in order to assign a code to the light spot and thereby determine the key point to which it corresponds. Optionally, light spots with the same shape have the same code in different light spot images, that is, the same key point uses the same code. The two-dimensional coordinates of the light spot are then taken as the two-dimensional coordinates of the key point.

Step 323, determining a first three-dimensional coordinate of the key point according to the base line distance of the binocular camera, the focal length of the binocular camera and the two-dimensional coordinate of the same key point in the two light spot images, wherein the two light spot images are obtained by shooting by two infrared cameras of the binocular camera respectively.

For example, since the light spots of the same key point have the same code, the two-dimensional coordinates of the same key point in the two light spot images can be determined through the code. The first three-dimensional coordinate of the key point is then determined geometrically by combining the baseline distance and the focal length of the binocular camera.
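The code-based pairing can be sketched in a few lines of Python. The sketch assumes that each light spot image has already been reduced to a mapping from key point code to pixel coordinate; the function name and the sample codes are hypothetical.

```python
def pair_spots_by_code(left_spots, right_spots):
    """Pair 2D coordinates of the same key point across the two spot images.

    Each argument maps a key point code to its (u, v) pixel coordinate in one
    infrared camera's spot image. Codes seen by only one camera are skipped,
    because a single view is not enough for triangulation.
    """
    shared = left_spots.keys() & right_spots.keys()
    return {code: (left_spots[code], right_spots[code]) for code in sorted(shared)}


# Hypothetical codes: K2 is visible only on the left, K3 only on the right.
left = {"K1": (120.0, 40.0), "K2": (80.0, 65.0)}
right = {"K1": (100.0, 40.0), "K3": (10.0, 5.0)}
pairs = pair_spots_by_code(left, right)
```

Each resulting pair supplies the left-image and right-image coordinates of one key point, which are the inputs needed for step 323.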

In one embodiment, the step specifically comprises:

by using

A first three-dimensional coordinate of the keypoint is determined,

wherein $(x, y, z)$ is the first three-dimensional coordinate of the key point, $B$ is the baseline distance of the binocular camera, $f$ is the focal length of the binocular camera, $(u_l, v_l)$ is the two-dimensional coordinate of the key point in one light spot image, and $u_r$ is the x-axis coordinate of the two-dimensional coordinate of the key point in the other light spot image.
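The formula itself does not survive in this text. For orientation only, the standard triangulation relations for a rectified binocular camera, which use exactly the symbols defined above, are given below together with a worked example. Treating these as the intended relations is an assumption, as are the sample values ($B = 0.06$ m, $f = 500$ px) and the convention that pixel coordinates are measured from the principal point.

```latex
\[
z = \frac{f B}{u_l - u_r}, \qquad
x = \frac{u_l \, z}{f}, \qquad
y = \frac{v_l \, z}{f}
\]

% Worked example with (u_l, v_l) = (120, 40) and u_r = 100:
\[
z = \frac{500 \times 0.06}{120 - 100} = 1.5\ \text{m}, \qquad
x = \frac{120 \times 1.5}{500} = 0.36\ \text{m}, \qquad
y = \frac{40 \times 1.5}{500} = 0.12\ \text{m}
\]
```

The depth is inversely proportional to the disparity $u_l - u_r$, so key points close to the cameras produce large disparities and are located more precisely than distant ones.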

Step 330, determining a second three-dimensional coordinate corresponding to each key point in the grayscale image according to the first three-dimensional coordinate, wherein the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to the grayscale camera in the shooting device.

In one embodiment, the step specifically includes steps 331-332:

Step 331, acquiring a first coordinate transformation matrix between the first coordinate system and the second coordinate system.

For example, after performing operations such as translation and/or rotation on a coordinate axis in the first coordinate system, the coordinate axis is converted into a corresponding coordinate axis in the second coordinate system, and at this time, the matrix used in the operation process is the first coordinate conversion matrix. The specific embodiment of the coordinate transformation between two coordinate systems is not limited. It can be understood that the first coordinate system and the second coordinate system belong to the same photographing device.

Step 332, performing coordinate conversion on the first three-dimensional coordinate according to the first coordinate transformation matrix to obtain a second three-dimensional coordinate of the corresponding key point in the grayscale image.

Illustratively, each first three-dimensional coordinate is transformed using the first coordinate transformation matrix to obtain the second three-dimensional coordinate of the corresponding key point, for example by using the formula $P_{c2} = T_{12} P_{c1}$, where $P_{c1}$ is the first three-dimensional coordinate of the key point, $T_{12}$ is the first coordinate transformation matrix for transforming the first coordinate system into the second coordinate system, and $P_{c2}$ is the second three-dimensional coordinate of the key point. The second three-dimensional coordinate of each key point can be obtained according to this formula, so as to realize the acquisition of the three-dimensional key points in the grayscale image.
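A minimal sketch of this transformation is shown below. It assumes that the first coordinate transformation matrix is stored as a 4x4 homogeneous matrix combining rotation and translation, which is one common representation and is not prescribed by the source. The helper name `apply_transform` and the numeric extrinsics are hypothetical.

```python
def apply_transform(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point and return (x, y, z)."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    return tuple(
        sum(matrix[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )


# Hypothetical extrinsics: no rotation, and the grayscale camera sits 0.03 m
# along +x from the left infrared camera, so x coordinates shrink by 0.03 m.
T_12 = [
    [1.0, 0.0, 0.0, -0.03],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
p_c1 = (0.36, 0.12, 1.5)             # first three-dimensional coordinate
p_c2 = apply_transform(T_12, p_c1)   # second three-dimensional coordinate
```

In practice the entries of the matrix would come from an extrinsic calibration between the binocular camera and the grayscale camera of the same shooting device.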

On the basis of the above embodiment, there are a plurality of shooting devices, each shooting device corresponds to one gray image, one gray image of the plurality of gray images is a first gray image, and the remaining gray images are second gray images; correspondingly, after step 330, the method further includes: and converting the second three-dimensional coordinates corresponding to each key point in the second gray scale image into a second coordinate system corresponding to the first gray scale image to obtain third three-dimensional coordinates of the key point in the first gray scale image.

The third three-dimensional coordinate corresponding to each key point in a second grayscale image can be calculated using the formula $P_{c4} = T_{22} P_{c3}$, where $P_{c3}$ is the second three-dimensional coordinate of the key point in the second grayscale image, $T_{22}$ is the second coordinate transformation matrix used to transform the second coordinate system of the second grayscale image into the second coordinate system of the first grayscale image, and $P_{c4}$ is the third three-dimensional coordinate of the key point in the first grayscale image. The third three-dimensional coordinate of each key point is obtained according to this formula, so as to realize the acquisition of all three-dimensional key points in the first grayscale image.
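The source does not say how the second coordinate transformation matrix is obtained. One possible approach, sketched here purely as an assumption, is to calibrate the pose of every grayscale camera relative to a shared world frame and then compose the two poses. The function names, the 4x4 homogeneous representation and the sample poses are hypothetical.

```python
def invert_rigid(t):
    """Invert a 4x4 rigid transform [R | t] using R^T and -R^T t."""
    r_t = [[t[col][row] for col in range(3)] for row in range(3)]
    trans = [-sum(r_t[row][k] * t[k][3] for k in range(3)) for row in range(3)]
    return [r_t[row] + [trans[row]] for row in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def second_to_first(world_from_first, world_from_second):
    """Build the matrix taking second-camera coordinates to first-camera coordinates."""
    return matmul(invert_rigid(world_from_first), world_from_second)


# Hypothetical calibrated poses (camera-to-world) of two grayscale cameras.
world_from_first = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
world_from_second = [            # rotated 90 degrees about the y axis
    [0.0, 0.0, 1.0, -1.0],
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]
T_22 = second_to_first(world_from_first, world_from_second)

# A key point 1 m in front of the second camera, in homogeneous form.
p_second = (0.0, 0.0, 1.0, 1.0)
p_first = [sum(T_22[i][k] * p_second[k] for k in range(4)) for i in range(3)]
```

Composing poses in this way keeps a single calibration per shooting device, so changing which grayscale image serves as the first grayscale image only changes which pose is inverted.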

The number of the shooting devices can be three, the three shooting devices are located on the same horizontal plane, and the three shooting devices shoot around the target object.

It can be understood that the above provided three-dimensional key point acquisition method can be executed by a computing device of the three-dimensional key point acquisition system, and has corresponding functions and advantages, and reference may be made to the related description in the above key point acquisition system for technical details not disclosed in this embodiment.

Fig. 10 is a schematic structural diagram of a three-dimensional keypoint acquisition device according to an embodiment of the present application, and referring to fig. 10, the three-dimensional keypoint acquisition device includes: an image acquisition module 401, a first coordinate determination module 402 and a second coordinate determination module 403.

The image acquisition module 401 is configured to acquire a gray level image and a light spot image obtained after a target object is photographed by a photographing device, the photographing device includes at least one binocular camera and a gray level camera, the binocular camera is composed of two infrared cameras, a reflective material is arranged on the surface of a key point of the target object, the gray level image is obtained by photographing through the gray level camera, the light spot image is obtained by photographing through the binocular camera, each infrared camera in the binocular camera corresponds to one light spot image, the light spot image includes a plurality of light spots, and each light spot corresponds to one key point; the first coordinate determination module 402 is configured to determine a first three-dimensional coordinate of each key point according to the light spot image, where the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to a binocular camera in the photographing device; and a second coordinate determining module 403, configured to determine, according to the first three-dimensional coordinate, a second three-dimensional coordinate corresponding to each key point in the grayscale image, where the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to a grayscale camera in the shooting device.

On the basis of the above embodiment, the first coordinate determination module 402 includes: the first two-dimensional coordinate determination unit is used for determining the two-dimensional coordinates of each light spot in the light spot image; the second two-dimensional coordinate determination unit is used for taking the two-dimensional coordinates of the light spots as the two-dimensional coordinates of the corresponding key points; the first three-dimensional coordinate calculation unit is used for determining a first three-dimensional coordinate of the key point according to the base line distance of the binocular camera, the focal length of the binocular camera and two-dimensional coordinates of the same key point in the two light spot images, the two light spot images are obtained by shooting through two infrared cameras of the binocular camera respectively, the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to the binocular camera in the shooting device.

On the basis of the above embodiment, the first three-dimensional coordinate calculation unit specifically includes:

by using

Determining first three-dimensional coordinates of the keypoint,

wherein $(x, y, z)$ is the first three-dimensional coordinate of the key point, $B$ is the baseline distance of the binocular camera, $f$ is the focal length of the binocular camera, $(u_l, v_l)$ is the two-dimensional coordinate of the key point in one light spot image, and $u_r$ is the x-axis coordinate of the two-dimensional coordinate of the key point in the other light spot image.

On the basis of the above embodiment, the second coordinate determination module 403 includes: the transformation matrix determining unit is used for acquiring a first coordinate transformation matrix between a first coordinate system and a second coordinate system; and the second three-dimensional coordinate calculation unit is used for carrying out coordinate conversion on the first three-dimensional coordinate according to the first coordinate conversion matrix so as to obtain a second three-dimensional coordinate of the corresponding key point in the gray level image, the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to the gray level camera in the shooting device.

On the basis of the above embodiment, the number of the shooting devices is multiple, each shooting device corresponds to one gray image, one gray image of the multiple gray images is a first gray image, and the remaining gray images are second gray images, and the three-dimensional key point acquisition device further includes: and the coordinate conversion module is used for converting the second three-dimensional coordinates corresponding to the key points in the second gray scale image into a second coordinate system corresponding to the first gray scale image after determining the second three-dimensional coordinates corresponding to the key points in the gray scale image according to the first three-dimensional coordinates, so as to obtain third three-dimensional coordinates of the key points in the first gray scale image.

On the basis of the above embodiment, the grayscale camera is located between the two infrared cameras.

On the basis of the embodiment, the number of the shooting devices is three, the three shooting devices are located on the same horizontal plane, and the three shooting devices shoot around the target object.

On the basis of the above embodiment, the target object is a hand of a human being.

On the basis of the above embodiment, the shape of the reflective material arranged on the surface of each key point in the target object is different.

The three-dimensional key point acquisition device provided above is integrated in the arithmetic device of the three-dimensional key point acquisition system, is used for executing the three-dimensional key point acquisition method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.

It should be noted that, in the embodiment of the three-dimensional keypoint acquisition apparatus, each included unit and module are only divided according to functional logic, but are not limited to the above division, as long as corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the application.

The embodiment of the present application further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform the three-dimensional keypoint collection method provided by the embodiment of the present application, and the storage medium has corresponding functions and advantages.

From the above description of the embodiments, it will be clear to those skilled in the art that the present application can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware, although the former is the better embodiment in many cases. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) to execute the three-dimensional key point acquisition method described in the embodiments of the present application.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the recitation of an element by the phrase "comprising an … …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.

It is to be noted that the foregoing is only illustrative of the presently preferred embodiments and application of the principles of the present invention. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims

1. A three-dimensional keypoint acquisition system, comprising: an arithmetic device and at least one shooting device, wherein the shooting device comprises a binocular camera and a grayscale camera, and the binocular camera consists of two infrared cameras;

the arithmetic device is used for acquiring the gray level image and the light spot image shot by the shooting device; determining a first three-dimensional coordinate of each key point according to the light spot image, wherein the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to a binocular camera in the shooting device; and determining a second three-dimensional coordinate corresponding to each key point in the gray level image according to the first three-dimensional coordinate, wherein the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to a gray level camera in the shooting device.

2. The three-dimensional keypoint acquisition system of claim 1 wherein said grayscale camera is located between two of said infrared cameras.

3. The three-dimensional key point acquisition system according to claim 1, wherein the number of the shooting devices is multiple, each shooting device corresponds to one gray-scale image, one gray-scale image in the multiple gray-scale images is a first gray-scale image, and the rest gray-scale images are second gray-scale images;

the operation device is further configured to convert the second three-dimensional coordinates corresponding to each key point in the second grayscale image into a second coordinate system corresponding to the first grayscale image, so as to obtain third three-dimensional coordinates of the key point in the first grayscale image.

4. The three-dimensional key point acquisition system according to claim 1 or 3, wherein the number of the shooting devices is three, the three shooting devices are located on the same horizontal plane, and the three shooting devices shoot around the target object.

5. The three-dimensional keypoint acquisition system of claim 1 wherein the shape of the retroreflective material provided on the surface of each of said keypoints in said target object is different.

6. A three-dimensional key point acquisition method is characterized by comprising the following steps:

acquiring a grayscale image and light spot images obtained after a target object is shot by a shooting device, wherein the shooting device comprises at least one binocular camera and at least one grayscale camera, the binocular camera consists of two infrared cameras, a reflective material is arranged on the surface of each key point of the target object, the grayscale image is captured by the grayscale camera, the light spot images are captured by the binocular camera, each infrared camera in the binocular camera corresponds to one light spot image, each light spot image comprises a plurality of light spots, and each light spot corresponds to one key point;

determining a first three-dimensional coordinate of each key point according to the light spot image, wherein the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to a binocular camera in the shooting device;

7. The three-dimensional keypoint acquisition method of claim 6, wherein said determining, from said light spot image, a first three-dimensional coordinate of each of said keypoints comprises:

determining two-dimensional coordinates of each light spot in the light spot image;

taking the two-dimensional coordinates of the light spots as the two-dimensional coordinates of corresponding key points;

and determining a first three-dimensional coordinate of the key point according to the base line distance of the binocular camera, the focal length of the binocular camera and two-dimensional coordinates of the same key point in two light spot images, wherein the two light spot images are respectively obtained by shooting by two infrared cameras of the binocular camera.

8. The method according to claim 6, wherein the determining the second three-dimensional coordinates corresponding to each of the keypoints in the gray-scale image according to the first three-dimensional coordinates comprises:

acquiring a first coordinate transformation matrix between the first coordinate system and the second coordinate system;

and transforming the first three-dimensional coordinate according to the first coordinate transformation matrix to obtain the second three-dimensional coordinate of the corresponding key point in the gray-scale image.
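
The transformation in claim 8 can be sketched as follows, assuming the first coordinate transformation matrix is a 4x4 homogeneous matrix combining a rotation and a translation. The claim does not fix the form of the matrix, so this layout, the function name `transform_point`, and the sample values are hypothetical.

```python
def transform_point(point, matrix):
    """Apply a 4x4 homogeneous transform to a three-dimensional point.

    `matrix` is assumed to map the first (binocular camera) coordinate
    system to the second (gray-scale camera) coordinate system, with a
    bottom row of [0, 0, 0, 1].
    """
    homogeneous = (point[0], point[1], point[2], 1.0)
    # Only the top three rows are needed for a rigid transform.
    return tuple(
        sum(matrix[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )


# Hypothetical calibration result: the gray-scale camera frame is offset
# by 5 cm along X with no rotation.
first_to_second = [
    [1.0, 0.0, 0.0, 0.05],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
second_coordinate = transform_point((0.25, 0.05, 2.0), first_to_second)
```

In practice such a matrix would typically come from an extrinsic calibration between the binocular camera and the gray-scale camera of the same shooting device.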

9. The three-dimensional key point acquisition method according to claim 6, wherein there are a plurality of shooting devices, each shooting device corresponds to one gray-scale image, one of the plurality of gray-scale images is a first gray-scale image, and the remaining gray-scale images are second gray-scale images;

after determining the second three-dimensional coordinate corresponding to each key point in the gray-scale image according to the first three-dimensional coordinate, the method further comprises:

converting the second three-dimensional coordinate corresponding to each key point in the second gray-scale image into the second coordinate system corresponding to the first gray-scale image, to obtain a third three-dimensional coordinate of the key point in the first gray-scale image.
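
For the multi-device case in claim 9, one possible sketch expresses every key point seen by a second gray-scale camera in the frame of the first gray-scale camera. It assumes the relative pose between the two gray-scale cameras is available as a rotation matrix and a translation vector; the claim does not say how this pose is represented or obtained, and the name `to_first_gray_frame` and the sample values are hypothetical.

```python
def to_first_gray_frame(points, rotation, translation):
    """Map points from a second gray-scale camera frame to the first.

    Applies p' = R p + t to each point, where R (3x3) and t (length 3)
    are the assumed extrinsics of the second camera relative to the first.
    """
    converted = []
    for p in points:
        converted.append(tuple(
            sum(rotation[row][col] * p[col] for col in range(3)) + translation[row]
            for row in range(3)
        ))
    return converted


# Hypothetical extrinsics: a 90 degree rotation about the Z axis plus a
# 0.5 m offset along X.
rotation = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
translation = [0.5, 0.0, 0.0]
third_coordinates = to_first_gray_frame([(1.0, 0.0, 2.0)], rotation, translation)
```

Running the same conversion for each shooting device, with that device's own extrinsics, places all key points in a single coordinate system.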

10. A three-dimensional keypoint acquisition device, comprising:

an image acquisition module, configured to acquire a gray-scale image and a light spot image obtained by a shooting device photographing a target object, wherein the shooting device comprises a binocular camera and a gray-scale camera, the binocular camera consists of two infrared cameras, a reflective material is arranged on the surface of the key points of the target object, the gray-scale image is captured by the gray-scale camera, the light spot image is captured by the binocular camera, each infrared camera in the binocular camera corresponds to one light spot image, the light spot image comprises a plurality of light spots, and each light spot corresponds to one key point;

a first coordinate determination module, configured to determine a first three-dimensional coordinate of each key point according to the light spot image, wherein the first three-dimensional coordinate is located in a first coordinate system, and the first coordinate system is a three-dimensional coordinate system corresponding to a binocular camera in the shooting device;

and a second coordinate determination module, configured to determine a second three-dimensional coordinate corresponding to each key point in the gray-scale image according to the first three-dimensional coordinate, wherein the second three-dimensional coordinate is located in a second coordinate system, and the second coordinate system is a three-dimensional coordinate system corresponding to a gray-scale camera in the shooting device.

11. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the three-dimensional key point acquisition method according to any one of claims 6 to 9.