WO2020057121A1 - Data processing method and apparatus, electronic device and storage medium - Google Patents

Data processing method and apparatus, electronic device and storage medium

Info

Publication number
WO2020057121A1
Authority
WO
WIPO (PCT)
Prior art keywords
coordinate
depth value
feature
target
image
Prior art date
Application number
PCT/CN2019/083959
Other languages
English (en)
French (fr)
Inventor
汪旻
邹壮
刘文韬
钱晨
马利庄
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Priority to US17/049,687 priority Critical patent/US11238273B2/en
Priority to JP2020558429A priority patent/JP6985532B2/ja
Priority to SG11202010510XA priority patent/SG11202010510XA/en
Publication of WO2020057121A1 publication Critical patent/WO2020057121A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • the present application relates to, but is not limited to, the field of information technology, and in particular to a data processing method and apparatus, an electronic device, and a storage medium.
  • in somatosensory scenes such as somatosensory games, the user generally needs to wear a somatosensory device; the somatosensory device collects the 3D posture of the human body and transmits it to a controlled device so as to control that device.
  • the control of such controlled devices therefore usually requires a somatosensory device.
  • the embodiments of the present application are expected to provide a data processing method and device, an electronic device, and a storage medium.
  • a data processing method includes: converting a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point; and
  • obtaining a 3D pose of the target based on the first 3D feature.
  • a data processing device includes:
  • a first conversion module configured to convert a first 2D coordinate of a key point of the target in the image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point;
  • a first obtaining module is configured to obtain a 3D pose of the target based on the first 3D feature.
  • a computer storage medium stores computer-executable code. After the computer-executable code is executed, it can implement a data processing method provided by one or more technical solutions.
  • An electronic device, including:
  • a memory configured to store information; and
  • a processor connected to the memory and configured to implement a data processing method provided by one or more technical solutions by executing computer-executable instructions stored on the memory.
  • the technical solution provided in the embodiment of the present application can acquire a 3D image of a target.
  • the 3D image includes a 2D image and a depth image.
  • the depth image provides depth values that indicate the distance between the target and the camera, while the 3D image provides the posture of the target in the 2D imaging plane, for example as RGB or YUV data.
  • the 3D image can be used for 3D pose acquisition of the target.
  • a 3D pose of a target in a 3D space can be extracted.
  • the target moves back and forth relative to the camera; in this way, the depth value in the depth image collected by the camera changes.
  • for a deep learning module to recognize the 3D poses of targets at different distances, on the one hand it must be trained with samples at those specific distances, which makes training difficult and the training cycle long; on the other hand, even after training with samples at various distances, the training effect of the deep learning module may still not be good enough, so the 3D pose extraction accuracy will remain insufficient for 3D images at distances for which there are few samples.
  • in the embodiments of the present application, before the first 3D feature of the key points of the target in the 3D image is input to the depth model, the actual depth value is converted to the reference depth value through a translation of the target in the 2D imaging plane, which yields the first 3D feature of the target at the reference depth value; this first 3D feature is then input to the deep learning module for processing. Since the reference depth value adopted is the depth value used when training the deep learning module, the 3D pose of the target can be extracted accurately; at the same time, the samples and time required to train the deep learning module are reduced.
  • FIG. 1A is a schematic flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 1B is a schematic flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of converting a first 2D coordinate into a second 2D coordinate according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a key point provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an effect of object translation in a 2D image according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of obtaining a first 2D coordinate of a key point according to an embodiment of the present application
  • 6A is a schematic diagram of a key point and a reference point according to an embodiment of the present application.
  • 6B is a schematic diagram of another key point provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 9 is a schematic network diagram of a neural network according to an embodiment of the present application.
  • this embodiment provides a data processing method, including:
  • Step S110: convert the first 2D coordinate of a key point of the target in the image into a second 2D coordinate according to a reference depth value and the actual depth value of the key point, where the second 2D coordinate and the reference depth value constitute the first 3D feature of the key point;
  • Step S120: obtain a 3D pose of the target based on the first 3D feature.
  • This embodiment provides a data processing method, which can be applied to one or more electronic devices.
  • the electronic device may include a processor, which can execute one or more steps of the data processing method by executing executable instructions such as a computer program.
  • a single electronic device can be used for centralized data processing, and multiple electronic devices can also be used for distributed data processing.
  • the image may be a three-dimensional image; the three-dimensional image includes: a 2D image and a depth image.
  • the 2D image may be an RGB image or a YUV image.
  • the depth image may be depth information collected by a depth acquisition module.
  • the pixel value of the depth information is a depth value.
  • the depth value may be the distance between the image acquisition module and the target.
  • the actual depth value described in the embodiments of this application comes from the depth image.
  • through the conversion from the first 2D coordinate to the second 2D coordinate, the 2D coordinates of the key points of the target in the camera coordinate system, after the actual depth value of the target has been converted to the reference depth value, are obtained.
  • step S110 is thus equivalent to obtaining a first 3D feature from which the deep learning module can accurately extract the 3D pose of the target; when the first 3D feature is input to the deep learning module, it outputs the 3D pose of the target.
  • the first 3D feature here may include: a coordinate in an image coordinate system and a reference depth value.
  • for example, the 2D coordinates of a reference point of the target are obtained based on the key points; the 3D pose may be information representing the posture of the captured object in three-dimensional space. Specifically, the 3D pose may be represented by the relative position between each key point and the reference point. Suppose that, in 3D space, the coordinates of the reference point are (0, 0, 0) and the target is a human body; then the 3D pose can be expressed by the relative positions or relative coordinates, with respect to (0, 0, 0), of multiple key points representing the human skeleton.
  • the reference point may be a center point between two ends of a human hip.
  • the key points may be coordinate points representing head, neck, elbow, wrist, hip, knee, and ankle.
  • according to the relative positions of these key points with respect to the reference point, the current forward, backward, left, and right translation of the human body can be determined; the face orientation, and hence rotation parameters of the head, can also be determined from the relative positions of the face key points and the reference point, and rotation parameters of the torso can likewise be determined from the torso key points and the reference point.
  • the key point of the human face may be a point located on the nose, for example, the coordinate point of the tip of the nose.
  • the key point of the torso can be the coordinates of the center point of the chest.
  • the above are only examples of key points, and the specific implementation is not limited to this.
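  • As a minimal sketch of this relative-position representation (assuming the key points are stored as an N x 3 array and that the reference point is taken as the midpoint of the two hip key points; the array layout and function names are illustrative, not from the filing):

```python
import numpy as np

def pose_relative_to_reference(keypoints_3d: np.ndarray, hip_left: int, hip_right: int) -> np.ndarray:
    """Express a 3D pose as key-point coordinates relative to a reference point.

    keypoints_3d: (N, 3) array of key-point coordinates in the camera coordinate system.
    hip_left, hip_right: indices of the two hip key points; their midpoint is used
    as the reference point (root), which then sits at (0, 0, 0).
    """
    reference = 0.5 * (keypoints_3d[hip_left] + keypoints_3d[hip_right])
    return keypoints_3d - reference  # relative positions with respect to the reference point
```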
  • N + M second 2D coordinates can be obtained according to the coordinates of the N key points.
  • the additional M second 2D coordinates can be generated based on the first 2D coordinates of the N key points.
  • the M may be 1, and the added second 2D coordinate may correspond to the 2D coordinate of the reference point of the human body.
  • the N may be 14.
  • in step S120, the second 2D coordinates and reference depth values of the N + 1 key points may be input into the deep learning module, and the three-dimensional (3D) coordinates of N + S key points may be obtained as the 3D pose output.
  • the N key points of the N + S key points correspond to the N key points of the first 2D coordinates one by one; the S key points are generated based on the N key points.
  • the N first 2D coordinates may be: 14 key points; S may be equal to 3; that is, the first 3D features of 17 key points may be obtained finally.
  • one of the 17 key points is a reference point.
  • the reference point may be the center point of the two end points (corresponding to two key points) of the human hip.
  • the other two key points may be the coordinates of the nose of the human face and the coordinates of the center point of the chest; of course, this is only an example, and the specific implementation is not limited to this.
  • FIG. 6A may be a schematic diagram in which key point 0 is added to the 14 key points shown in FIG. 3;
  • FIG. 6B may be a schematic diagram of 17 key points generated based on the 14 key points shown in FIG. 3.
  • the 17 key points in FIG. 6B correspond to the key points shown in FIG. 3 with key point 0, key point 15 and key point 16 added; among them, the 2D coordinates of key point 16 can be determined preliminarily based on the 2D coordinates of key point 1 and key point 2;
  • 2D coordinates of key point 15 may be determined according to 2D coordinates of key point 2 and 2D coordinates of key point 0.
  • the key point 0 may be a reference point provided in the embodiment of the present application.
  • in the embodiments of the present application, on the one hand, if a deep learning module such as a neural network were required to directly detect the 3D poses of targets at different actual depths, it would need to be trained with samples at different actual depth values; that requires many training samples, slows convergence, and lengthens the training cycle.
  • with the method of this embodiment, a deep learning module such as a neural network can be trained using only training samples at a single depth value, so the amount of training data is small, the module converges quickly, and the training cycle is short; in this way, deep learning modules such as neural networks can be simplified.
  • on the other hand, by using a single depth value (the reference depth value), deep learning modules such as neural networks do not sacrifice the 3D pose extraction accuracy of the 3D coordinates corresponding to that single depth value in order to accommodate different depth values; 3D pose extraction is therefore highly accurate.
  • the step S110 may include: obtaining the second 2D coordinate according to the ratio of the actual depth value to the reference depth value and the first 2D coordinate.
  • the step S110 may include: determining the second 2D coordinate by using the following functional relationship:
  • X2 = (X1 * d) / D,
  • Y2 = (Y1 * d) / D,
  • where X2 is the coordinate value of the second 2D coordinate in the first direction and X1 is the coordinate value of the first 2D coordinate in the first direction;
  • Y2 is the coordinate value of the second 2D coordinate in the second direction and Y1 is the coordinate value of the first 2D coordinate in the second direction; the second direction is perpendicular to the first direction;
  • d is the actual depth value; D is the reference depth value.
  • D may be a spatial distance, in units such as millimeters, centimeters, or decimeters.
  • referring to FIG. 2, of is the focal length of the image acquisition (abbreviated as f), which can be obtained by querying the camera parameters; the second 2D coordinate and the reference depth value can then be obtained by a trigonometric transformation.
  • the second 2D coordinate and the reference depth value constitute the first 3D feature.
  • the first 3D feature at the standard (reference) depth value is input to the deep learning module, which can then accurately extract the 3D pose of the target. In FIG. 2, the distance od is the actual depth value, abbreviated as d, and the distance oD is the reference depth value; from the trigonometric relations, y0/y1 = f/d and y2/y1 = f/D, where y0 denotes the first 2D coordinate and y2 the second 2D coordinate, so y2 = (d * y0) / D.
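  • A minimal sketch of this conversion, under the assumption that the first 2D coordinates are expressed in units consistent with the depth values so that the relation X2 = (X1 * d) / D, Y2 = (Y1 * d) / D applies directly; the function and variable names are illustrative:

```python
import numpy as np

def to_reference_depth(coords_2d: np.ndarray, actual_depth: float, reference_depth: float) -> np.ndarray:
    """Convert first 2D coordinates (at the actual depth d) into second 2D coordinates
    (at the reference depth D) using the similar-triangle relation of FIG. 2:
        X2 = X1 * d / D,  Y2 = Y1 * d / D.
    coords_2d: (N, 2) array of first 2D coordinates of the key points.
    """
    scale = actual_depth / reference_depth
    second_2d = coords_2d * scale
    return second_2d

# The second 2D coordinates together with the reference depth D form the first 3D feature:
# first_3d = np.concatenate([second_2d, np.full((len(second_2d), 1), reference_depth)], axis=1)
```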
  • the method further includes:
  • Step S100: obtain the first 2D coordinate according to the second 3D feature of the key point and the optical center position corresponding to the image.
  • these are key points of the target human body in the actually acquired 2D image, for example, key points on the human skeleton.
  • the number of the key points may be multiple.
  • the key points may include: 14 2D key points.
  • a deep learning module for 2D images may be used to obtain the third 2D coordinates by processing the 2D image.
  • FIG. 3 shows the 2D coordinates of key points of a human skeleton; the 14 key points are represented by 14 black dots in FIG. 3.
  • a deep learning module may be used to process the 2D image to obtain the third 2D coordinate.
  • the third 2D coordinates and the actual depth values extracted from the depth image may constitute a second 3D feature.
  • when a deep learning module estimates the 3D pose of a target based on its first 3D feature, the distance between the current target and the image acquisition module may be near or far; if the deep learning module lacks corresponding training samples, the 3D pose of the target cannot be estimated accurately.
  • for a single deep learning module to extract the target's 3D pose as accurately as possible from 3D images at different distances, more training samples must be introduced to train it; thus, training the deep learning module is difficult and the training cycle is long.
  • the deep learning module may be various neural networks.
  • the deep learning module may include a network with 3D pose recognition, such as a fully connected network and a residual module of a residual network. Therefore, in this embodiment, in order to improve the accuracy of the 3D pose of the target, the depth value in the first 3D feature of the key point of the target is converted into the reference depth value.
  • to achieve this, the third 2D coordinates first need to be converted into the first 2D coordinates, so that the converted 2D coordinates lie on the optical axis of the image.
  • Figure 4 is a 2D image.
  • the captured person was originally located away from the center of the photo.
  • through a coordinate translation based on the optical center position, the person represented by the solid line in FIG. 4 can be moved from the position of the third 2D coordinates to the position of the first 2D coordinates indicated by the dotted line.
  • by translating the reference point among the key points in the camera plane, the reference point is moved onto the optical axis of the camera plane; compared with directly inputting the third 2D coordinates into the deep learning module, this reduces interference and thereby improves the accuracy of the 3D pose.
  • at the same time, the data and/or time required to train the deep learning module for 3D pose extraction are reduced, which again simplifies the training of the deep learning module and increases the training speed.
  • the step S100 may include:
  • Step S101 moving the second 3D feature of the key point, so that the 3D feature of the reference point in the key point is translated to the position of the optical center, and the third 3D feature of each of the key points is obtained;
  • Step S102 Project the third 3D feature onto a 2D imaging plane to obtain the first 2D coordinate.
  • if the second 3D features of the key points extracted from the 3D image do not include the second 3D feature of the reference point, the 2D coordinates of the reference point can be derived from the third 2D coordinates of the other key points, and the depth image can then be looked up at those 2D coordinates to obtain the actual depth value at the reference point's position, thereby obtaining the second 3D feature of the reference point. Then, in step S100, all key points are moved as a whole; during the movement, the 3D feature of the reference point is moved to the optical center position.
  • for example, with the optical center position at (0, 0, 0), the motion vector that moves the second 3D feature of the reference point to the optical center position can be used to solve for the third 3D features of the other key points after their second 3D features have been moved by the same motion vector as the reference point.
  • the third 3D features can be projected onto a 2D imaging plane to obtain the aforementioned first 2D coordinates.
  • the reference point of the target is moved to the optical axis of the camera coordinate system of the image.
  • deep learning modules such as neural networks extract the 3D poses of targets located on the optical axis with higher accuracy; reducing the errors introduced when the target's reference point lies off the optical axis improves the accuracy of the 3D pose.
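  • A minimal sketch of steps S101 and S102 under a simple pinhole-camera assumption; it translates only the in-plane components so that the reference point lands on the optical axis, which is one reading of the step, and all names are illustrative rather than taken from the filing:

```python
import numpy as np

def center_and_project(second_3d: np.ndarray, ref_index: int, f: float) -> np.ndarray:
    """Translate all key points by a common vector so that the reference key point
    lies on the optical axis, then project the translated (third) 3D features onto
    the 2D imaging plane with a pinhole model to obtain the first 2D coordinates.

    second_3d: (N, 3) array of (x, y, z) per key point in the camera coordinate system,
               where z is the actual depth value from the depth image.
    ref_index: index of the reference point among the key points.
    f:         focal length of the image acquisition (from the camera parameters).
    """
    # Same motion vector applied to every key point (in-plane components only here).
    shift = second_3d[ref_index].copy()
    shift[2] = 0.0
    third_3d = second_3d - shift

    # Pinhole projection: u = f * x / z, v = f * y / z.
    z = third_3d[:, 2:3]
    first_2d = f * third_3d[:, :2] / z
    return first_2d
```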
  • the first 3D feature of the reference point is determined based on the second 3D feature of two crotch key points among the key points.
  • from the third 2D coordinates of key point 9 and key point 10 shown in FIG. 6B, the 2D coordinates of the reference point of these two key points can be calculated.
  • the coordinates of this point are the 2D coordinates of the reference point.
  • the 2D coordinates of the reference point may be referred to as the 2D coordinates of the root node.
  • the reference point may be a reference point of the target or a point near a center.
  • in this embodiment, for the human body, using the point determined from the two crotch key points as the 2D coordinates of the reference point suits the specific structure of the human body.
  • in some embodiments, obtaining the 3D pose of the target based on the first 3D feature includes: subtracting the depth value of the reference point from the depth value corresponding to the second 2D coordinate of each key point, to obtain a fourth 2D coordinate and a depth value corresponding to the fourth 2D coordinate; normalizing the fourth 2D coordinates and the corresponding depth values to obtain normalized fourth 2D coordinates and normalized depth values;
  • a deep learning model is used to process the normalized fourth 2D coordinates and the normalized depth value corresponding to the fourth 2D coordinates to obtain a 3D pose of the target.
  • for example, the normalized fourth 2D coordinates and their corresponding depth values are input to a neural network; the neural network can directly output the 3D pose, or it can output a fourth 3D feature from which the 3D pose can be solved, and the 3D pose is then obtained by a transformation based on that fourth 3D feature.
  • through normalization, the differences caused by acquisition with cameras having different parameters can be eliminated, which removes the loss of 3D pose extraction accuracy that deep learning models such as neural networks suffer from differing camera parameters, and thus further improves the 3D pose extraction accuracy for the target.
  • in some embodiments, normalizing the fourth 2D coordinates and the corresponding depth values to obtain the normalized fourth 2D coordinates and normalized depth values includes:
  • obtaining the coordinate mean and variance of the key points based on the fourth 2D coordinates and the corresponding depth values;
  • obtaining the normalized fourth 2D coordinates according to the coordinate mean and variance, the fourth 2D coordinates, and the corresponding depth values.
  • specifically, with the mean denoted Mean and the variance denoted Std, the fourth 2D coordinates may be normalized using the following functional relationship:
  • X4' = (X4 - Mean) / Stdx;
  • Y4' = (Y4 - Mean) / Stdy.
  • X4 is the coordinate value of the fourth 2D coordinate in the first direction and Y4 its coordinate value in the second direction; X4' and Y4' are the corresponding coordinate values of the normalized fourth 2D coordinate in the first and second directions;
  • Stdx is the variance of the coordinate values in the first direction, and Stdy is the variance of the coordinate values in the second direction.
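  • A minimal sketch of this normalization, assuming the key-point data are held in NumPy arrays; whether a shared mean or per-axis statistics are used is not specified in the text, so the sketch uses per-axis statistics, and all names are illustrative:

```python
import numpy as np

def normalize_fourth_coords(second_2d: np.ndarray, depths: np.ndarray, ref_index: int):
    """Subtract the reference point's depth from each key point's depth to obtain the
    fourth 2D coordinates and their depth values, then normalize them with the mean
    and spread of the key-point coordinates:
        X4' = (X4 - Mean) / Stdx,   Y4' = (Y4 - Mean) / Stdy.
    """
    rel_depths = depths - depths[ref_index]            # depth relative to the reference point
    fourth = np.concatenate([second_2d, rel_depths[:, None]], axis=1)

    mean = fourth.mean(axis=0)                          # per-axis coordinate mean
    std = fourth.std(axis=0) + 1e-8                     # per-axis spread (avoid divide-by-zero)
    normalized = (fourth - mean) / std
    return normalized, mean, std
```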
  • the method further includes:
  • based on the actual depth value, performing an iterative operation of projecting the 3D pose into a two-dimensional plane, to obtain a fifth 2D coordinate with the minimum distance to the third 2D coordinate; and obtaining a rotation parameter and a translation parameter of the target according to the fifth 2D coordinate and the first 3D feature.
  • in this embodiment, projecting the 3D pose into a two-dimensional plane may include: projecting the first 3D feature representing the 3D pose into the 2D imaging plane, thereby obtaining a 2D projection image in the 2D imaging plane.
  • Option 1: obtain the 2D coordinates projected into the 2D imaging plane according to the 3D pose and a projection matrix, for example, by left-multiplying the 3D pose by the projection matrix; the projection matrix here may be determined according to camera parameters and/or empirical projection values.
  • Option 2: use a projection model that can project a 3D pose into the 2D imaging plane, for example, a projection neural network, which takes the 3D pose as input and outputs the 2D coordinates projected into the 2D imaging plane.
  • once the output 2D coordinates projected into the 2D imaging plane (i.e., the fifth 2D coordinates) are obtained, their distance to the third 2D coordinates can be calculated, the group with the smallest distance is selected, and the rotation parameter and the translation parameter are calculated; in the projection process, the depth value is in effect removed and only the 2D coordinates in the 2D imaging plane are retained.
  • however, the 3D pose is in fact calculated based on the reference depth value; thus the trigonometric relationship shown in FIG. 2 can be used to translate the 3D pose back to the position of the actual depth value.
  • considering the processing errors of the deep learning module and of the camera, the projection of the 3D pose into the 2D imaging plane can be performed based on the actual depth value and values approximating it.
  • the distance between the 2D coordinates projected into the two-dimensional plane and the actual third 2D coordinates must be minimized.
  • for example, the minimization of the distance between the fifth 2D coordinate and the third 2D coordinate can be expressed by the following function: min{(X5 - X3)^2 + (Y5 - Y3)^2};
  • (X5, Y5) are the fifth 2D coordinates; (X3, Y3) are the third 2D coordinates.
  • next, the rotation parameter R and the translation parameter T can be solved from a functional relationship between S3 and S2, where S3 denotes the first 3D features of the key points and S2 denotes the 2D coordinates of the key points.
  • the actual depth value gives the depth range for the iterative calculation; for example, the maximum of the depth range is the actual depth value plus an offset, and the minimum is the actual depth value minus an offset.
  • when projecting the 3D pose into the 2D imaging plane, a depth value may be selected within this depth range. The depth range is chosen around the actual depth value because, on the one hand, the depth image collected by the depth camera has a bias and, on the other hand, the network has errors.
  • based on these two considerations, the depth range provides error tolerance; the 3D pose is projected into the 2D imaging plane to obtain the optimal fifth 2D coordinate, from which the rotation parameter and/or translation parameter are estimated.
  • the translation parameter may characterize a target translation condition, and the rotation parameter may characterize a target rotation condition.
  • the translation parameters may include: translation displacements in various directions; and the rotation parameters may include: rotational displacements in various directions.
  • since the actual depth value is known in advance during the iteration, it can be used as a reference, and the projection of the 3D pose into the two-dimensional plane can be performed within a depth range containing the actual depth value; compared with iterating without a depth range provided by an actual depth value, this greatly reduces the number of iterations, saves computation, and increases the calculation speed.
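  • A minimal sketch of this depth-range search, using OpenCV's solvePnP and projectPoints as stand-ins for the iterative projection described above (the filing does not prescribe a specific solver); the offset size, step count, and all names are illustrative assumptions:

```python
import numpy as np
import cv2

def fit_pose_rt(first_3d, third_2d, K, actual_depth, offset=300.0, steps=7):
    """Search a depth range around the actual depth value and, for each candidate depth,
    solve a PnP problem between the first 3D features (pose at the reference depth, root
    at the origin) and the detected third 2D coordinates; keep the rotation R and
    translation T whose reprojection is closest to the third 2D coordinates.

    first_3d: (N, 3) float array, third_2d: (N, 2) float array, K: (3, 3) intrinsics.
    actual_depth, offset: in the same units as the depth image (offset is an arbitrary choice).
    """
    obj = first_3d.astype(np.float64)
    img = third_2d.astype(np.float64)
    dist = np.zeros(5)                           # assume no lens distortion in this sketch
    best = (None, None, np.inf)

    for z in np.linspace(actual_depth - offset, actual_depth + offset, steps):
        rvec0 = np.zeros((3, 1))
        tvec0 = np.array([[0.0], [0.0], [z]])    # start the target at the candidate depth
        ok, rvec, tvec = cv2.solvePnP(obj, img, K, dist, rvec0, tvec0,
                                      useExtrinsicGuess=True,
                                      flags=cv2.SOLVEPNP_ITERATIVE)
        if not ok:
            continue
        proj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)
        err = np.sum((proj.reshape(-1, 2) - img) ** 2)   # min{(X5-X3)^2 + (Y5-Y3)^2}
        if err < best[2]:
            best = (cv2.Rodrigues(rvec)[0], tvec, err)   # rotation matrix R, translation T

    R, T, _ = best
    return R, T
```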
  • this embodiment provides a data processing apparatus, including:
  • the first conversion module 110 is configured to convert a first 2D coordinate of the keypoint into a second 2D coordinate according to a reference depth value and an actual depth value of a keypoint of the target in the image, where the second 2D coordinate And the reference depth value constitute a first 3D feature of the key point;
  • the first obtaining module 120 is configured to obtain a 3D pose of the target based on the first 3D feature.
  • the first conversion module 110 and the first obtaining module 120 may be program modules; after being executed by the processor, the program modules can implement the conversion of the first 2D coordinates into the second 2D coordinates and the obtaining of the 3D pose.
  • the first conversion module 110 and the first obtaining module 120 may also be a combination of hardware modules and program modules, for example, a complex programmable array or a field programmable array.
  • the first conversion module 110 and the first obtaining module 120 may correspond to hardware modules.
  • the first conversion module 110 and the first obtaining module 120 may be application specific integrated circuits.
  • the first conversion module 110 is configured to obtain the second 2D coordinate according to a ratio of the actual depth value to the reference depth value and the first 2D coordinate.
  • the first conversion module 110 is configured to determine the second 2D coordinate by using the following functional relationship:
  • X2 = (X1 * d) / D,
  • Y2 = (Y1 * d) / D,
  • where X2 is the coordinate value of the second 2D coordinate in the first direction and X1 is the coordinate value of the first 2D coordinate in the first direction;
  • Y2 is the coordinate value of the second 2D coordinate in the second direction and Y1 is the coordinate value of the first 2D coordinate in the second direction; the second direction is perpendicular to the first direction;
  • d is the actual depth value and D is the reference depth value.
  • the apparatus further includes:
  • the second conversion module is configured to obtain the first 2D coordinate according to the second 3D feature of the key point and the optical center position corresponding to the image; wherein the second 3D feature includes: a third 2D coordinate obtained based on the 2D image and an actual depth value obtained based on the depth image.
  • the second conversion module is configured to move the second 3D features of the key points so that the 3D feature of the reference point among the key points is translated to the optical center position and the third 3D feature of each key point is obtained, and to project the third 3D features onto the 2D imaging plane to obtain the first 2D coordinates.
  • the first 3D feature of the reference point is determined based on the second 3D feature of two crotch key points among the key points.
  • the first obtaining module is configured to subtract the depth value of the reference point from the depth value corresponding to the second 2D coordinate of each key point to obtain a fourth 2D coordinate and a depth value corresponding to the fourth 2D coordinate; to normalize the fourth 2D coordinates and the corresponding depth values to obtain the normalized fourth 2D coordinates and normalized depth values; and to process the normalized fourth 2D coordinates and normalized depth values with a deep learning model to obtain the 3D pose of the target.
  • the first obtaining module 120 is configured to obtain the coordinate mean and variance of the key points based on the fourth 2D coordinates and the corresponding depth values, and to obtain the normalized fourth 2D coordinates according to the coordinate mean and variance, the fourth 2D coordinates, and the corresponding depth values.
  • the apparatus further includes:
  • an iteration module configured to perform, based on the actual depth value, an iterative operation of projecting the 3D pose into a two-dimensional plane to obtain a fifth 2D coordinate with the minimum distance to the third 2D coordinate;
  • a second obtaining module is configured to obtain a rotation parameter and a translation parameter of the target according to the fifth 2D coordinate and the first 3D feature.
  • an electronic device including:
  • Memory configured to store information
  • a processor connected to the memory and configured to implement a data processing method provided by one or more of the foregoing technical solutions, for example, one or more of the methods shown in FIG. 1A, FIG. 1B, and FIG. 5, by executing computer-executable instructions stored on the memory.
  • the memory can be various types of memory, such as random access memory, read-only memory, flash memory, and the like.
  • the memory may be used for information storage, for example, storing computer-executable instructions and the like.
  • the computer-executable instructions may be various program instructions, for example, target program instructions and / or source program instructions.
  • the processor may be various types of processors, for example, a central processing unit, a microprocessor, a digital signal processor, a programmable array, an application specific integrated circuit, or an image processor.
  • the processor may be connected to the memory through a bus.
  • the bus may be an integrated circuit bus or the like.
  • the terminal device may further include a communication interface
  • the communication interface may include a network interface, for example, a local area network interface, a transceiver antenna, and the like.
  • the communication interface is also connected to the processor and can be used for information transmission and reception.
  • the terminal device further includes a human-machine interaction interface.
  • the human-machine interaction interface may include various input and output devices, for example, a keyboard, a touch screen, and the like.
  • An embodiment of the present application provides a computer storage medium, where the computer storage medium stores computer-executable code; after the computer-executable code is executed, the data processing method provided by the foregoing one or more technical solutions can be implemented, for example, one or more of the methods shown in FIG. 1A, FIG. 1B, and FIG. 5.
  • the storage medium includes various media that can store program codes, such as a mobile storage device, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
  • the storage medium may be a non-transitory storage medium.
  • An embodiment of the present application provides a computer program product, where the program product includes computer-executable instructions; after the computer-executable instructions are executed, the data processing method provided by any of the foregoing implementations can be implemented, for example, one or more of the methods shown in FIG. 1A, FIG. 1B, and FIG. 5.
  • This example uses a deep neural network to predict the two-dimensional and three-dimensional key points of the human body, and then uses a three-dimensional vision algorithm to compute the three-dimensional pose of the human body; it may include: predicting the 2D positions of 14 human-body key points in the 2D image with a 2D human key point estimation tool; extracting the actual depth values of those 14 key points from the depth image corresponding to the 2D image; converting the 2D coordinates corresponding to the actual depth value into 2D coordinates corresponding to the reference depth value, for example via the trigonometric relations above; normalizing the converted 2D coordinates of all key points with the camera intrinsics; and computing the mean and standard deviation of the normalized key points for a further coordinate normalization, yielding normalized 2D coordinates and the reference depth value.
  • the normalized 2D coordinates and the reference depth value are input to a deep neural network, which maps the 2D key points to the first 3D features of the 3D key points.
  • a three-dimensional vision algorithm can then be applied to obtain the 3D pose from the first 3D features, for example, optimization based on perspective-n-point (PnP) positioning.
  • FIG. 9 shows a neural network that can obtain the 3D pose provided by this example, including: a fully connected layer (Fc), batch processing + ReLU layers, and a Dropout layer;
  • the fully connected layer receives the first 3D features of the 14 key points, and the output is the 3D pose.
  • the neural network can be used to extract the 3D pose.
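  • A minimal sketch of a network of the kind shown in FIG. 9, assuming the batch processing layers are batch normalization layers and choosing an illustrative hidden size, depth, dropout rate, and 17-key-point output; this is a sketch, not the filing's exact architecture:

```python
import torch
import torch.nn as nn

class PoseLiftingNet(nn.Module):
    """Fully connected layers with batch normalization + ReLU and Dropout, mapping
    the first 3D features of 14 key points to a 3D pose of 17 key points."""

    def __init__(self, n_in_points=14, n_out_points=17, hidden=1024, p_drop=0.5):
        super().__init__()
        self.n_out_points = n_out_points
        self.net = nn.Sequential(
            nn.Linear(n_in_points * 3, hidden),
            nn.BatchNorm1d(hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_out_points * 3),   # 3D coordinates of the output key points
        )

    def forward(self, first_3d_features):               # shape: (batch, 14, 3)
        x = first_3d_features.flatten(1)                 # flatten to (batch, 42)
        return self.net(x).view(-1, self.n_out_points, 3)   # (batch, 17, 3) 3D pose
```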
  • This example provides a data processing method, including: obtaining the 2D key points (corresponding to 2D coordinates) of one or more human bodies in an input 2D image using a deep neural network;
  • normalizing the two-dimensional human key points with the camera intrinsics and inputting them into a second deep neural network to obtain three-dimensional key points relative to a certain key point of the human body (generally at the pelvis);
  • finally, aligning the point order of the obtained two-dimensional key points and three-dimensional key points, and using the PnP algorithm to solve the three-dimensional spatial pose of the human body.
  • Example 3 For each frame of a 3D image, the two-dimensional key point detection tool of the human body is used to obtain the coordinates of 14 key points on the image;
  • taking the two-dimensional key point coordinates obtained in the first step as input, a 3D key point extraction network obtains the corresponding three-dimensional human skeleton (17 key points, with the position of the key point at the pelvis fixed at 0).
  • the two obtained human key point models are aligned so that each key point is physically consistent; with the intrinsic parameters K of the current device known, the extrinsic parameters R and T of the target human body in the camera coordinate system are computed.
  • fx, fy, cx, cy can be calibrated for the current device by Zhang Zhengyou's calibration method. Denoting the aligned two-dimensional human skeleton by S2 and the three-dimensional human skeleton by S3, it suffices to optimize the corresponding formula; since a continuous video is used as input, the R and T of the previous frame can be used as the initial values for the next frame.
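  • A minimal sketch of this per-frame estimation: the intrinsic matrix K is built in the standard pinhole form from fx, fy, cx, cy (the filing gives K only as an image), OpenCV's solvePnP stands in for the optimization of the formula over R and T, and the warm start reuses the previous frame's R and T; the initial depth guess and all names are illustrative assumptions:

```python
import numpy as np
import cv2

def camera_matrix(fx, fy, cx, cy):
    """Standard pinhole intrinsic matrix built from parameters calibrated with
    Zhang Zhengyou's method."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def track_video(frames_s2, frames_s3, K):
    """For each frame, align the 2D skeleton S2 with the 3D skeleton S3 and solve for
    R, T; the previous frame's R and T are reused as the initial values for the next frame.
    frames_s2: list of (N, 2) arrays; frames_s3: list of (N, 3) arrays (same point order)."""
    dist = np.zeros(5)                         # assume no lens distortion in this sketch
    rvec = np.zeros((3, 1))
    tvec = np.array([[0.0], [0.0], [2000.0]])  # rough initial depth guess (illustrative)
    poses = []
    for s2, s3 in zip(frames_s2, frames_s3):
        ok, rvec, tvec = cv2.solvePnP(s3.astype(np.float64), s2.astype(np.float64),
                                      K, dist, rvec, tvec,
                                      useExtrinsicGuess=True,
                                      flags=cv2.SOLVEPNP_ITERATIVE)
        poses.append((cv2.Rodrigues(rvec)[0].copy(), tvec.copy()))
    return poses
```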
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other division manners in actual implementation, for example, multiple units or components may be combined, or integrated into another system, or some features may be ignored or not executed.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, which may be located in one place or distributed to multiple network units; Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • the functional units in the embodiments of the present application may all be integrated into one processing module, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • all or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above method embodiments; the foregoing storage medium includes various media that can store program code, such as a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Abstract

The embodiments of the present application provide a data processing method and apparatus, an electronic device, and a storage medium. The data processing method includes: converting a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point; and obtaining a 3D pose of the target based on the first 3D feature.

Description

Data processing method and apparatus, electronic device and storage medium
Cross-reference to related applications
This application is filed on the basis of, and claims priority to, Chinese patent application No. 201811089872.4 filed on September 18, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to, but is not limited to, the field of information technology, and in particular to a data processing method and apparatus, an electronic device, and a storage medium.
Background
In somatosensory scenes such as somatosensory games, the user generally needs to wear a somatosensory device; the somatosensory device collects the 3D posture of the human body and transmits it to a controlled device so as to control the controlled device. Controlling such a controlled device therefore usually requires a somatosensory device.
Summary
In view of this, the embodiments of the present application are expected to provide a data processing method and apparatus, an electronic device, and a storage medium.
The technical solutions of the present application are implemented as follows.
A data processing method includes:
converting a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point; and
obtaining a 3D pose of the target based on the first 3D feature.
A data processing apparatus includes:
a first conversion module configured to convert a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point; and
a first obtaining module configured to obtain a 3D pose of the target based on the first 3D feature.
A computer storage medium stores computer-executable code; after the computer-executable code is executed, the data processing method provided by one or more of the technical solutions can be implemented.
An electronic device includes:
a memory configured to store information; and
a processor connected to the memory and configured to implement the data processing method provided by one or more of the technical solutions by executing computer-executable instructions stored on the memory.
With the technical solutions provided in the embodiments of the present application, a 3D image of a target can be acquired. The 3D image consists of a 2D image and a depth image: the depth image provides depth values that indicate the distance between the target and the camera, while the 3D image provides the posture of the target, for example as RGB or YUV data, in the 2D imaging plane. The 3D image can thus be used to obtain the three-dimensional pose of the target. For example, by processing the 3D image with a deep learning module such as a neural network, the three-dimensional pose of the target in three-dimensional space can be extracted. During image acquisition, however, the target moves back and forth relative to the camera, so the depth values in the depth image collected by the camera change. For a deep learning module to recognize the three-dimensional poses of targets at different distances, on the one hand it must be trained with training samples at those specific distances, which makes training difficult and the training cycle long; on the other hand, even after training with samples at various distances, the training effect of the deep learning module may still not be good enough, so the accuracy of three-dimensional pose extraction will remain insufficient for 3D images at distances for which there are few samples. In the embodiments of the present application, before the first 3D feature of the key points of the target in the 3D image is input to the depth model, the actual depth value is converted to the reference depth value through a translation of the target in the 2D imaging plane, so that the first 3D feature of the target at the reference depth value is obtained; this first 3D feature is then input to the deep learning module for processing. Since the reference depth value adopted is the depth value used when training the deep learning module, the 3D pose of the target can be extracted accurately; at the same time, the samples and time required to train the deep learning module are reduced.
Brief Description of the Drawings
FIG. 1A is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 1B is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of converting a first 2D coordinate into a second 2D coordinate according to an embodiment of the present application;
FIG. 3 is a schematic diagram of key points according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the effect of translating a target in a 2D image according to an embodiment of the present application;
FIG. 5 is a schematic diagram of obtaining the first 2D coordinates of key points according to an embodiment of the present application;
FIG. 6A is a schematic diagram of key points and a reference point according to an embodiment of the present application;
FIG. 6B is a schematic diagram of another set of key points according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a neural network according to an embodiment of the present application.
Detailed Description
The technical solutions of the present application are further elaborated below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1A, this embodiment provides a data processing method, including:
Step S110: converting a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point;
Step S120: obtaining a 3D pose of the target based on the first 3D feature.
This embodiment provides a data processing method, which can be applied to one or more electronic devices. The electronic device may include a processor, which can execute one or more steps of the data processing method by executing executable instructions such as a computer program. In some embodiments, a single electronic device may perform centralized data processing, or multiple electronic devices may perform distributed data processing.
The image may be a three-dimensional image; the three-dimensional image includes a 2D image and a depth image. The 2D image may be an RGB image, a YUV image, or the like. The depth image may be depth information collected by a depth acquisition module. The pixel values of the depth information are depth values. The depth value may be the distance between the image acquisition module and the target. Here, the actual depth value described in the embodiments of the present application comes from the depth image.
Through the conversion from the first 2D coordinate to the second 2D coordinate, the 2D coordinates of the key points of the target in the camera coordinate system after the actual depth value of the target has been converted to the reference depth value are obtained.
Step S110 is thus equivalent to obtaining a first 3D feature from which the deep learning module can accurately extract the 3D pose of the target. When the first 3D feature is input to the deep learning module, the depth model automatically outputs the 3D pose of the target; the 3D pose may be represented by the relative positions between the first 3D features in the three-dimensional spatial coordinate system.
The first 3D feature here may consist of a coordinate in the image coordinate system and the reference depth value.
For example, the 2D coordinates of a reference point of the target are obtained based on the key points; the 3D pose may be information representing the posture of the captured object in three-dimensional space. Specifically, the 3D pose may be represented by the relative position between each key point and the reference point. Suppose that, in 3D space, the coordinates of the reference point are (0, 0, 0) and the target is a human body; then the 3D pose can be expressed by the relative positions or relative coordinates, with respect to (0, 0, 0), of multiple key points representing the human skeleton.
The reference point may be the center point between the two ends of the human hip. For example, the key points may be coordinate points representing the head, neck, elbows, wrists, hips, knees, and ankles. Thus, from the relative positions of these key points with respect to the reference point, the current forward, backward, left, and right translation of the human body is known; from the relative positions of the face key points and the reference point, the face orientation is known, and hence rotation parameters such as the rotation amount and/or rotation direction of the head; from the relative positions of the torso key points and the reference point, rotation parameters such as the rotation amount and/or rotation direction of the torso are known. The face key point may be a point on the nose, for example, the coordinate point of the nose tip. The torso key point may be the coordinates of the center point of the chest. Of course, the above are only examples of key points, and specific implementations are not limited thereto.
Further, if the target is a human body, in step S110, N + M second 2D coordinates can be obtained from the coordinates of N key points. The additional M second 2D coordinates may be generated from the first 2D coordinates of the N key points. For example, M may be 1, and the added second 2D coordinate may correspond to the 2D coordinate of the reference point of the human body. N may be 14.
In step S120, the second 2D coordinates and reference depth values of the N + 1 key points may be input into the deep learning module, and the three-dimensional (3D) coordinates of N + S key points may be obtained as the 3D pose output. The N key points among the N + S key points correspond one-to-one to the N key points with first 2D coordinates; the S key points are generated based on the N key points.
For example, taking the human body as an example, the N first 2D coordinates may be those of 14 key points; S may be equal to 3; that is, the first 3D features of 17 key points are finally obtained. In some embodiments, one of the 17 key points is the reference point. The reference point may be the center point of the two end points (corresponding to two key points) of the human hip. The other two key points may be the nose coordinates of the human face and the coordinates of the center point of the chest; of course, this is only an example, and specific implementations are not limited thereto.
FIG. 6A may be a schematic diagram in which key point 0 is added to the 14 key points shown in FIG. 3; FIG. 6B may be a schematic diagram of 17 key points generated based on the 14 key points shown in FIG. 3. The 17 key points in FIG. 6B correspond to the key points shown in FIG. 3 with key point 0, key point 15, and key point 16 added; the 2D coordinates of key point 16 may be determined preliminarily based on the 2D coordinates of key point 1 and key point 2, and the 2D coordinates of key point 15 may be determined from the 2D coordinates of key point 2 and the 2D coordinates of key point 0. Key point 0 may be the reference point provided in the embodiments of the present application.
In the embodiments of the present application, on the one hand, if during training a deep learning module such as a neural network were required to directly detect the 3D poses of targets at different actual depths, the neural network would need to be trained with samples at different actual depth values; in that case many training samples are required, and with many training samples the convergence of a deep learning module such as a neural network is slow, leading to a long training cycle. With the method of this embodiment, a deep learning module such as a neural network can be trained using only training samples at a single depth value, so the amount of training data is small, the deep learning module converges quickly, and the training cycle is short; in this way, deep learning modules such as neural networks can be simplified.
On the other hand, by using a single depth value (i.e., the reference depth value), a deep learning module such as a neural network does not sacrifice the accuracy of 3D pose extraction for the 3D coordinates corresponding to that single depth value in order to accommodate different depth values; it therefore has the characteristic of high 3D pose extraction accuracy.
In some embodiments, step S110 may include: obtaining the second 2D coordinate according to the ratio of the actual depth value to the reference depth value and the first 2D coordinate.
Further, for example, step S110 may include: determining the second 2D coordinate using the following functional relationship:
X2 = (X1 * d) / D,
Y2 = (Y1 * d) / D,
where X2 is the coordinate value of the second 2D coordinate in the first direction; X1 is the coordinate value of the first 2D coordinate in the first direction;
Y2 is the coordinate value of the second 2D coordinate in the second direction, and Y1 is the coordinate value of the first 2D coordinate in the second direction; the second direction is perpendicular to the first direction;
d is the actual depth value; D is the reference depth value.
D may be a spatial distance, in units such as millimeters, centimeters, or decimeters.
Referring to FIG. 2, of is the focal length of the image acquisition (abbreviated as f), which can be obtained by querying the camera parameters; the second 2D coordinate and the reference depth value can be obtained by a trigonometric transformation. The second 2D coordinate and the reference depth value constitute the first 3D feature, and inputting the first 3D feature at the standard depth value into the deep learning module enables accurate extraction of the 3D pose of the target. In some embodiments, the distance denoted od is the actual depth value, abbreviated as d, and the distance denoted oD is the reference depth value. From the trigonometric relations, y0/y1 = f/d and y2/y1 = f/D, where y0 denotes the first 2D coordinate and y2 denotes the second 2D coordinate; hence y2 = (d * y0) / D.
In this embodiment, the 2D coordinates of key points in the actually acquired image are referred to as the third 2D coordinates. As shown in FIG. 1B, the method further includes:
Step S100: obtaining the first 2D coordinate according to the second 3D feature of the key point and the optical center position corresponding to the image.
These are key points of the target human body in the actually acquired 2D image, for example, key points on the human skeleton.
There may be multiple key points. For example, if the target is a human body, the key points may include 14 2D key points. Here, a deep learning module for 2D images may be used to obtain the third 2D coordinates by processing the 2D image.
FIG. 3 shows the 2D coordinates of key points of a human skeleton; in FIG. 3, the 14 key points are represented by 14 black dots.
In some embodiments, a deep learning module may be used to process the 2D image to obtain the third 2D coordinates. The third 2D coordinates and the actual depth values extracted from the depth image may constitute the second 3D feature.
When a deep learning module estimates the 3D pose of a target based on its first 3D feature, the distance between the current target and the image acquisition module may be near or far; if the deep learning module lacks corresponding training samples, the 3D pose of the target cannot be estimated accurately. For a single deep learning module to extract the target's 3D pose as accurately as possible from 3D images at different distances, more training samples must be introduced to train it; thus, training the deep learning module is difficult and the training cycle is long. The deep learning module may be any of various neural networks, for example, networks with 3D pose recognition capability such as a fully connected network and the residual modules of a residual network. Therefore, in this embodiment, to improve the accuracy of the target's 3D pose, the depth value in the first 3D feature of the target's key points is converted to the reference depth value.
In order to successfully convert the depth value in the first 3D feature of the key points into the reference depth value, the third 2D coordinates first need to be converted into the first 2D coordinates, so that the converted 2D coordinates lie on the optical axis of the image.
FIG. 4 shows a 2D image in which the captured person is originally located away from the center of the photo. Through a coordinate translation based on the optical center position, the person represented by the solid line in FIG. 4 can be moved from the position of the third 2D coordinates to the position of the first 2D coordinates indicated by the dotted line. By translating the reference point among the key points in the camera plane, the reference point is moved onto the optical axis of the camera plane; compared with directly inputting the third 2D coordinates into the deep learning module, this reduces interference and thus improves the accuracy of the 3D pose, while also reducing the data and/or time required to train the deep learning module for 3D pose extraction, which again simplifies the training of the deep learning module and increases the training speed.
There are multiple ways to convert the third 2D coordinates into the first 2D coordinates; one optional way is provided below:
As shown in FIG. 5, step S100 may include:
Step S101: moving the second 3D features of the key points so that the 3D feature of the reference point among the key points is translated to the optical center position, and obtaining the third 3D feature of each key point;
Step S102: projecting the third 3D features onto the 2D imaging plane to obtain the first 2D coordinates.
In this embodiment, if the second 3D features of the key points obtained from the 3D image by a deep learning model such as a neural network do not include the second 3D feature of the reference point, the 2D coordinates of the reference point can be obtained from the third 2D coordinates of the other key points, and the depth image can then be looked up based on the 2D coordinates of the reference point to obtain the actual depth value at the reference point's position, thereby obtaining the second 3D feature of the reference point. Then, in step S100, all key points are moved as a whole; during the movement, the 3D feature of the reference point is moved to the optical center position. For example, with the optical center position at (0, 0, 0), the motion vector that moves the second 3D feature of the reference point to the optical center position can be used to solve for the third 3D features of the other key points after their second 3D features have been moved by the same motion vector as the reference point.
After the third 3D features of all key points are obtained, the third 3D features can be projected onto the 2D imaging plane to obtain the aforementioned first 2D coordinates.
Through the movement of the second 3D coordinates, at least the reference point of the target is moved onto the optical axis of the camera coordinate system of the image. Deep learning modules such as neural networks extract the 3D pose of a target located on the optical axis with higher accuracy; reducing the errors introduced when the target's reference point lies off the optical axis improves the accuracy of the 3D pose. In some other embodiments, if the target is a human skeleton, the first 3D feature of the reference point is determined based on the second 3D features of two crotch key points among the key points.
From the third 2D coordinates of key point 9 and key point 10 shown in FIG. 6B, the 2D coordinates of the reference point of these two key points can be calculated; the coordinates of that point are the 2D coordinates of the reference point.
In some embodiments, the 2D coordinates of the reference point may be referred to as the 2D coordinates of the root node.
In some embodiments, the reference point may be a reference point of the target or a point near its center. In this embodiment, for the human body, using the point determined from the two crotch key points as the 2D coordinates of the reference point suits the specific structure of the human body.
In some embodiments, obtaining the 3D pose of the target based on the first 3D feature includes: subtracting the depth value of the reference point from the depth value corresponding to the second 2D coordinate of each key point to obtain a fourth 2D coordinate and a depth value corresponding to the fourth 2D coordinate;
normalizing the fourth 2D coordinates and the depth values corresponding to the fourth 2D coordinates to obtain the normalized fourth 2D coordinates and the normalized depth values corresponding to the fourth 2D coordinates; and
processing the normalized fourth 2D coordinates and the normalized depth values corresponding to the fourth 2D coordinates with a deep learning model to obtain the 3D pose of the target.
For example, the normalized fourth 2D coordinates and their corresponding depth values are input to a neural network; the neural network may directly output the 3D pose, or the neural network may output a fourth 3D feature from which the 3D pose can be solved, and the 3D pose is obtained by a transformation based on the fourth 3D feature.
In this embodiment, the normalization eliminates the differences caused by acquisition with cameras having different parameters, thereby eliminating the loss of 3D pose extraction accuracy that deep learning models such as neural networks suffer from differing camera parameters, and thus further improving the 3D pose extraction accuracy for the target.
In some embodiments, normalizing the fourth 2D coordinates and the depth values corresponding to the fourth 2D coordinates to obtain the normalized fourth 2D coordinates and the normalized depth values corresponding to the fourth 2D coordinates includes:
obtaining the coordinate mean and variance of the key points based on the fourth 2D coordinates and the depth values corresponding to the fourth 2D coordinates; and
obtaining the normalized fourth 2D coordinates according to the coordinate mean and variance, the fourth 2D coordinates, and the depth values corresponding to the fourth 2D coordinates.
Specifically, with the mean denoted Mean and the variance denoted Std, the fourth 2D coordinates may be normalized using the following functional relationship:
X4' = (X4 - Mean) / Stdx;
Y4' = (Y4 - Mean) / Stdy.
X4 is the coordinate value of the fourth 2D coordinate in the first direction; Y4 is the coordinate value of the fourth 2D coordinate in the second direction; X4' is the coordinate value of the normalized fourth 2D coordinate in the first direction; Y4' is the coordinate value of the normalized fourth 2D coordinate in the second direction; Stdx is the variance of the coordinate values in the first direction, and Stdy is the variance of the coordinate values in the second direction.
In some embodiments, the method further includes:
based on the actual depth value, performing an iterative operation of projecting the 3D pose into a two-dimensional plane to obtain a fifth 2D coordinate having the minimum distance to the third 2D coordinate; and
obtaining a rotation parameter and a translation parameter of the target according to the fifth 2D coordinate and the first 3D feature.
In this embodiment, projecting the 3D pose into a two-dimensional plane may include: projecting the first 3D feature representing the 3D pose into the 2D imaging plane, thereby obtaining a 2D projection image in the 2D imaging plane.
There are multiple ways to project; two optional ways are provided below:
Option 1: obtaining the 2D coordinates projected into the 2D imaging plane according to the 3D pose and a projection matrix; for example, left-multiplying the 3D pose by the projection matrix to obtain the coordinates projected into the 2D imaging plane. The projection matrix here may be determined according to camera parameters and/or empirical projection values.
Option 2: using a projection model that can project a 3D pose into the 2D imaging plane, for example, a projection neural network, which takes the 3D pose as input and outputs the 2D coordinates projected into the 2D imaging plane.
Once the output 2D coordinates projected into the 2D imaging plane (i.e., the fifth 2D coordinates) are obtained, the distance to the third 2D coordinates can be calculated, the group with the smallest distance is selected, and the rotation parameter and the translation parameter are calculated. In the projection process, the depth value is in effect removed and only the 2D coordinates in the 2D imaging plane are retained. In this embodiment, however, the 3D pose is in fact calculated based on the reference depth value; thus the trigonometric relationship shown in FIG. 2 can be used to translate the 3D pose back to the position of the actual depth value. Considering the processing errors of the deep learning module and of the camera, the projection of the 3D pose into the 2D imaging plane can be performed based on the actual depth value and values approximating it. During projection, the distance between the 2D coordinates projected into the two-dimensional plane and the actual third 2D coordinates must be minimized. For example, the minimization of the distance between the fifth 2D coordinates and the third 2D coordinates can be expressed by the following function: min{(X5 - X3)^2 + (Y5 - Y3)^2};
(X5, Y5) are the fifth 2D coordinates; (X3, Y3) are the third 2D coordinates.
Next, the rotation parameter R and the translation parameter T can be solved using the following functional relationship:
Figure PCTCN2019083959-appb-000001
S3 denotes the first 3D features of the key points; S2 denotes the 2D coordinates of the key points.
The actual depth value gives the depth range for the iterative calculation. For example, the maximum of the depth range is obtained by adding an offset to the actual depth value, and the minimum of the depth range is obtained by subtracting an offset from the actual depth value. When projecting the 3D pose into the 2D imaging plane, a depth value can be selected within this depth range. The depth range is chosen based on the actual depth value because, on the one hand, the bias of the depth image collected by the depth camera is taken into account and, on the other hand, the error of the network is taken into account; based on these two considerations, the depth range provides error tolerance, so that the 3D pose is projected into the 2D imaging plane to obtain the optimal fifth 2D coordinates, from which the rotation parameter and/or translation parameter are estimated.
The translation parameter may characterize the translation state of the target, and the rotation parameter characterizes the rotation state of the target. The translation parameter may include translational displacements in various directions; the rotation parameter may include rotational displacements in various directions.
In the embodiments of the present application, since the actual depth value is known in advance during the iteration, it can be used as a reference depth value, and the projection of the 3D pose into the two-dimensional plane can be performed within a depth range containing the actual depth value; compared with iterative calculation without a depth range provided by an actual depth value, this greatly reduces the number of iterations, saves computation, and increases the calculation speed.
As shown in FIG. 7, this embodiment provides a data processing apparatus, including:
a first conversion module 110 configured to convert a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point; and
a first obtaining module 120 configured to obtain a 3D pose of the target based on the first 3D feature.
In some embodiments, the first conversion module 110 and the first obtaining module 120 may be program modules; after being executed by a processor, the program modules can implement the conversion of the first 2D coordinates into the second 2D coordinates and the obtaining of the 3D pose.
In some other embodiments, the first conversion module 110 and the first obtaining module 120 may also be combinations of hardware modules and program modules, for example, a complex programmable array or a field programmable array.
In still other embodiments, the first conversion module 110 and the first obtaining module 120 may correspond to hardware modules; for example, the first conversion module 110 and the first obtaining module 120 may be application-specific integrated circuits.
In some embodiments, the first conversion module 110 is configured to obtain the second 2D coordinate according to the ratio of the actual depth value to the reference depth value and the first 2D coordinate.
In some embodiments, the first conversion module 110 is configured to determine the second 2D coordinate using the following functional relationship:
X2 = (X1 * d) / D,
Y2 = (Y1 * d) / D,
where X2 is the coordinate value of the second 2D coordinate in the first direction; X1 is the coordinate value of the first 2D coordinate in the first direction;
Y2 is the coordinate value of the second 2D coordinate in the second direction, and Y1 is the coordinate value of the first 2D coordinate in the second direction; the second direction is perpendicular to the first direction;
d is the actual depth value; D is the reference depth value.
In some embodiments, the apparatus further includes:
a second conversion module configured to obtain the first 2D coordinate according to the second 3D feature of the key point and the optical center position corresponding to the image, wherein the second 3D feature includes: a third 2D coordinate obtained based on the 2D image and an actual depth value obtained based on the depth image.
In some embodiments, the second conversion module is configured to move the second 3D features of the key points so that the 3D feature of the reference point among the key points is translated to the optical center position and the third 3D feature of each key point is obtained, and to project the third 3D features onto the 2D imaging plane to obtain the first 2D coordinates.
In some embodiments, if the target is a human skeleton, the first 3D feature of the reference point is determined based on the second 3D features of two crotch key points among the key points.
In some embodiments, the first obtaining module is configured to subtract the depth value of the reference point from the depth value corresponding to the second 2D coordinate of each key point to obtain a fourth 2D coordinate and a depth value corresponding to the fourth 2D coordinate; to normalize the fourth 2D coordinates and the depth values corresponding to the fourth 2D coordinates to obtain the normalized fourth 2D coordinates and the normalized depth values corresponding to the fourth 2D coordinates; and to process the normalized fourth 2D coordinates and the normalized depth values corresponding to the fourth 2D coordinates with a deep learning model to obtain the 3D pose of the target.
In some embodiments, the first obtaining module 120 is configured to obtain the coordinate mean and variance of the key points based on the fourth 2D coordinates and the depth values corresponding to the fourth 2D coordinates, and to obtain the normalized fourth 2D coordinates according to the coordinate mean and variance, the fourth 2D coordinates, and the depth values corresponding to the fourth 2D coordinates.
In some embodiments, the apparatus further includes:
an iteration module configured to perform, based on the actual depth value, an iterative operation of projecting the 3D pose into a two-dimensional plane to obtain a fifth 2D coordinate having the minimum distance to the third 2D coordinate; and
a second obtaining module configured to obtain a rotation parameter and a translation parameter of the target according to the fifth 2D coordinate and the first 3D feature.
As shown in FIG. 8, an embodiment of the present application provides an electronic device, including:
a memory configured to store information; and
a processor connected to the memory and configured to implement the data processing method provided by one or more of the foregoing technical solutions, for example, one or more of the methods shown in FIG. 1A, FIG. 1B, and FIG. 5, by executing computer-executable instructions stored on the memory.
The memory may be any of various types of memory, such as a random access memory, a read-only memory, or a flash memory. The memory may be used for information storage, for example, for storing computer-executable instructions. The computer-executable instructions may be various program instructions, for example, object program instructions and/or source program instructions.
The processor may be any of various types of processors, for example, a central processing unit, a microprocessor, a digital signal processor, a programmable array, an application-specific integrated circuit, or an image processor.
The processor may be connected to the memory through a bus. The bus may be an integrated circuit bus or the like.
In some embodiments, the terminal device may further include a communication interface, which may include a network interface, for example, a local area network interface, a transceiver antenna, or the like. The communication interface is likewise connected to the processor and can be used for sending and receiving information.
In some embodiments, the terminal device further includes a human-machine interaction interface; for example, the human-machine interaction interface may include various input and output devices such as a keyboard and a touch screen.
An embodiment of the present application provides a computer storage medium storing computer-executable code; after the computer-executable code is executed, the data processing method provided by one or more of the foregoing technical solutions can be implemented, for example, one or more of the methods shown in FIG. 1A, FIG. 1B, and FIG. 5.
The storage medium includes various media that can store program code, such as a mobile storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. The storage medium may be a non-transitory storage medium.
An embodiment of the present application provides a computer program product including computer-executable instructions; after the computer-executable instructions are executed, the data processing method provided by any of the foregoing implementations can be implemented, for example, one or more of the methods shown in FIG. 1A, FIG. 1B, and FIG. 5.
Several specific examples are provided below with reference to the foregoing embodiments.
Example 1:
This example uses a deep neural network to predict the two-dimensional and three-dimensional key points of the human body, and then uses a three-dimensional vision algorithm to compute the three-dimensional pose of the human body; specifically, it may include:
predicting the 2D positions of 14 human-body key points in the 2D image using a 2D human key point estimation tool;
extracting, from the depth image corresponding to the 2D image, the actual depth values of the 14 human key points;
converting the 2D coordinates corresponding to the actual depth value into 2D coordinates corresponding to the reference depth value by means of trigonometric functions or the like;
performing an intrinsic-parameter normalization on the converted 2D coordinates of all key points using the camera intrinsics;
computing the mean and standard deviation of the normalized key points for a further coordinate normalization, to obtain the normalized 2D coordinates and the reference depth value;
inputting the normalized 2D coordinates and the reference depth value into a deep neural network, which maps the 2D key points to the first 3D features of the 3D key points. Based on the first 3D features, a three-dimensional vision algorithm or the like can be used to obtain the 3D pose, for example, optimization based on perspective-n-point (PnP) positioning to obtain the 3D pose from the first 3D features.
FIG. 9 shows a neural network that can obtain the 3D pose provided by this example, including:
a fully connected layer (Fc), batch processing + ReLU layers, and a Dropout layer;
where the fully connected layer receives the first 3D features of the 14 key points, and the output is the 3D pose.
This neural network can be used to extract the 3D pose.
Example 2:
This example provides a data processing method, including:
obtaining the 2D key points (corresponding to 2D coordinates) of one or more human bodies in an input 2D image using a deep neural network;
normalizing the two-dimensional human key points with the camera intrinsics and inputting them into a second deep neural network to obtain three-dimensional key points relative to a certain key point of the human body (generally at the pelvis); and
finally, aligning the point order of the obtained two-dimensional key points and three-dimensional key points, and using the PnP algorithm to solve the three-dimensional spatial pose of the human body.
Example 3: for each frame of the 3D image, a two-dimensional human key point detection tool is used to obtain the coordinates of 14 key points in the image;
taking the two-dimensional key point coordinates obtained in the first step as input, a 3D key point extraction network obtains the corresponding three-dimensional human skeleton (17 key points, with the position of the key point at the pelvis fixed at 0).
The two obtained human key point models are aligned so that each key point is physically consistent.
With the intrinsic parameters K of the current device known, the extrinsic parameters R and T of the target human body in the camera coordinate system are computed, where
Figure PCTCN2019083959-appb-000002
fx, fy, cx, cy can be calibrated for the current device by Zhang Zhengyou's calibration method. Denoting the aligned two-dimensional human skeleton by S2 and the three-dimensional human skeleton by S3, it suffices to optimize the following formula:
Figure PCTCN2019083959-appb-000003
Since a continuous video is used as input, the R and T of the previous frame can be used as the initial values for the next frame. In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other division manners in actual implementation, for example, multiple units or components may be combined, or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing module, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art may understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the aforementioned program may be stored in a computer-readable storage medium, and when executed, the program performs the steps including the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto; any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. A data processing method, comprising:
    converting a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point; and
    obtaining a 3D pose of the target based on the first 3D feature.
  2. The method according to claim 1, wherein
    the converting a first 2D coordinate of the key point into a second 2D coordinate according to a reference depth value and an actual depth value of the key point of the target in the image comprises:
    obtaining the second 2D coordinate according to a ratio of the actual depth value to the reference depth value and the first 2D coordinate.
  3. The method according to claim 2, wherein
    the obtaining the second 2D coordinate according to the ratio of the actual depth value to the reference depth value and the first 2D coordinate comprises:
    determining the second 2D coordinate using the following functional relationship:
    X2=(X1*d)/D,
    Y2=(Y1*d)/D,
    wherein X2 is a coordinate value of the second 2D coordinate in a first direction, and X1 is a coordinate value of the first 2D coordinate in the first direction;
    Y2 is a coordinate value of the second 2D coordinate in a second direction, and Y1 is a coordinate value of the first 2D coordinate in the second direction, the second direction being perpendicular to the first direction; and
    d is the actual depth value and D is the reference depth value.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    obtaining the first 2D coordinate according to a second 3D feature of the key point and an optical center position corresponding to the image, wherein the second 3D feature comprises: a third 2D coordinate obtained based on a 2D image and an actual depth value obtained based on a depth image.
  5. The method according to claim 4, wherein
    the obtaining the first 2D coordinate according to the second 3D feature of the key point and the optical center position corresponding to the image comprises:
    moving the second 3D features of the key points so that
    the 3D feature of a reference point among the key points is translated to the optical center position, and obtaining a third 3D feature of each key point; and
    projecting the third 3D features onto a 2D imaging plane to obtain the first 2D coordinate.
  6. The method according to claim 5, wherein
    in a case where the target is a human skeleton, the first 3D feature of the reference point is determined based on the second 3D features of two crotch key points among the key points.
  7. The method according to claim 5, wherein
    the obtaining the 3D pose of the target based on the first 3D feature comprises: subtracting the depth value of the reference point from the depth value corresponding to the second 2D coordinate of the key point to obtain a fourth 2D coordinate and a depth value corresponding to the fourth 2D coordinate;
    normalizing the fourth 2D coordinate and the depth value corresponding to the fourth 2D coordinate to obtain a normalized fourth 2D coordinate and a normalized depth value corresponding to the fourth 2D coordinate; and
    processing the normalized fourth 2D coordinate and the normalized depth value corresponding to the fourth 2D coordinate with a deep learning model to obtain the 3D pose of the target.
  8. The method according to claim 7, wherein
    the normalizing the fourth 2D coordinate and the depth value corresponding to the fourth 2D coordinate to obtain the normalized fourth 2D coordinate and the normalized depth value corresponding to the fourth 2D coordinate comprises:
    obtaining a coordinate mean and variance of the key points based on the fourth 2D coordinate and the depth value corresponding to the fourth 2D coordinate; and
    obtaining the normalized fourth 2D coordinate according to the coordinate mean and variance, the fourth 2D coordinate, and the depth value corresponding to the fourth 2D coordinate.
  9. The method according to any one of claims 4 to 7, wherein
    the method further comprises:
    based on the actual depth value, performing an iterative operation of projecting the 3D pose into a two-dimensional plane to obtain a fifth 2D coordinate having a minimum distance to the third 2D coordinate; and
    obtaining a rotation parameter and a translation parameter of the target according to the fifth 2D coordinate and the first 3D feature.
  10. A data processing apparatus, comprising:
    a first conversion module configured to convert a first 2D coordinate of a key point of a target in an image into a second 2D coordinate according to a reference depth value and an actual depth value of the key point, wherein the second 2D coordinate and the reference depth value constitute a first 3D feature of the key point; and
    a first obtaining module configured to obtain a 3D pose of the target based on the first 3D feature.
  11. The apparatus according to claim 10, wherein
    the first conversion module is configured to obtain the second 2D coordinate according to a ratio of the actual depth value to the reference depth value and the first 2D coordinate.
  12. The apparatus according to claim 11, wherein
    the first conversion module is configured to determine the second 2D coordinate using the following functional relationship:
    X2=(X1*d)/D,
    Y2=(Y1*d)/D,
    wherein X2 is a coordinate value of the second 2D coordinate in a first direction, and X1 is a coordinate value of the first 2D coordinate in the first direction;
    Y2 is a coordinate value of the second 2D coordinate in a second direction, and Y1 is a coordinate value of the first 2D coordinate in the second direction, the second direction being perpendicular to the first direction; and
    d is the actual depth value and D is the reference depth value.
  13. The apparatus according to any one of claims 10 to 12, wherein the apparatus further comprises:
    a second conversion module configured to obtain the first 2D coordinate according to a second 3D feature of the key point and an optical center position corresponding to the image, wherein the second 3D feature comprises: a third 2D coordinate obtained based on a 2D image and an actual depth value obtained based on a depth image.
  14. The apparatus according to claim 13, wherein
    the second conversion module is configured to move the second 3D features of the key points so that the 3D feature of a reference point among the key points is translated to the optical center position and a third 3D feature of each key point is obtained, and to project the third 3D features onto a 2D imaging plane to obtain the first 2D coordinate.
  15. The apparatus according to claim 14, wherein
    in a case where the target is a human skeleton, the first 3D feature of the reference point is determined based on the second 3D features of two crotch key points among the key points.
  16. The apparatus according to claim 14, wherein
    the first obtaining module is configured to subtract the depth value of the reference point from the depth value corresponding to the second 2D coordinate of the key point to obtain a fourth 2D coordinate and a depth value corresponding to the fourth 2D coordinate; to normalize the fourth 2D coordinate and the depth value corresponding to the fourth 2D coordinate to obtain a normalized fourth 2D coordinate and a normalized depth value corresponding to the fourth 2D coordinate; and to process the normalized fourth 2D coordinate and the normalized depth value corresponding to the fourth 2D coordinate with a deep learning model to obtain the 3D pose of the target.
  17. The apparatus according to claim 15, wherein
    the first obtaining module is configured to obtain a coordinate mean and variance of the key points based on the fourth 2D coordinate and the depth value corresponding to the fourth 2D coordinate, and to obtain a normalized fourth 2D coordinate according to the coordinate mean and variance, the fourth 2D coordinate, and the depth value corresponding to the fourth 2D coordinate.
  18. The apparatus according to any one of claims 10 to 17, wherein
    the apparatus further comprises:
    an iteration module configured to perform, based on the actual depth value, an iterative operation of projecting the 3D pose into a two-dimensional plane to obtain a fifth 2D coordinate having a minimum distance to the third 2D coordinate; and
    a second obtaining module configured to obtain a rotation parameter and a translation parameter of the target according to the fifth 2D coordinate and the first 3D feature.
  19. A computer storage medium storing computer-executable code, wherein after the computer-executable code is executed, the method provided by any one of claims 1 to 9 can be implemented.
  20. An electronic device, comprising:
    a memory configured to store information; and
    a processor connected to the memory and configured to implement the method provided by any one of claims 1 to 9 by executing computer-executable instructions stored on the memory.
PCT/CN2019/083959 2018-09-18 2019-04-23 数据处理方法及装置、电子设备及存储介质 WO2020057121A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/049,687 US11238273B2 (en) 2018-09-18 2019-04-23 Data processing method and apparatus, electronic device and storage medium
JP2020558429A JP6985532B2 (ja) 2018-09-18 2019-04-23 データ処理方法及び装置、電子機器並びに記憶媒体
SG11202010510XA SG11202010510XA (en) 2018-09-18 2019-04-23 Data processing method and apparatus, electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811089872.4 2018-09-18
CN201811089872.4A CN110909580B (zh) 2018-09-18 2018-09-18 数据处理方法及装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020057121A1 true WO2020057121A1 (zh) 2020-03-26

Family

ID=69812918

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/083959 WO2020057121A1 (zh) 2018-09-18 2019-04-23 数据处理方法及装置、电子设备及存储介质

Country Status (5)

Country Link
US (1) US11238273B2 (zh)
JP (1) JP6985532B2 (zh)
CN (1) CN110909580B (zh)
SG (1) SG11202010510XA (zh)
WO (1) WO2020057121A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582204A (zh) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 姿态检测方法、装置、计算机设备及存储介质
CN113483661A (zh) * 2021-07-06 2021-10-08 广东南方数码科技股份有限公司 一种点云数据获取方法、装置、设备及存储介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353930B (zh) * 2018-12-21 2022-05-24 北京市商汤科技开发有限公司 数据处理方法及装置、电子设备及存储介质
CN109840500B (zh) * 2019-01-31 2021-07-02 深圳市商汤科技有限公司 一种三维人体姿态信息检测方法及装置
US20210312236A1 (en) * 2020-03-30 2021-10-07 Cherry Labs, Inc. System and method for efficient machine learning model training
CN113808227B (zh) * 2020-06-12 2023-08-25 杭州普健医疗科技有限公司 一种医学影像对齐方法、介质及电子设备
US11488325B2 (en) * 2020-06-17 2022-11-01 Microsoft Technology Licensing, Llc Auto calibrating a single camera from detectable objects
CN111985384A (zh) * 2020-08-14 2020-11-24 深圳地平线机器人科技有限公司 获取脸部关键点的3d坐标及3d脸部模型的方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120139907A1 (en) * 2010-12-06 2012-06-07 Samsung Electronics Co., Ltd. 3 dimensional (3d) display system of responding to user motion and user interface for the 3d display system
CN102800126A (zh) * 2012-07-04 2012-11-28 浙江大学 基于多模态融合的实时人体三维姿态恢复的方法
CN103037226A (zh) * 2011-09-30 2013-04-10 联咏科技股份有限公司 深度融合方法及其装置
CN104243948A (zh) * 2013-12-20 2014-12-24 深圳深讯和科技有限公司 2d图像转3d图像的深度调整方法及装置

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2751777B1 (en) * 2011-08-31 2019-08-07 Apple Inc. Method for estimating a camera motion and for determining a three-dimensional model of a real environment
JP6433149B2 (ja) 2013-07-30 2018-12-05 キヤノン株式会社 姿勢推定装置、姿勢推定方法およびプログラム
US9275078B2 (en) * 2013-09-05 2016-03-01 Ebay Inc. Estimating depth from a single image
CN104881881B (zh) * 2014-02-27 2018-04-10 株式会社理光 运动对象表示方法及其装置
JP5928748B2 (ja) * 2014-07-31 2016-06-01 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation 同一種類の複数の認識対象物体が検索対象画像中に存在する場合に、それぞれの認識対象物体の位置および向きを精度良く求める手法
JP2017097578A (ja) 2015-11-24 2017-06-01 キヤノン株式会社 情報処理装置及び方法
WO2018087933A1 (ja) 2016-11-14 2018-05-17 富士通株式会社 情報処理装置、情報処理方法、およびプログラム
US10277889B2 (en) * 2016-12-27 2019-04-30 Qualcomm Incorporated Method and system for depth estimation based upon object magnification
CN108230383B (zh) * 2017-03-29 2021-03-23 北京市商汤科技开发有限公司 手部三维数据确定方法、装置及电子设备
US10929654B2 (en) * 2018-03-12 2021-02-23 Nvidia Corporation Three-dimensional (3D) pose estimation from a monocular camera

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120139907A1 (en) * 2010-12-06 2012-06-07 Samsung Electronics Co., Ltd. 3 dimensional (3d) display system of responding to user motion and user interface for the 3d display system
CN103037226A (zh) * 2011-09-30 2013-04-10 联咏科技股份有限公司 深度融合方法及其装置
CN102800126A (zh) * 2012-07-04 2012-11-28 浙江大学 基于多模态融合的实时人体三维姿态恢复的方法
CN104243948A (zh) * 2013-12-20 2014-12-24 深圳深讯和科技有限公司 2d图像转3d图像的深度调整方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582204A (zh) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 姿态检测方法、装置、计算机设备及存储介质
CN113483661A (zh) * 2021-07-06 2021-10-08 广东南方数码科技股份有限公司 一种点云数据获取方法、装置、设备及存储介质

Also Published As

Publication number Publication date
JP2021513175A (ja) 2021-05-20
CN110909580A (zh) 2020-03-24
CN110909580B (zh) 2022-06-10
US11238273B2 (en) 2022-02-01
US20210240971A1 (en) 2021-08-05
JP6985532B2 (ja) 2021-12-22
SG11202010510XA (en) 2020-11-27

Similar Documents

Publication Publication Date Title
WO2020057121A1 (zh) 数据处理方法及装置、电子设备及存储介质
EP3786890B1 (en) Method and apparatus for determining pose of image capture device, and storage medium therefor
WO2019161813A1 (zh) 动态场景的三维重建方法以及装置和系统、服务器、介质
CN110582798B (zh) 用于虚拟增强视觉同时定位和地图构建的系统和方法
US10507002B2 (en) X-ray system and method for standing subject
JP5158223B2 (ja) 三次元モデリング装置、三次元モデリング方法、ならびに、プログラム
JP5018980B2 (ja) 撮像装置、長さ測定方法、及びプログラム
WO2021043213A1 (zh) 标定方法、装置、航拍设备和存储介质
CN113256718B (zh) 定位方法和装置、设备及存储介质
CN113205560B (zh) 多深度相机的标定方法、装置、设备及存储介质
JP5263437B2 (ja) 三次元モデリング装置、三次元モデリング方法、ならびに、プログラム
WO2022012019A1 (zh) 身高测量方法、身高测量装置和终端
CN113361365A (zh) 定位方法和装置、设备及存储介质
WO2020110359A1 (en) System and method for estimating pose of robot, robot, and storage medium
JP7121936B2 (ja) カメラ校正情報取得装置、画像処理装置、カメラ校正情報取得方法およびプログラム
JP2006195790A (ja) レンズ歪推定装置、レンズ歪推定方法、及びレンズ歪推定プログラム
JP2000348181A (ja) 移動物体追跡装置
CN114608521A (zh) 单目测距方法及装置、电子设备和存储介质
JP2022092528A (ja) 三次元人物姿勢推定装置、方法およびプログラム
CN117275089A (zh) 一种单目摄像头的人物识别方法、装置、设备及存储介质
CN110909581B (zh) 数据处理方法及装置、电子设备及存储介质
CN116934829B (zh) 无人机目标深度估计的方法、装置、存储介质及电子设备
CN113822174B (zh) 视线估计的方法、电子设备及存储介质
US20230290101A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
JP7096910B2 (ja) データ処理方法及び装置、電子機器並びに記憶媒体

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19863323

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020558429

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19863323

Country of ref document: EP

Kind code of ref document: A1