WO2022012019A1 - Height measurement method, height measurement device and terminal


Info

Publication number
WO2022012019A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
key points
bone
information
image
Application number
PCT/CN2021/073455
Other languages
English (en)
French (fr)
Inventors
方伟 (Fang Wei)
苏琪 (Su Qi)
吴亚飞 (Wu Yafei)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to KR1020237004401A (published as KR20230035382A)
Priority to JP2023501759A (published as JP2023534664A)
Publication of WO2022012019A1
Priority to US18/154,508 (published as US20230152084A1)

Classifications

    • G: PHYSICS
      • G01: MEASURING; TESTING
        • G01B: MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
          • G01B 11/00: Measuring arrangements characterised by the use of optical techniques
            • G01B 11/02: Measuring arrangements characterised by the use of optical techniques for measuring length, width or thickness
              • G01B 11/022: by means of tv-camera scanning
              • G01B 11/06: for measuring thickness, e.g. of sheet material
                • G01B 11/0608: Height gauges
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00: Image analysis
            • G06T 7/60: Analysis of geometric attributes
            • G06T 7/70: Determining position or orientation of objects or cameras
              • G06T 7/73: using feature-based methods
                • G06T 7/75: involving models
          • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
          • G06T 2207/00: Indexing scheme for image analysis or image enhancement
            • G06T 2207/10: Image acquisition modality
              • G06T 2207/10004: Still image; Photographic image
              • G06T 2207/10012: Stereo images
            • G06T 2207/20: Special algorithmic details
              • G06T 2207/20036: Morphological image processing
              • G06T 2207/20044: Skeletonization; Medial axis transform
              • G06T 2207/20084: Artificial neural networks [ANN]
            • G06T 2207/30: Subject of image; Context of image processing
              • G06T 2207/30196: Human being; Person
                • G06T 2207/30201: Face
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00: Arrangements for image or video recognition or understanding
            • G06V 10/70: using pattern recognition or machine learning
              • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
                • G06V 10/761: Proximity, similarity or dissimilarity measures
          • G06V 20/00: Scenes; Scene-specific elements
            • G06V 20/60: Type of objects
              • G06V 20/64: Three-dimensional objects
                • G06V 20/647: Three-dimensional objects by matching two-dimensional images to three-dimensional objects
          • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
              • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/161: Detection; Localisation; Normalisation
    • A: HUMAN NECESSITIES
      • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
        • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
          • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
            • A61B 5/103: Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
              • A61B 5/107: Measuring physical dimensions, e.g. size of the entire body or parts thereof
                • A61B 5/1072: measuring distances on the body, e.g. measuring length, height or thickness
                • A61B 5/1079: using optical or photographic means

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a height measurement method, a height measurement device and a terminal.
  • Height is a fundamental item of human body data and has long attracted attention. How to obtain the height of a measured object quickly and accurately, and how to obtain the heights of multiple measured objects at the same time, have long been topics of active exploration in related fields.
  • The traditional height measurement method requires the measured object to be in a standing position and relies on a standard scale, or on infrared or ultrasonic reflection, to obtain height data. Objects can only be measured one at a time, and the posture requirements are strict; if the standing posture is not standard, the height data will be inaccurate.
  • In another prior method, the height of the measured object is obtained by scaling a reference object in equal proportions.
  • For example, a virtual ruler is displayed on the terminal preview interface, and the distance between the terminal device and the measured object is obtained through the distance sensor of the terminal. According to the correspondence between the preset distance value and the scale, the height of the measured object is estimated.
  • Because the measurement result of this method is obtained by proportionally scaling the virtual ruler by the distance between the terminal and the measured object, the resolution of the terminal device and the accuracy of its distance sensor affect the accuracy of the measurement.
  • When the surrounding environment of the measured object is cluttered, the accuracy of the height measurement results is low.
  • The embodiments of the present application provide a height measurement method for measuring the height of a target object, which can improve the accuracy of the measurement result.
  • A first aspect of the embodiments of the present application provides a height measurement method, including: acquiring an image including a target object and the pose of the camera when capturing the image; acquiring the pixel coordinates of at least two bone key points of the target object in the image, where the bone key points include bone joint points and the pixel coordinates are used to represent the two-dimensional position information of the bone key points in the image; obtaining the three-dimensional coordinates of the at least two bone key points according to the pose of the camera and the pixel coordinates of the bone key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the bone key points in a coordinate system, and the three-dimensional coordinates of the at least two bone key points are used to represent the distance information between the at least two bone key points; and determining the height data of the target object according to the three-dimensional coordinates of the at least two bone key points.
  • In the height measurement method provided by the embodiments of the present application, a two-dimensional image obtained by shooting a target object is analyzed by means of a bone detection algorithm or the like, so that the pixel coordinates of the bone key points in the image can be obtained; based on the camera pose corresponding to the two-dimensional image, the pixel coordinates of the bone key points can then be converted into 3D coordinates in 3D space. The 3D coordinates correspond to the position information of the bone key points in the real world, so the height data of the target object can be obtained directly.
  • The height data of the target object can thus be obtained without contact, from the captured two-dimensional image of the target object.
  • In addition, no height reference object is needed in the shooting scene, which reduces errors and improves measurement accuracy.
  • In a possible implementation, determining the height data of the target object according to the three-dimensional coordinates of the at least two bone key points specifically includes: acquiring the pixel coordinates of at least three bone key points of the target object in the image; obtaining the three-dimensional coordinates of the at least three bone key points according to the pose of the camera and the pixel coordinates of the at least three bone key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the bone key points in the coordinate system, and the three-dimensional coordinates of the at least three bone key points are used to represent the distance information between the at least three bone key points; and determining at least two bone distances according to the three-dimensional coordinates, and determining the height data of the target object according to the at least two bone distances.
  • the coordinate system includes a world coordinate system.
  • In a possible implementation, the method further includes: acquiring the three-dimensional point cloud information of the target object. Acquiring the three-dimensional coordinates of the at least two bone key points of the target object according to the pose of the camera and the pixel coordinates of the bone key points then specifically includes: obtaining the 3D coordinates of the at least two bone key points through a collision detection algorithm according to the pixel coordinates of the bone key points, the pose of the camera, and the three-dimensional point cloud information.
  • This provides a specific scheme for converting the pixel coordinates of bone key points into three-dimensional coordinates, that is, a specific conversion from two-dimensional information to three-dimensional information. Obtaining the three-dimensional coordinates of bone key points based on three-dimensional point cloud information and a collision detection algorithm can improve the accuracy of the three-dimensional coordinates, compared with direct calculation through the pose of the camera.
  • In a possible implementation, acquiring the 3D point cloud information of the target object specifically includes: acquiring the 3D point cloud information of the target object according to at least two images of the target object taken from different directions.
  • This provides a specific method for obtaining 3D point cloud information: multiple images of the target object are acquired, and the 3D point cloud information of the target object is obtained based on feature point detection and matching across those images.
  • The 3D point cloud information is thus obtained from multiple images, which together contain more information than a single image, and this can improve the accuracy of the 3D coordinates.
  • In a possible implementation, acquiring the 3D point cloud information of the target object specifically includes: acquiring the 3D point cloud information of the target object collected by a depth sensor, where the depth sensor includes a binocular camera, a lidar, a millimeter-wave radar, or a time-of-flight (TOF) sensor.
  • This provides another specific method for obtaining 3D point cloud information.
  • Here the 3D point cloud information is collected by the depth sensor. Since the 3D point cloud obtained by a depth sensor can be a dense point cloud, it contains richer information; with a denser 3D point cloud, the obtained 3D coordinates of the bone key points are more accurate.
  • In a possible implementation, acquiring the image of the target object and the pose of the camera when shooting the image specifically includes: acquiring at least two images of the target object shot from different orientations, where the at least two images include the image; and acquiring the pose of the camera according to the at least two images of the target object shot from different orientations.
  • This provides a specific way to obtain the pose of the camera: acquire at least two images of the target object shot from different directions, and estimate the pose of the camera when shooting the images through feature point detection and feature point matching.
  • In another possible implementation, acquiring the image of the target object and the pose of the camera when shooting the image specifically includes: acquiring at least two images of the target object shot from different orientations, where the at least two images include the image; acquiring the inertial measurement unit data of the camera corresponding to the at least two images; and determining the pose of the camera according to the at least two images and the inertial measurement unit data.
  • the method provides a specific way to obtain the pose of the camera.
  • inertial measurement unit data can also be collected, which can improve the accuracy of calculating the pose of the camera.
  • In a possible implementation, determining the height data of the target object according to the three-dimensional coordinates of the at least two bone key points specifically includes: obtaining the bone length of the target object and the posture information of the target object according to the three-dimensional coordinates of the at least two bone key points; determining the preset weight parameter of the bone length according to the posture information; and determining the height data of the target object according to the bone length and the weight parameter.
  • In a possible implementation, the bone length includes the bone length of the head and the bone length of the legs; determining the height data of the target object according to the bone length and the weight parameter specifically includes: determining a head height compensation value according to the bone length of the head and preset head compensation parameters; determining a foot height compensation value according to the bone length of the legs and preset foot compensation parameters; and determining the height data of the target object according to the bone length information, the weight parameter, the head height compensation value and the foot height compensation value.
  • This implementation introduces head and foot compensation, which can further improve the accuracy of the height measurement.
  • In a possible implementation, the method further includes: performing face detection on the image, and acquiring head height data of the target object, where the head height data is used to correct the pixel coordinates of the bone key points corresponding to the head in the two-dimensional bone key point information.
  • The method can thus obtain head height data through face detection and correct the pixel coordinates of the bone key points, improving measurement accuracy.
  • In a possible implementation, the image includes at least two target objects; the method further includes: performing face detection on the image, and determining the pixel coordinates of the bone key points of each of the at least two target objects from the pixel coordinates of the bone key points based on an image segmentation algorithm.
  • This method can measure the heights of multiple target objects in one image, which simplifies operation and improves measurement efficiency compared with the one-by-one height detection of the prior art.
  • In a possible implementation, the method further includes: displaying information of the at least two target objects to the user, where the information of the at least two target objects includes at least one of the following: image information of the at least two target objects, image information marked with the pixel coordinates of the bone key points of the at least two target objects, and face detection result information of the at least two target objects; and acquiring a user instruction, where the user instruction is used to instruct that height measurement be performed on one or more of the at least two target objects.
  • The method can thus interact with the user and, according to the user's instruction, select the objects whose heights are to be measured from the target objects in the image, improving the user experience.
  • the skeleton key points are arranged along the direction of gravity, and the skeleton key points arranged according to the direction of gravity help to improve the accuracy of height measurement.
  • In a possible implementation, the target object is in a non-standing posture; the non-standing posture includes a sitting posture, a lying posture, and a kneeling posture. The implementation manners of the present application can measure the height of the target object even in these postures.
  • In a possible implementation, determining the height data of the target object according to the three-dimensional coordinates of the at least two bone key points specifically includes: obtaining the bone length information of the target object according to the three-dimensional coordinates of the at least two bone key points; deleting the bone length information that satisfies a first preset condition, where the first preset condition includes the bone length not belonging to a preset range, or the bone length difference of a symmetrical part being greater than or equal to a preset threshold range; and determining the height data of the target object according to the bone length information remaining after deletion.
  • the height measurement method provided by the method can also delete abnormal data to improve the accuracy of the measurement result.
  • the bones in the left and right symmetrical parts can be verified. For example, the difference in the length of the bones corresponding to the left leg and the right leg should be small. If the difference is greater than a threshold, abnormal data can be deleted.
  • In a possible implementation, the method further includes: marking the height data of the target object near the target object in the image and displaying it to the user; or broadcasting the height data of the target object by voice.
  • the height measurement method provided by this method can mark the height of the target object in the real-time displayed image, provide instant feedback, and improve user experience.
  • In a possible implementation, the method further includes: if the bone key points of the target object do not meet a second preset condition, displaying detection failure information to the user, prompting the user of the detection failure by voice, or prompting the user of the detection failure by vibration.
  • the height measurement method provided by this method can give feedback to the user when the detection fails, so as to improve the user experience.
  • A second aspect of the embodiments of the present application provides a height measurement device, including: an acquisition module, configured to acquire an image including a target object and the pose of the camera when the image was captured; the acquisition module is further configured to acquire the pixel coordinates of at least two bone key points of the target object in the image, where the bone key points include bone joint points and the pixel coordinates are used to represent the two-dimensional position information of the bone key points in the image; the acquisition module is further configured to acquire the three-dimensional coordinates of the at least two bone key points according to the pose of the camera and the pixel coordinates of the bone key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the bone key points in the coordinate system, and the three-dimensional coordinates of the at least two bone key points are used to represent the distance information between the at least two bone key points; and a determining module, configured to determine the height data of the target object according to the three-dimensional coordinates of the at least two bone key points.
  • In a possible implementation, the determining module is specifically configured to: acquire the pixel coordinates of at least three bone key points of the target object in the image; obtain the three-dimensional coordinates of the at least three bone key points according to the pose of the camera and the pixel coordinates of the at least three bone key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the bone key points in the coordinate system, and the three-dimensional coordinates of the at least three bone key points are used to represent the distance information between the at least three bone key points; and determine at least two bone distances according to the three-dimensional coordinates of the at least three bone key points, and determine the height data of the target object according to the at least two bone distances.
  • the coordinate system includes a world coordinate system.
  • In a possible implementation, the acquiring module is further configured to acquire the three-dimensional point cloud information of the target object; acquiring the three-dimensional coordinates of the at least two bone key points of the target object then specifically includes: obtaining the 3D coordinates of the at least two bone key points through a collision detection algorithm according to the pixel coordinates of the bone key points, the pose of the camera, and the 3D point cloud information.
  • the obtaining module is specifically configured to: obtain the three-dimensional point cloud information of the target object according to at least two images of the target object taken from different directions.
  • In a possible implementation, the acquisition module is specifically configured to: acquire the three-dimensional point cloud information of the target object collected by a depth sensor, where the depth sensor includes a binocular camera, a lidar, a millimeter-wave radar, or a time-of-flight sensor.
  • In a possible implementation, the acquiring module is specifically configured to: acquire at least two images of the target object shot from different orientations, where the at least two images include the image; and obtain the pose of the camera according to the at least two images of the target object shot from different orientations.
  • In another possible implementation, the acquiring module is specifically configured to: acquire at least two images of the target object shot from different orientations, where the at least two images include the image; acquire the inertial measurement unit data of the camera corresponding to the at least two images; and determine the pose of the camera according to the at least two images of the target object and the inertial measurement unit data.
  • In a possible implementation, the determining module is specifically configured to: acquire the bone length of the target object and the posture information of the target object according to the three-dimensional coordinates of the at least two bone key points; determine the preset weight parameter of the bone length according to the posture information; and determine the height data of the target object according to the bone length and the weight parameter.
  • In a possible implementation, the bone length includes the bone length of the head and the bone length of the legs; the determining module is specifically configured to: determine a head height compensation value according to the bone length of the head and a preset head compensation parameter; determine a foot height compensation value according to the bone length of the legs and preset foot compensation parameters; and determine the height data of the target object according to the bone length information, the weight parameter, the head height compensation value and the foot height compensation value.
  • In a possible implementation, the image includes at least two target objects; the device further includes a processing module, configured to perform face detection on the image, and to determine the pixel coordinates of the bone key points of each of the at least two target objects from the pixel coordinates of the bone key points based on an image segmentation algorithm.
  • In a possible implementation, the device further includes an output module, configured to display the information of the at least two target objects to the user, where the information of the at least two target objects includes at least one of the following: image information of the at least two target objects, image information marked with the pixel coordinates of the bone key points of the at least two target objects, and face detection result information of the at least two target objects; the acquiring module is further configured to acquire a user instruction, where the user instruction is used to instruct that height measurement be performed on one or more of the at least two target objects.
  • the skeleton key points are arranged along the direction of gravity, and the skeleton key points arranged according to the direction of gravity help to improve the accuracy of height measurement.
  • In a possible implementation, the target object is in a non-standing posture; the non-standing posture includes a sitting posture, a lying posture, and a kneeling posture. The implementation manners of the present application can measure the height of the target object even in these postures.
  • In a possible implementation, the determining module is specifically configured to: acquire the bone length information of the target object according to the three-dimensional coordinates of the at least two bone key points; delete the bone length information that satisfies the first preset condition, where the first preset condition includes the bone length not belonging to the preset range, or the bone length difference of the symmetrical part being greater than or equal to the preset threshold range; and determine the height data of the target object according to the bone length information remaining after deletion.
  • In a possible implementation, the device further includes an output module, configured to: mark the height data of the target object near the target object in the image and display it to the user; or broadcast the height data of the target object by voice.
  • In a possible implementation, the device further includes an output module, configured to: if the bone key points of the target object do not meet the second preset condition, display detection failure information to the user, prompt the user of the detection failure by voice, or prompt the user of the detection failure by vibration.
  • A third aspect of the embodiments of the present application provides a terminal, including: one or more processors and a memory, where computer-readable instructions are stored in the memory; the one or more processors read the computer-readable instructions in the memory to cause the terminal to implement the method according to any one of the above first aspect and its various possible implementation manners.
  • A fourth aspect of the embodiments of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method described in any one of the above first aspect and its various possible implementation manners.
  • A fifth aspect of the embodiments of the present application provides a computer-readable storage medium including instructions which, when executed on a computer, cause the computer to execute the method described in any one of the above first aspect and its various possible implementation manners.
  • a sixth aspect of the embodiments of the present application provides a chip, including a processor.
  • the processor is configured to read and execute the computer program stored in the memory to perform the method in any possible implementation manner of any of the above aspects.
  • Optionally, the chip includes a memory, and the processor is connected to the memory through a circuit or a wire.
  • the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive the data and/or information to be processed, and the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • the embodiments of the present application have the following advantages:
  • In the embodiments of the present application, the image of the target object and the pose of the camera when the image was captured can be acquired; bone detection is performed on the image to obtain the pixel coordinates of at least two bone key points of the target object in the image; the pixel coordinates of the bone key points are then converted into 3D space according to the camera pose to obtain the three-dimensional coordinates of the at least two bone key points; finally, the height data of the target object is determined according to the three-dimensional coordinates of the at least two bone key points.
  • The method converts the two-dimensional pixel coordinates of the bone key points into three-dimensional coordinates and obtains the height data of the target object directly, without conversion via a reference object. This avoids the measurement error caused by reference-object conversion when the scene around the target object is complex and can improve the accuracy of the height measurement results.
  • In addition, the height measurement method provided by the embodiments of the present application can be applied to a target object in various postures.
  • FIG. 1 is a schematic diagram of an embodiment of height measurement;
  • FIG. 2a is a schematic diagram of an embodiment of an application scenario of the height measurement method in the embodiments of the present application;
  • FIG. 2b is a schematic diagram of another embodiment of an application scenario of the height measurement method in the embodiments of the present application;
  • FIG. 3 is a schematic diagram of an embodiment of the height measurement method in the embodiments of the present application;
  • FIG. 4 is a schematic diagram of another embodiment of the height measurement method in the embodiments of the present application;
  • FIG. 5 is a schematic diagram of the conversion of two-dimensional bone key points into three-dimensional bone key points in an embodiment of the present application;
  • FIG. 6 is a schematic diagram of height measurement in a standing posture in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of height measurement in a sitting posture in an embodiment of the present application;
  • FIG. 8 is a schematic diagram of an application scenario of the height measurement method in an embodiment of the present application;
  • FIG. 9a is a schematic diagram of a point cloud of a SLAM system in an embodiment of the present application;
  • FIG. 9b is a schematic diagram of a two-dimensional bone key point detection result in an embodiment of the present application;
  • FIG. 9c is a schematic diagram of height detection when measuring from different angles in an embodiment of the present application;
  • FIG. 10 is a schematic diagram of an embodiment of the height measurement device in an embodiment of the present application;
  • FIG. 11a is a schematic diagram of another embodiment of the height measurement device in an embodiment of the present application;
  • FIG. 11b is a schematic diagram of another embodiment of the height measurement device in an embodiment of the present application;
  • FIG. 12 is a schematic diagram of an embodiment of a terminal in an embodiment of the present application.
  • The embodiments of the present application provide a height measurement method for measuring the height of a target object in various postures, which can improve the accuracy of height data.
  • Human bone key point detection (pose estimation) mainly detects key points of the human body, such as joints and facial features, and describes human skeleton information through these key points. Bone key points are also known as skeleton nodes or joint points.
  • Camera intrinsic parameters are parameters related to the characteristics of the camera itself, including the camera's focal length, pixel size, and so on; for electronic devices equipped with cameras, the intrinsic parameters are generally known from the device configuration.
  • Camera extrinsic parameters are parameters in the world coordinate system, including the camera's position and rotation.
  • Through the intrinsic and extrinsic parameters, the two-dimensional pixels in the image captured by the camera can be related to three-dimensional coordinates in the world coordinate system.
  • The pose of the camera includes 6 degrees of freedom (DoF): 3 position-related degrees of freedom determine the camera's position in three-dimensional space, and 3 rotation-related degrees of freedom determine the camera's rotation in three-dimensional space. The pose of the camera corresponds to the position and orientation of the camera in the world coordinate system at the moment the image was taken.
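  • As a rough illustration of this pixel-to-world relationship, the sketch below projects a 3D world point into pixel coordinates with the standard pinhole model; the intrinsic matrix and pose values are hypothetical, not taken from the patent:

```python
import numpy as np

# Hypothetical intrinsic matrix K: focal lengths fx, fy and principal
# point cx, cy (on a real device these come from calibration/config).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Hypothetical extrinsic pose: rotation R and translation t map
# world coordinates into the camera coordinate frame.
R = np.eye(3)                   # camera axes aligned with world axes
t = np.array([0.0, 0.0, 2.0])   # camera 2 m from the world origin

def world_to_pixel(p_world: np.ndarray) -> np.ndarray:
    """Project a 3D world point to (u, v) pixel coordinates."""
    p_cam = R @ p_world + t     # world frame -> camera frame (extrinsics)
    uvw = K @ p_cam             # camera frame -> homogeneous pixels (intrinsics)
    return uvw[:2] / uvw[2]     # perspective division

print(world_to_pixel(np.array([0.1, -0.2, 0.0])))  # [360. 160.]
```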
  • It may be that the target to be photographed does not move while the camera moves; it may also be that the target moves but the camera does not; or both the target and the camera move, and there is a relative pose change between the two.
  • the target object to be measured may be a vertebrate.
  • the embodiment of the present application takes a human as an example for introduction.
  • Scenario 1: In augmented reality (AR) or virtual reality (VR) applications, height measurement can be performed through a smart terminal device. For example, as shown in Figure 2a, a smartphone is used to scan the measured object (also called the target object, the measured target, etc., and referred to below as the target) and its surrounding environment; the camera pose is estimated through a simultaneous localization and mapping (SLAM) system, and the three-dimensional (3D) point cloud data of the environment around the measured object is obtained.
  • Bone detection is performed on the captured image to obtain two-dimensional bone key points, which are converted to obtain the three-dimensional coordinates of at least two bone key points in 3D space.
  • The three-dimensional information of the bone key points is integrated to output the height data of one or more measured objects, realizing height measurement for multiple users in multiple postures.
  • The height data can also be superimposed near the subject in the image and output through the smartphone's display.
  • the height measurement method is introduced by taking Scenario 1 as an example.
  • Scenario 2: As shown in Figure 2b, the image acquisition device is fixed, and the object to be measured walks through a predetermined position for image acquisition. Since the position of the camera in the world coordinate system is known, bone detection is performed on the captured image to obtain 2D bone key points; after these are converted into 3D bone key points, the height data of the measured object can be output through data integration and calculation.
  • FIG. 3 is a schematic diagram of an embodiment of the height measurement method in the embodiments of the present application.
  • the height measurement device may be a terminal, and the terminal may acquire an image of the target through an image acquisition device such as a camera.
  • The camera may be a common monocular camera or a binocular camera, which is not limited here.
  • the camera may be a component built into the terminal, or may be a device other than the terminal.
  • If the camera is a device outside the terminal, image data can be transmitted to the terminal. It should be noted that the intrinsic parameters of the camera are known.
  • the terminal also obtains the pose of the camera corresponding to the image.
  • For example, the terminal captures at least two images of the target from different orientations through a monocular camera and calculates the pose of the camera by detecting corresponding ("same-name") feature points in the images; alternatively, the camera pose is obtained from images of the target shot by a binocular camera.
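  • A minimal sketch of such two-view pose estimation using OpenCV follows; ORB features and the RANSAC settings are an illustrative choice, since the embodiment does not prescribe a specific feature detector, and for a monocular camera the recovered translation is only known up to scale:

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Estimate the relative rotation R and (up-to-scale) translation t
    of the camera between two views of the same scene."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Match "same-name" feature points between the two images.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Essential matrix from the matches, then decompose it into R, t.
    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t
```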
  • An inertial measurement unit (IMU) is a device that measures the three-axis attitude angle (or angular rate) and acceleration of an object.
  • If the terminal includes an IMU as well as the camera that collects the image of the target, the pose of the camera can be obtained from the IMU data recorded while the camera collects the image.
  • Optionally, the pose of the camera is calculated according to at least two images of the target and the IMU data collected when the images were captured. It can be understood that the camera pose obtained based on multiple images of the target together with the IMU data is more accurate.
  • the image of the target may include one or more objects to be measured.
  • 302. Obtain the pixel coordinates of the bone key points of the target in the image.
  • Bone key points include bone joint points. Bone key points can be identified in the image through various existing bone detection algorithms to obtain the pixel coordinates of at least two bone key points of the target in the image. The pixel coordinates represent the two-dimensional position information of a bone key point in the image; that is, pixel coordinates (u, v) indicate the position of the point in the image.
  • the skeleton detection algorithm can detect skeleton key points.
  • Optionally, skeleton key point detection algorithms such as the RMPE (regional multi-person pose estimation) algorithm or the DeepCut algorithm can be used.
  • the number of skeleton key points can be 14 or 21, for example.
  • the two-dimensional bone key point information of each object to be measured may be acquired separately.
  • the two-dimensional bone key point information includes the pixel coordinates of each bone key point in the image, and also includes the identification of each bone key point.
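  • As an illustration, the two-dimensional bone key point information for one measured object could be represented as below; the identifiers and coordinate values are made up for this sketch, since the patent only requires a pixel coordinate plus an identifier per bone key point:

```python
# Illustrative 2D bone key point information: pixel coordinates (u, v)
# keyed by the bone key point identifier (names and values are made up).
keypoints_2d = {
    "head":           (412.0, 133.5),
    "neck":           (410.8, 201.2),
    "right_shoulder": (380.2, 260.8),
    "right_elbow":    (371.4, 392.1),
    "right_hip":      (395.6, 470.0),
    "right_knee":     (399.0, 655.3),
    "right_ankle":    (401.7, 830.9),
}
```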
  • The target object can be in a standing posture. The standing posture means that in this posture, all the bone key points of the target object are arranged along the direction of gravity, i.e., in a vertical arrangement; bone key points arranged along the direction of gravity or longitudinally help improve the accuracy of height measurement.
  • The target object can also be in a non-standing posture, which means that in this posture the pixel coordinates of some bone key points of the target object are not arranged along the direction of gravity or longitudinally; that is, in a non-standing posture, the pixel coordinates of the bone key points are not all arranged on one vertical line.
  • Non-standing positions include sitting, lying, kneeling, or other positions. This scheme can also measure the height when the target object is in a non-standing posture.
  • According to the pose of the camera, the pixel coordinates of the two-dimensional bone key points in the image can be converted into three-dimensional coordinates in the world coordinate system, and the three-dimensional coordinates of at least two bone key points can be obtained. The three-dimensional coordinates represent the three-dimensional position information of the bone key points in the world coordinate system and are, for example, of the form (x, y, z).
  • the identification of each bone key point can also be obtained.
  • the three-dimensional coordinates of the at least two bone key points can be used to represent the distance information between the at least two bone key points.
  • For example, if the three-dimensional coordinates of the first bone key point are (x1, y1, z1) and the three-dimensional coordinates of the second bone key point are (x2, y2, z2), the distance between the first bone key point and the second bone key point in the world coordinate system can be calculated.
  • In other words, the length of a bone can be calculated based on the three-dimensional coordinates of two bone key points, that is, based on the distance between the two bone key points.
  • This distance information includes the lengths of the bones, which can be used to calculate the height of the target.
  • the bone length can be obtained according to the three-dimensional coordinates of at least two bone key points.
  • a bone length can be obtained by calculating the three-dimensional coordinates of two associated bone key points.
  • at least two bone distances are determined according to the three-dimensional coordinates of at least three bone key points, and the target's height data can be obtained according to the at least two bone distances by performing splicing calculation based on the bone length information of the target's bone structure.
  • the length of a bone can be calculated by the Euclidean distance in three-dimensional space between the 3D coordinates of the two joint points constituting the bone.
  • the identification of the bone corresponding to each bone length can also be obtained.
  • The identifier of the bone may be the type of the human body part corresponding to the bone (such as "arm", "leg", etc.), which is used to indicate different bones.
  • For example, the bone key point identified as the right shoulder and the bone key point identified as the right elbow can jointly form the bone identified as the right upper arm.
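  • A minimal sketch of this bone-length computation is given below; the key point names and world coordinates are illustrative:

```python
import numpy as np

# Illustrative 3D bone key points in the world coordinate system (metres).
keypoints_3d = {
    "right_shoulder": np.array([0.18, 1.42, 0.05]),
    "right_elbow":    np.array([0.22, 1.14, 0.06]),
    "right_knee":     np.array([0.10, 0.52, 0.02]),
    "right_ankle":    np.array([0.11, 0.09, 0.03]),
}

# Each bone is identified by a name and formed by two associated key points.
bones = {
    "right_upper_arm": ("right_shoulder", "right_elbow"),
    "right_calf":      ("right_knee", "right_ankle"),
}

# Bone length = Euclidean distance in 3D between the two joint points.
bone_lengths = {
    name: float(np.linalg.norm(keypoints_3d[a] - keypoints_3d[b]))
    for name, (a, b) in bones.items()
}
print(bone_lengths)
```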
  • the bone splicing algorithm is used to obtain height data according to the length of the bones. There are various specific calculation methods, which are not limited here.
  • In the embodiment of the present application, the pixel coordinates of the bone key points of the target in the image are detected, and then the pixel coordinates of the bone key points are converted into three-dimensional space according to the camera pose to obtain the three-dimensional coordinates of the bone key points; the height data of the target is then determined according to the three-dimensional coordinates of the at least two bone key points.
  • This method converts the two-dimensional pixel coordinates of the bone key points into three-dimensional coordinates and obtains the height data of the target directly, without conversion via a reference object; this avoids the measurement error caused by reference-object conversion when the scene around the target is complex and can improve height measurement accuracy.
  • FIG. 4 is a schematic diagram of another embodiment of the height measurement method in the embodiment of the present application.
  • The terminal acquires at least two images of the target, where the at least two images are captured by the camera in different poses.
  • Optionally, the IMU data at the time the at least two images were captured may be acquired simultaneously. Since the poses of the camera differ when the images are captured, the IMU data may indicate the moving direction and moving distance of the camera.
  • the image may include one or more objects whose height is to be measured.
  • the pose of the camera can be calculated by detecting pairs of feature points with the same name in the images.
  • the pose of the camera is obtained according to the IMU data in the process of capturing images by the camera.
  • Optionally, the pose of the camera is calculated according to the at least two images of the target and the IMU data collected when the images were captured; it can be understood that the camera pose obtained based on multiple images together with the IMU data is more accurate.
  • Optionally, the terminal may acquire the pose of the camera corresponding to any one of the at least two images of the target.
  • the terminal acquires three-dimensional point cloud information, and the three-dimensional point cloud information includes the three-dimensional coordinates of the visible part of the target in the coordinate system.
  • the coordinate system includes a world coordinate system.
  • the acquisition method of the three-dimensional point cloud information includes: lidar depth imaging method, computer stereo vision imaging, or structured light method, etc., which are not specifically limited here.
  • Optionally, the three-dimensional point cloud information is obtained by computer stereo vision imaging: feature extraction and matching are performed on the at least two images of the target obtained in step 401 to obtain feature point pairs; then, according to the camera pose determined in step 402 and the feature point pairs, the three-dimensional point cloud corresponding to the pixel points in the image of the target is obtained based on a triangulation algorithm.
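  • A minimal sketch of this triangulation step with OpenCV follows; the projection matrices and matched pixel coordinates are illustrative:

```python
import cv2
import numpy as np

# Projection matrices P = K [R | t] for the two camera poses (illustrative:
# the second pose is translated 0.2 m along x relative to the first).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

# Matched pixel coordinates in both images, as 2 x N arrays
# (row 0: u coordinates, row 1: v coordinates of N matched points).
pts1 = np.array([[310.0, 420.5],
                 [250.0, 300.0]])
pts2 = np.array([[260.3, 370.1],
                 [250.4, 300.2]])

# Triangulate to homogeneous 4 x N points, then dehomogenize to 3D.
pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
pts3d = (pts4d[:3] / pts4d[3]).T
print(pts3d)   # one 3D point per matched feature pair
```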
  • the 3D point cloud information is obtained by the lidar depth imaging method. If the terminal includes a depth sensor, such as a laser sensor, etc., the 3D point cloud information can be directly obtained. Based on the specific configuration of the depth sensor, the output 3D point cloud information can be a dense 3D point cloud or a semi-dense 3D point cloud.
  • Optionally, the 3D point cloud information can also be obtained by combining the above two methods: when the 3D point cloud is calculated from the images of the target and the pose of the camera, the depth of the point cloud is provided directly by the depth map obtained by the depth sensor, which can improve the accuracy of the 3D point cloud. In addition, the camera pose can also be optimized in this way, making the camera pose more accurate.
  • the image of the target may include one or more objects whose height is to be measured, and face detection is performed on the image of the target to determine the face information of the one or more objects to be measured.
  • Optionally, the terminal may also present the face detection result to the user, for example by presenting the face information of each target on the display screen, or by outputting the number of targets by voice.
  • Through face detection, the face information of one or more objects to be measured can be determined.
  • Optionally, if the image of the target includes multiple pieces of face information, the image of the target can be segmented to obtain the image parts of multiple objects to be measured, and these image parts can be used respectively for the height measurement of the multiple objects to be measured.
  • steps 404 to 405 are optional steps, which may or may not be performed, which are not limited here.
  • the two-dimensional bone key point information of the image of the target is obtained, where the two-dimensional bone key point information includes the pixel coordinates of the bone key point and the identification of the bone key point corresponding to the pixel coordinates.
  • the bone detection algorithm can detect the key points of human bones.
  • the number of human bone key points can be, for example, 14 or 21.
  • Table 1 shows the meaning and number of human skeleton key points.
  • the pixel coordinates of each human skeleton key point in the image can be output through the bone detection algorithm, and identified by a preset number.
  • the two-dimensional bone key point information of each object to be measured can be acquired through a bone key point detection algorithm.
  • Optionally, in step 404, bone detection is performed on the image of the target to obtain the human bone key points of all objects to be measured in the image, and then the two-dimensional bone key point information corresponding to the face detection result of each object to be measured is determined.
  • Optionally, bone detection is performed separately on the images determined by the image segmentation in step 405, to obtain the two-dimensional bone key point information corresponding to each object to be measured.
  • Optionally, the information of all objects to be measured is displayed to the user, where the information of an object to be measured includes at least one of the following: image information of the object to be measured, two-dimensional bone key point information of the object to be measured, and face detection result information of the object to be measured. A user instruction is then acquired, and according to the user instruction, one or more of the at least two objects to be measured are determined as targets for height measurement.
  • Optionally, the two-dimensional bone key point information of the target is verified according to the face detection result.
  • The bone key point corresponding to the head in the two-dimensional bone key point information is usually a single node, and the face information can indicate the region from the jaw to the hairline. Therefore, the pixel coordinates of the two-dimensional bone key point corresponding to the head can be verified through the face detection result, which can improve the accuracy of the height measurement results of this scheme.
  • Optionally, if the bone key points of the target do not meet the second preset condition, detection failure information is displayed to the user, or the user is notified of the detection failure by voice, or by vibration, etc., which is not specifically limited here.
  • Optionally, the second preset condition may be that no bone key points are detected; or that the number of bone key points is less than or equal to a preset threshold, such as 5, 6 or 7; or that the number of bones indicated by the detected bone key points is less than or equal to a preset threshold, such as 3 or 4; or that the types and number of bones indicated by the detected bone key points do not meet preset requirements.
  • For example, the bone types indicated by the bone key points do not include the bones corresponding to the upper arm, the forearm, the thigh and the calf; or the bone types indicated by the bone key points do not include the head bone; or the number of bones corresponding to the upper arm, forearm, thigh and calf indicated by the bone key points is less than or equal to 3, and so on.
  • the specific content of the second preset condition is not limited here.
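  • One way such a check might look in code is sketched below; the threshold values and the required bone types are illustrative, since the specific content of the second preset condition is left open:

```python
# Hypothetical check of the second preset condition (detection failure).
# Thresholds and required bone types are illustrative, not from the patent.
REQUIRED_BONES = {"upper_arm", "forearm", "thigh", "calf"}
MIN_KEYPOINTS = 6

def detection_failed(keypoints_2d: dict, detected_bones: set) -> bool:
    """Return True if the second preset condition is met (detection fails)."""
    if not keypoints_2d:
        return True                          # no bone key points detected
    if len(keypoints_2d) <= MIN_KEYPOINTS:
        return True                          # too few bone key points
    if not REQUIRED_BONES <= detected_bones:
        return True                          # required bone types missing
    return False

# If detection_failed(...) is True, the terminal would display the failure,
# play a voice prompt, or vibrate, as described above.
```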
  • The order of step 404 and step 406 is not limited.
  • Likewise, the order of steps 402 to 403 and steps 404 to 406 is not limited: they can be executed simultaneously; or steps 402 to 403 can be executed first and then steps 404 to 406; or steps 404 to 406 can be executed first and then steps 402 to 403.
  • the transformed 3D bone key point coordinates corresponding to the 2D bone key points are obtained according to the HitTest algorithm.
  • the three-dimensional bone key point information includes three-dimensional coordinates of the bone key point and an identifier of the bone key point corresponding to the three-dimensional coordinate.
  • FIG. 5 is a schematic diagram of converting two-dimensional bone key points into three-dimensional bone key points in the embodiment of the present application.
  • Specifically, a virtual ray is emitted from the camera in the direction of each detected 2D bone key point, and collision detection (HitTest) between the ray and the 3D point cloud is performed to obtain the transformed 3D bone key point coordinates corresponding to that 2D bone key point. The specific method of collision detection is in the prior art and will not be repeated here.
  • The final output is the 3D bone key point information corresponding to the 2D bone key points.
  • In this way, the pixel coordinates of the two-dimensional bone key points in the image are converted into three-dimensional coordinates in the world coordinate system, where the three-dimensional bone key point information includes the three-dimensional coordinates of the bone key points and their identifiers.
  • The three-dimensional coordinates of the bone key points obtained by the collision detection algorithm are more accurate than three-dimensional coordinates obtained by directly converting the two-dimensional coordinates of the bone key points through the pose of the camera. It can be understood that the denser the 3D point cloud, the more accurate the acquired 3D coordinates of the bone key points.
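  • A simplified sketch of such a ray-versus-point-cloud hit test follows; real HitTest implementations in AR frameworks are more elaborate, and the back-projection and distance tolerance here are illustrative:

```python
import numpy as np

def hit_test(u, v, K, R, t, cloud, tol=0.02):
    """Cast a ray from the camera through pixel (u, v) and return the
    nearest 3D point of `cloud` lying within `tol` metres of the ray."""
    # Ray direction: back-project the pixel through the pinhole model,
    # then rotate the direction from the camera frame to the world frame.
    d = R.T @ (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    d /= np.linalg.norm(d)
    origin = -R.T @ t                       # camera centre in world frame

    # Perpendicular distance of every cloud point from the ray.
    rel = cloud - origin
    along = rel @ d                         # signed distance along the ray
    perp = np.linalg.norm(rel - np.outer(along, d), axis=1)
    hits = np.where((perp < tol) & (along > 0))[0]
    if hits.size == 0:
        return None                         # the ray hit no point cloud data
    return cloud[hits[np.argmin(along[hits])]]  # closest hit to the camera
```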
  • the bone length information is obtained according to the three-dimensional bone key point information, and the bone length information includes the identification of the bone and the length of the bone.
  • every two bone key points are connected to form one bone, and the true length of each bone is obtained through the Euclidean distance in the three-dimensional space between the 3D joint points.
  • the identification of the bone can be determined, and the identification of the bone is used to indicate the type of the bone.
  • For example, the length of the left thigh bone can be obtained from the three-dimensional coordinates of the left hip node and the left knee node, and the length of the left calf bone can be obtained from the three-dimensional coordinates of the left knee node and the left ankle node.
  • Since some skeleton key points may be missing, the bone length information obtained from the three-dimensional skeleton key point information may include the length of only one bone or the lengths of multiple bones, which is not limited here; see the sketch below.
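As a concrete example of this step (illustrative names only), a bone length is simply the 3D Euclidean distance between its two associated key points:

```python
import numpy as np

def bone_length(joint_a, joint_b):
    """True bone length as the 3D Euclidean distance between joint points."""
    return float(np.linalg.norm(np.asarray(joint_a) - np.asarray(joint_b)))

# e.g. left calf: distance between hypothetical left knee and left ankle points
left_calf = bone_length((0.10, 0.52, 2.15), (0.08, 0.11, 2.18))
```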
  • If the bone length information satisfies a first preset condition, the bone length information is deleted.
  • The first preset condition is, for example, that the bone length falls outside a preset threshold range, in which case the corresponding bone length information is deleted. It can be understood that different types of bones have different bone length threshold ranges; for example, the length range of the thigh bone differs from that of the forearm. In addition, based on the specific category of the measured target, such as adults, children or vertebrates other than humans, the bone length threshold ranges for different types of measured targets can be set flexibly according to statistical information.
  • The first preset condition may also be that the difference in bone length between symmetrical parts is greater than or equal to a preset threshold; for example, if the ratio of the left arm bone length to the right arm bone length is greater than or equal to 2, or less than or equal to 0.5, the corresponding arm bone length information is deleted. A filtering sketch follows below.
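A minimal sketch of this pruning step is shown below. The per-type ranges and the symmetry ratio bound of 2 (equivalently 0.5) follow the examples above; the function signature and dictionary layout are assumptions for illustration.

```python
def prune_bone_lengths(lengths, ranges, symmetric_pairs, max_ratio=2.0):
    """Delete bone lengths that satisfy the first preset condition.

    lengths:         dict bone_id -> measured length
    ranges:          dict bone_id -> (min_len, max_len) threshold per bone type
    symmetric_pairs: list of (left_id, right_id) bones to cross-check
    """
    kept = {b: l for b, l in lengths.items()
            if ranges[b][0] <= l <= ranges[b][1]}    # per-type range check
    for left, right in symmetric_pairs:              # symmetry check
        if left in kept and right in kept:
            ratio = kept[left] / kept[right]
            if ratio >= max_ratio or ratio <= 1.0 / max_ratio:
                kept.pop(left)                       # drop the implausible pair
                kept.pop(right)
    return kept
```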
  • the human body posture is estimated, and the posture information of the target is determined.
  • the posture information can be obtained by using the RMPE (regional multi-person pose estimation) algorithm or the instance segmentation (Mask RCNN) algorithm. This is not limited.
  • the posture information can be used to indicate the posture of the human body, and distinguish standing, sitting or lying postures, etc.;
  • If some data is missing from the bone length information, the posture information is an incomplete posture, possibly because part of the target's torso is occluded in the image of the target, or because some data in the bone length information has been deleted.
  • The execution order of step 408 and step 409 is not limited.
  • a preset weight parameter is determined, and weighted calculation is performed according to the weight parameter and the bone length information to determine the height data of the target.
  • The weighted height calculation is performed according to formula (1):

    H = Σ_{i=1..n} α_i · L_i + β    (1)

    where n is the number of valid bones, L_i is the length of the i-th bone, α_i is the weighting coefficient of the i-th bone length, and β is a compensation parameter. The weighting coefficients α_i of the bones in different postures may be adjusted dynamically, or the weighting coefficients corresponding to the bones in different postures may be pre-stored.
  • The compensation parameter is given by formula (2):

    β = L_f1 + L_f2 = τ_1 · L_1 + τ_2 · (L_{n-1} + L_n)    (2)

    where L_f1 is the compensation value for the distance between the face and the top of the head, with a value range of 2 cm to 3 cm; L_f2 is the compensation value for the distance between the ankle node and the sole of the foot, with a value range of 3 cm to 5 cm; L_1 is the bone length corresponding to the head; L_{n-1} is the bone length corresponding to the thigh; L_n is the bone length corresponding to the calf; τ_1 is the compensation factor for the distance between the face and the top of the head; and τ_2 is the compensation factor for the distance between the ankle node and the sole of the foot. A sketch of this calculation follows below.
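Putting formulas (1) and (2) together, a minimal sketch of the weighted height calculation follows. The coefficient values are placeholders chosen only to show the shape of the computation; the patent leaves the actual values to empirical tuning or training.

```python
def estimate_height(bone_lengths, alphas, tau1=0.12, tau2=0.08):
    """Height per formula (1), with compensation beta per formula (2).

    bone_lengths: [L_1, ..., L_n]; L_1 is the head bone, and the last two
                  entries are the thigh (L_{n-1}) and calf (L_n).
    alphas:       per-bone weighting coefficients alpha_i (same length).
    tau1, tau2:   placeholder head/foot compensation factors.
    """
    weighted = sum(a * l for a, l in zip(alphas, bone_lengths))
    l_f1 = tau1 * bone_lengths[0]                        # face-to-crown compensation
    l_f2 = tau2 * (bone_lengths[-2] + bone_lengths[-1])  # ankle-to-sole compensation
    return weighted + l_f1 + l_f2                        # H = sum(alpha_i * L_i) + beta
```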
  • The bone length information obtained from the three-dimensional skeleton key point information corresponds to the dotted line segments shown in Figure 6 or Figure 7; the length of a dotted segment represents the bone length obtained by calculation. The desired height data must be calculated from the solid line segments. To convert dotted-segment lengths into solid-segment lengths, the solution performs the calculation with preset weighting coefficients, and the length of a solid segment represents the actual height calculated from the weighting coefficient and the bone length.
  • Exemplarily, La is the length corresponding to the head and La' is the actual head height obtained by weighted calculation; Lb is the length corresponding to the calf and Lb' is the actual calf height obtained by weighted calculation.
  • each weighting coefficient can be adjusted according to the empirical value.
  • a neural network can also be used to train each weighting coefficient, and commonly used models include: decision tree, BP (back propagation) neural network, etc., which are not limited in this application.
  • the weighting coefficient of the bones can be adjusted according to the effective bone length information, and the height data can be calculated.
  • When the obtained valid bone length information is incomplete, that is, the target posture information is an incomplete posture, there may be one or more pieces of valid bone length information. If there is only one piece, a weighting coefficient is determined for that bone; if there are multiple pieces, a weighting coefficient is determined for each piece, and the coefficient values corresponding to the valid bones may differ, with the specific values not limited here. It can be understood that the error of the height data calculated under an incomplete posture increases.
  • When presenting the result, the user can be prompted that the current posture information is an incomplete posture, by screen display, voice prompt or vibration prompt, etc., which is not limited here.
  • the terminal may output the height data to the user in various ways, including screen display, voice prompt or vibration prompt, etc., which are not limited here.
  • the measurement result is displayed near the image of the target on the screen in the form of tick marks. If multiple objects to be measured are measured at the same time, the height data of each object to be measured can be separately displayed near each object to be measured in the image of the target.
  • the SLAM system calculates and obtains the 3D point cloud corresponding to the measured object.
  • the distribution of the 3D point cloud is shown in Figure 9a.
  • the bone detection module performs 2D bone node detection (the bone detection algorithm detects 15 key bone nodes in the example), and the 2D detection result is shown in Figure 9b.
  • the coordinate conversion module converts 2D coordinates to 3D coordinates, and calculates the length of each 3D bone node.
  • the calculation of the length of each bone during actual operation is shown in Figure 9c.
  • Figure 9c shows the results of the two measurements. It can be seen that when measuring from different distances and angles, the measured length of each bone fluctuates. At this point the data integration module needs to perform weighting, and the height is finally calculated. The calculation process is described below.
  • the length of each bone measured twice is shown in Table 2, and its unit is centimeter (cm):
  • the real height of the measured object in the example is 172cm.
  • the heights calculated by this method after the weighting of the two measurements are 175.7cm and 171.3cm, respectively, the error percentages are 2.15% and -0.42%, and the average measurement error is 1.28%.
  • FIG. 10 is a schematic diagram of an embodiment of the terminal in the embodiment of the present application.
  • the terminal in this embodiment of the present application may be various types of terminal devices, such as a mobile phone, a tablet, a notebook computer, or a wearable portable device, which is not specifically limited.
  • the terminal includes the following modules: an input module 1001 , a SLAM system 1002 , an automatic detection module 1003 , a coordinate conversion module 1004 , a data integration module 1005 and an output module 1006 .
  • the input module 1001 obtains real-time two-dimensional (2D) images and IMU data;
  • the SLAM system 1002 can perform pose estimation according to the 2D image and IMU data, and obtain the corresponding camera pose when the 2D image is taken.
  • The 2D image is processed by feature extraction, feature matching and outlier elimination, and the feature matching pairs between images are output.
  • The 3D point cloud generation module (corresponding to the triangulated map points in Figure 10), based on the estimated camera pose and the feature matching pairs between the images, uses algorithms such as triangulation to calculate the three-dimensional (3D) points corresponding to the 2D feature points.
  • The optimization module (corresponding to map point optimization and camera pose optimization in Figure 10) takes the camera pose and 3D point cloud data as input and jointly optimizes the camera pose and the 3D point cloud.
  • the SLAM system 1002 outputs real-time camera pose and 3D point cloud data for use by other modules.
  • The specific algorithm of the SLAM system may adopt any one in the prior art, which is not limited in this application.
  • The automatic detection module 1003, based on the real-time image data, detects the 2D key nodes (i.e., 2D skeleton key points) of each target using algorithms such as human body segmentation, skeleton detection and face detection.
  • the coordinate conversion module 1004 converts the 2D key nodes into 3D key nodes (ie, 3D bone key points) according to the camera pose and the 3D point cloud data.
  • The data integration module 1005, based on the 3D key node information, performs key node splicing to obtain the torso information of the measured object, and inputs the 3D torso information into the posture detection module for posture detection; the compensation module superimposes the corresponding compensation according to the detected posture, finally obtaining the measurement result of the measured user. The overall data flow is sketched after the next item.
  • the output module 1006 outputs height information of a plurality of subjects.
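The data flow through modules 1001-1006 can be summarized in the following sketch; the class and method names are hypothetical stand-ins, intended only to make the pipeline order explicit.

```python
def measure_heights(frames, imu_samples, slam, detector, converter, integrator):
    """End-to-end flow of the terminal in Figure 10 (hypothetical interfaces).

    frames, imu_samples: real-time 2D images and IMU data (input module 1001).
    """
    pose, cloud = slam.track(frames, imu_samples)          # SLAM system 1002
    heights = []
    for person in detector.detect(frames[-1]):             # automatic detection 1003
        joints_3d = converter.to_3d(person.keypoints_2d,   # coordinate conversion 1004
                                    pose, cloud)
        heights.append(integrator.integrate(joints_3d))    # data integration 1005
    return heights                                         # output module 1006
```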
  • FIG. 11a is a schematic diagram of another embodiment of the terminal in the embodiment of the present application.
  • the terminal includes:
  • an acquisition module 1101, configured to acquire an image including a target object and the pose of the camera when the image is captured;
  • the acquisition module 1101 is further configured to acquire the pixel coordinates of at least two skeleton key points of the target object in the image, where the pixel coordinates are used to represent the two-dimensional position information of the skeleton key points in the image;
  • the acquisition module 1101 is further configured to acquire the three-dimensional coordinates of the skeleton key points according to the pose of the camera and the pixel coordinates of the skeleton key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the skeleton key points in the world coordinate system, and the three-dimensional coordinates of the at least two skeleton key points are used to represent the distance information between the at least two skeleton key points;
  • the determining module 1102 is configured to determine the height data of the target object according to the three-dimensional coordinates of the at least two skeleton key points.
  • the obtaining module 1101 is further configured to obtain the 3D point cloud information of the target object;
  • the obtaining of the three-dimensional coordinates of the skeleton key points of the target object according to the pose of the camera and the pixel coordinates of the skeleton key points specifically includes:
  • The three-dimensional coordinates of the skeleton key points are acquired through the collision detection (HitTest) algorithm according to the pixel coordinates of the skeleton key points, the pose of the camera and the 3D point cloud information.
  • the obtaining module 1101 is specifically used for:
  • the three-dimensional point cloud information of the target object is acquired according to at least two images of the target object shot from different directions.
  • the obtaining module 1101 is specifically used for:
  • The 3D point cloud information of the target object collected by a depth sensor is acquired, where the depth sensor includes a binocular camera, a lidar, a millimeter-wave radar or a time-of-flight sensor.
  • the obtaining module 1101 is specifically used for:
  • At least two images of the target object taken from different orientations are acquired, the at least two images including the image; the pose of the camera is acquired according to the at least two images of the target object taken from different orientations.
  • the obtaining module 1101 is specifically used for:
  • At least two images of the target object taken from different orientations are acquired, the at least two images including the image of the target object; the inertial measurement unit data of the camera corresponding to the at least two images is acquired;
  • the pose of the camera is determined according to the at least two images of the target object taken from different orientations and the inertial measurement unit data.
  • the determining module 1102 is specifically configured to:
  • According to the three-dimensional coordinates of the at least two skeleton key points, the bone lengths of the target object and the posture information of the target object are acquired; a preset weight parameter of the bone lengths is determined according to the posture information;
  • the height data of the target object is determined according to the bone lengths and the weight parameter.
  • the bone length includes the bone length of the head and the bone length of the leg;
  • the determining module 1102 is specifically used for:
  • A head height compensation value is determined according to the bone length of the head and a preset head compensation parameter; a foot height compensation value is determined according to the bone length of the leg and a preset foot compensation parameter; and the height data of the target object is determined according to the bone length information, the weight parameter, the head height compensation value and the foot height compensation value.
  • the image includes at least two target objects
  • The device further includes: a processing module 1103, configured to perform face detection on the image and determine, based on an image segmentation algorithm, the pixel coordinates of the skeleton key points of each of the at least two target objects from the pixel coordinates of the skeleton key points.
  • the device further includes:
  • the output module 1104 is configured to display the information of the at least two target objects to the user, where the information of the at least two target objects includes at least one of the following: image information of the at least two target objects, images marked with the at least two target objects The image information of the pixel coordinates of the skeleton key points of the two target objects and the face detection result information of the at least two target objects;
  • the acquiring module 1101 is further configured to acquire a user instruction, where the user instruction is used to instruct to perform height measurement on one or more of the at least two target objects.
  • the determining module 1102 is specifically configured to:
  • Bone length information satisfying a first preset condition is deleted, where the first preset condition includes the bone length falling outside a preset range, or the bone length difference between symmetrical parts being greater than or equal to a preset threshold;
  • the height data of the target object is determined according to the remaining bone length information.
  • The device further includes an output module 1104, configured to: annotate the height data of the target object near the target object in the image and display it to the user; or broadcast the height data of the target object by voice.
  • The output module 1104 is further configured to: if the skeleton key points of the target object do not satisfy the second preset condition, display detection failure information to the user, or prompt the user of the detection failure by voice, or prompt the user of the detection failure by vibration.
  • The terminal provided in the embodiments of the present application can be used to measure height: through the acquisition module, the pixel coordinates of the skeleton key points of the target object in the image are obtained and the three-dimensional coordinates of the skeleton key points in three-dimensional space are acquired, and the determining module can determine the height data of the target object based on the three-dimensional coordinates of the at least two skeleton key points.
  • This device converts the two-dimensional pixel coordinates of the skeleton key points into three-dimensional coordinates, and directly obtains the height data of the target object without reference object conversion, which can avoid the measurement error caused by the reference object conversion when the scene around the target object is complex. It can improve the accuracy of height measurement results.
  • FIG. 11b is a schematic diagram of another embodiment of the terminal in the embodiment of the present application.
  • the terminal of the present application includes a sensor unit 1110 , a computing unit 1120 , a storage unit 1140 and an interaction unit 1130 .
  • The sensor unit 1110 usually includes a visual sensor (such as a camera), used to acquire 2D image information of the scene; an inertial sensor (IMU), used to acquire motion information of the terminal, such as linear acceleration and angular velocity; and a depth sensor (optional), used to acquire the depth information of the scene.
  • the computing unit 1120 usually includes CPU, GPU, cache, registers, etc., and is mainly used to run the operating system and process various algorithm modules involved in this application, such as SLAM system, bone detection, face recognition, etc.;
  • the storage unit 1140 mainly includes memory and external storage, and is mainly used for reading and writing local and temporary data of users;
  • The interaction unit 1130 mainly includes a display screen, a touch panel, a speaker, a microphone, etc., and is mainly used for interacting with the user, obtaining user input, and presenting the effects of the algorithms, and the like.
  • FIG. 12 is a schematic diagram of an embodiment of a terminal in an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, Antenna 1, Antenna 2, Mobile Communication Module 150, Wireless Communication Module 160, Audio Module 170, Speaker 170A, Receiver 170B, Microphone 170C, Headphone Interface 170D, Sensor Module 180, Key 190, Motor 191, Indicator 192, Camera 193, Display screen 194, and subscriber identification module (subscriber identification module, SIM) card interface 195 and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • The terminal 100 may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the terminal 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the terminal 100 .
  • the terminal 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • the terminal 100 may communicate with other devices using a wireless communication function.
  • the terminal 100 may communicate with the second electronic device, the terminal 100 establishes a screen projection connection with the second electronic device, and the terminal 100 outputs the screen projection data to the second electronic device.
  • the screen projection data output by the terminal 100 may be audio and video data.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in terminal 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • The mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G, etc. applied on the terminal 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • The mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then convert it into electromagnetic waves and radiate them out through the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide applications on the terminal 100 including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (infrared, IR) and other wireless communication solutions.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • The wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the terminal 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), broadband Code Division Multiple Access (WCDMA), Time Division Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC , FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (quasi -zenith satellite system, QZSS) and/or satellite based augmentation systems (SBAS).
  • the terminal 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and so on.
  • the terminal 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the display screen 194 may be used to display various interfaces output by the system of the terminal 100 .
  • For each interface output by the terminal 100, reference may be made to the related descriptions in subsequent embodiments.
  • the terminal 100 can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used to process the data fed back by the camera 193 .
  • When the shutter is opened, light is transmitted through the lens to the camera photosensitive element, which converts the light signal into an electrical signal and transmits it to the ISP for processing, converting it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the terminal 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals.
  • Video codecs are used to compress or decompress digital video.
  • Terminal 100 may support one or more video codecs.
  • The terminal 100 can play or record videos in various encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the terminal 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video, etc. files in an external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the terminal 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the terminal 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the terminal 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 can be used to play the sound corresponding to the video. For example, when the display screen 194 displays a video playing screen, the audio module 170 outputs the sound of the video playing.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal.
  • The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • The microphone 170C, also referred to as a "mike" or a "mic", is used to convert sound signals into electrical signals.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D can be the USB interface 130, or can be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • the gyro sensor 180B may be used to determine the motion attitude of the terminal 100 .
  • the air pressure sensor 180C is used to measure air pressure.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in various directions (including three axes or six axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the terminal posture, and can be used in horizontal and vertical screen switching, pedometer and other applications.
  • Distance sensor 180F for measuring distance.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the temperature sensor 180J is used to detect the temperature.
  • The touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the terminal 100 , which is different from the position where the display screen 194 is located.
  • The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100 .
  • Motor 191 can generate vibrating cues.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application can be embodied in the form of software products in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, and the computer software products are stored in a storage medium , including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.


Abstract

A height measurement method, relating to the technical field of image processing. The method comprises: acquiring an image including a target object and the pose of the camera when the image is captured (301); acquiring the pixel coordinates of at least two skeleton key points of the target object in the image (302); acquiring the three-dimensional coordinates of the skeleton key points according to the pose of the camera and the pixel coordinates of the skeleton key points (303); and determining the height data of the target object according to the three-dimensional coordinates of the at least two skeleton key points (304). The method can acquire the height data of a measured object quickly and accurately.

Description

Height measurement method, height measurement apparatus and terminal

Technical Field

This application relates to the field of image processing technology, and in particular to a height measurement method, a height measurement apparatus and a terminal.
Background

Height is an important component of basic human body data and has always received much attention. How to obtain the height data of a measured object quickly and accurately, and how to obtain the height data of multiple measured objects at the same time, have long been hot topics of exploration in related fields.

Traditional height measurement methods require the measured person to stand and obtain the height data with a standard ruler or by means of infrared or ultrasonic reflection. They can only measure one person at a time and impose strict posture requirements; if the standing posture is not standard, the height data is inaccurate.

An existing height measurement method obtains the height of the measured object by proportional scaling with respect to a reference object. For example, referring to Figure 1, a virtual ruler is displayed on the preview interface of a terminal, and the distance between the terminal device and the measured object is acquired by the distance sensor of the terminal. The height of the measured object is estimated according to a preset correspondence between distance values and scale.

Since the measurement result of this method is obtained by proportionally scaling the distance between the terminal and the measured object against the virtual ruler, both the resolution of the terminal device and the accuracy of its distance sensor affect the measurement accuracy. When the resolution is insufficient or the environment around the measured object is cluttered, the accuracy of the height measurement result is low.
Summary

Embodiments of this application provide a height measurement method for measuring the height of a target object, which can improve the accuracy of the measurement result.

A first aspect of the embodiments of this application provides a height measurement method, including: acquiring an image including a target object and the pose of the camera when the image is captured; acquiring the pixel coordinates of at least two skeleton key points of the target object in the image, where the skeleton key points include skeletal joint points, and the pixel coordinates are used to represent the two-dimensional position information of the skeleton key points in the image; acquiring the three-dimensional coordinates of the at least two skeleton key points according to the pose of the camera and the pixel coordinates of the skeleton key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the skeleton key points in a coordinate system, and the three-dimensional coordinates of the at least two skeleton key points are used to represent the distance information between the at least two skeleton key points; and determining the height data of the target object according to the three-dimensional coordinates of the at least two skeleton key points.

In the height measurement method provided by the embodiments of this application, a two-dimensional image captured of the target object can be analyzed by a skeleton detection algorithm or similar means to obtain the pixel coordinates of the skeleton key points in the image. Based on the camera pose corresponding to the two-dimensional image, the pixel coordinates of the skeleton key points can be converted into three-dimensional coordinates in three-dimensional space, which correspond to the positions of the skeleton key points in the real world, so that the height data of the target object can be obtained directly. The height measurement method provided by this solution can acquire the height data of the target object without contact from captured two-dimensional images of the target object; in addition, no height reference object needs to exist in the shooting scene, which can reduce errors and improve measurement accuracy.
In a possible implementation of the first aspect, determining the height data of the target object according to the three-dimensional coordinates of the at least two skeleton key points specifically includes: acquiring the pixel coordinates of at least three skeleton key points of the target object in the image; acquiring the three-dimensional coordinates of the at least three skeleton key points according to the pose of the camera and the pixel coordinates of the at least three skeleton key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the skeleton key points in the coordinate system, and the three-dimensional coordinates of the at least three skeleton key points are used to represent the distance information between the at least three skeleton key points; and determining at least two bone distances according to the three-dimensional coordinates of the at least three skeleton key points, and determining the height data of the target object according to the at least two bone distances.

In a possible implementation of the first aspect, the coordinate system includes a world coordinate system.

In a possible implementation of the first aspect, the method further includes: acquiring the 3D point cloud information of the target object; and acquiring the three-dimensional coordinates of the at least two skeleton key points of the target object according to the pose of the camera and the pixel coordinates of the skeleton key points specifically includes: acquiring the three-dimensional coordinates of the at least two skeleton key points through a collision detection algorithm according to the pixel coordinates of the skeleton key points, the pose of the camera and the 3D point cloud information.

This method provides a specific scheme for converting the pixel coordinates of skeleton key points into three-dimensional coordinates, that is, from two-dimensional information to three-dimensional information. Acquiring the three-dimensional coordinates of the skeleton key points based on the 3D point cloud information and the collision detection algorithm can improve the accuracy of the three-dimensional coordinates compared with computing them directly from the camera pose.

In a possible implementation of the first aspect, acquiring the 3D point cloud information of the target object specifically includes: acquiring the 3D point cloud information of the target object according to at least two images of the target object captured from different orientations.

This method provides a specific way of acquiring the 3D point cloud information, namely from multiple images of the target object. Based on feature point detection and matching across the multiple images, the 3D point cloud information of the target object can be obtained. Since the 3D point cloud information is derived from multiple images, it is richer than the information contained in a single image, which can improve the accuracy of the three-dimensional coordinates.

In a possible implementation of the first aspect, acquiring the 3D point cloud information of the target object specifically includes: acquiring the 3D point cloud information of the target object collected by a depth sensor, where the depth sensor includes a binocular camera, a lidar, a millimeter-wave radar or a time-of-flight (TOF) sensor.

This method provides another specific way of acquiring the 3D point cloud information, namely by a depth sensor. Since the 3D point cloud acquired by a depth sensor can be a dense point cloud containing richer information, the three-dimensional coordinates of the skeleton key points obtained based on the dense 3D point cloud are more accurate.

In a possible implementation of the first aspect, acquiring the image of the target object and the pose of the camera when the image is captured specifically includes: acquiring at least two images of the target object captured from different orientations, where the at least two images include the image; and acquiring the pose of the camera according to the at least two images.

This method provides a specific way of acquiring the camera pose: at least two images of the target object captured from different orientations are acquired, and the pose of the camera when capturing the image can be estimated through feature point detection and matching.

In a possible implementation of the first aspect, acquiring the image of the target object and the pose of the camera when the image is captured specifically includes: acquiring at least two images of the target object captured from different orientations, where the at least two images include the image of the target object; acquiring the inertial measurement unit data of the camera corresponding to the at least two images; and determining the pose of the camera according to the at least two images and the inertial measurement unit data.

This method provides another specific way of acquiring the camera pose: in addition to acquiring at least two images of the target object from different orientations, inertial measurement unit data can be collected, which can improve the accuracy of the camera pose calculation.
In a possible implementation of the first aspect, determining the height data of the target object according to the three-dimensional coordinates of the at least two skeleton key points specifically includes: acquiring the bone lengths of the target object and the posture information of the target object according to the three-dimensional coordinates of the at least two skeleton key points; determining preset weight parameters of the bone lengths according to the posture information; and determining the height data of the target object according to the bone lengths and the weight parameters.

In the height measurement method provided by this approach, considering that the three-dimensional coordinates of the skeleton key points come from the body surface of the target object, there is a certain error between the bone length obtained from the three-dimensional coordinates and the actual height that the bone contributes in the body. Therefore, weight parameters are introduced to correct the calculated bone lengths, which can improve the precision of the scheme.

In a possible implementation of the first aspect, the bone lengths include the bone length of the head and the bone length of the leg; and determining the height data of the target object according to the bone lengths and the weight parameters specifically includes: determining a head height compensation value according to the bone length of the head and a preset head compensation parameter; determining a foot height compensation value according to the bone length of the leg and a preset foot compensation parameter; and determining the height data of the target object according to the bone length information, the weight parameters, the head height compensation value and the foot height compensation value.

The height measurement method provided by this approach introduces head and foot compensation, which can further improve the precision of the height measurement.

In a possible implementation of the first aspect, the method further includes: performing face detection on the image to acquire the head height data of the target object, where the head height data is used to correct the pixel coordinates of the skeleton key point corresponding to the head in the two-dimensional skeleton key point information.

In the height measurement method provided by this approach, head height data can also be obtained through face detection to correct the pixel coordinates of the skeleton key points, improving the measurement accuracy.
In a possible implementation of the first aspect, the image includes at least two target objects; and the method further includes: performing face detection on the image, and determining, based on an image segmentation algorithm, the pixel coordinates of the skeleton key points of each of the at least two target objects from the pixel coordinates of the skeleton key points.

The height measurement method provided by this approach can measure the heights of multiple target objects in an image. Compared with measuring heights one by one in the prior art, the operation is simplified and the measurement efficiency is improved.

In a possible implementation of the first aspect, the method further includes: displaying the information of the at least two target objects to the user, where the information of the at least two target objects includes at least one of the following: the image information of the at least two target objects, the image information marked with the pixel coordinates of the skeleton key points of the at least two target objects, and the face detection result information of the at least two target objects; and acquiring a user instruction, where the user instruction is used to instruct height measurement of one or more of the at least two target objects.

The height measurement method provided by this approach can also interact with the user: according to the user instruction, the objects whose heights the user wants to measure are selected from the target objects included in the image, improving the user experience.

In a possible implementation of the first aspect, the skeleton key points are arranged along the direction of gravity; skeleton key points arranged along the direction of gravity help improve the accuracy of the height measurement.

In a possible implementation of the first aspect, the target object is in a non-standing posture, where non-standing postures include sitting, lying and kneeling. When the target object is in a non-standing posture, the implementations of this application can also measure the height of the target object.

In a possible implementation of the first aspect, determining the height data of the target object according to the three-dimensional coordinates of the at least two skeleton key points specifically includes: acquiring the bone length information of the target object according to the three-dimensional coordinates of the at least two skeleton key points; deleting bone length information that satisfies a first preset condition, where the first preset condition includes the bone length falling outside a preset range, or the bone length difference between symmetrical parts being greater than or equal to a preset threshold; and determining the height data of the target object according to the remaining bone length information.

The height measurement method provided by this approach can also prune abnormal data to improve the accuracy of the measurement result. Optionally, based on the symmetry of the human body, the bones of left-right symmetrical parts can be checked against each other; for example, the difference between the bone lengths corresponding to the left leg and the right leg should be small, and if the difference exceeds a threshold, the abnormal data can be deleted.

In a possible implementation of the first aspect, the method further includes: annotating the height data of the target object near the target object in the image and displaying it to the user; or broadcasting the height data of the target object by voice.

The height measurement method provided by this approach can annotate the height of the target object in the image displayed in real time, giving instant feedback and improving the user experience.

In a possible implementation of the first aspect, the method further includes: if the skeleton key points of the target object do not satisfy a second preset condition, displaying detection failure information to the user, or prompting the user of the detection failure by voice or by vibration.

The height measurement method provided by this approach can give feedback to the user when detection fails, improving the user experience.
A second aspect of the embodiments of this application provides a height measurement apparatus, including: an acquisition module, configured to acquire an image including a target object and the pose of the camera when the image is captured; the acquisition module is further configured to acquire the pixel coordinates of at least two skeleton key points of the target object in the image, where the skeleton key points include skeletal joint points and the pixel coordinates are used to represent the two-dimensional position information of the skeleton key points in the image; the acquisition module is further configured to acquire the three-dimensional coordinates of the at least two skeleton key points according to the pose of the camera and the pixel coordinates of the skeleton key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the skeleton key points in a coordinate system, and the three-dimensional coordinates of the at least two skeleton key points are used to represent the distance information between the at least two skeleton key points; and a determining module, configured to determine the height data of the target object according to the three-dimensional coordinates of the at least two skeleton key points.

In a possible implementation of the second aspect, the determining module is specifically configured to: acquire the pixel coordinates of at least three skeleton key points of the target object in the image; and acquire the three-dimensional coordinates of the at least three skeleton key points according to the pose of the camera and the pixel coordinates of the at least three skeleton key points, where the three-dimensional coordinates are used to represent the three-dimensional position information of the skeleton key points in the coordinate system, and the three-dimensional coordinates of the at least three skeleton key points are used to represent the distance information between the at least three skeleton key points.

The determining module is specifically configured to: determine at least two bone distances according to the three-dimensional coordinates of the at least three skeleton key points, and determine the height data of the target object according to the at least two bone distances.

In a possible implementation of the second aspect, the coordinate system includes a world coordinate system.

In a possible implementation of the second aspect, the acquisition module is further configured to acquire the 3D point cloud information of the target object; and acquiring the three-dimensional coordinates of the at least two skeleton key points of the target object according to the pose of the camera and the pixel coordinates of the skeleton key points specifically includes: acquiring the three-dimensional coordinates of the at least two skeleton key points through a collision detection algorithm according to the pixel coordinates of the skeleton key points, the pose of the camera and the 3D point cloud information.

In a possible implementation of the second aspect, the acquisition module is specifically configured to acquire the 3D point cloud information of the target object according to at least two images of the target object captured from different orientations.

In a possible implementation of the second aspect, the acquisition module is specifically configured to acquire the 3D point cloud information of the target object collected by a depth sensor, where the depth sensor includes a binocular camera, a lidar, a millimeter-wave radar or a time-of-flight sensor.

In a possible implementation of the second aspect, the acquisition module is specifically configured to: acquire at least two images of the target object captured from different orientations, where the at least two images include the image; and acquire the pose of the camera according to the at least two images.

In a possible implementation of the second aspect, the acquisition module is specifically configured to: acquire at least two images of the target object captured from different orientations, where the at least two images include the image of the target object; acquire the inertial measurement unit data of the camera corresponding to the at least two images; and determine the pose of the camera according to the at least two images and the inertial measurement unit data.

In a possible implementation of the second aspect, the determining module is specifically configured to: acquire the bone lengths of the target object and the posture information of the target object according to the three-dimensional coordinates of the at least two skeleton key points; determine preset weight parameters of the bone lengths according to the posture information; and determine the height data of the target object according to the bone lengths and the weight parameters.

In a possible implementation of the second aspect, the bone lengths include the bone length of the head and the bone length of the leg; and the determining module is specifically configured to: determine a head height compensation value according to the bone length of the head and a preset head compensation parameter; determine a foot height compensation value according to the bone length of the leg and a preset foot compensation parameter; and determine the height data of the target object according to the bone length information, the weight parameters, the head height compensation value and the foot height compensation value.

In a possible implementation of the second aspect, the image includes at least two target objects, and the apparatus further includes: a processing module, configured to perform face detection on the image and determine, based on an image segmentation algorithm, the pixel coordinates of the skeleton key points of each of the at least two target objects from the pixel coordinates of the skeleton key points.

In a possible implementation of the second aspect, the apparatus further includes: an output module, configured to display the information of the at least two target objects to the user, where the information of the at least two target objects includes at least one of the following: the image information of the at least two target objects, the image information marked with the pixel coordinates of the skeleton key points of the at least two target objects, and the face detection result information of the at least two target objects; and the acquisition module is further configured to acquire a user instruction, where the user instruction is used to instruct height measurement of one or more of the at least two target objects.

In a possible implementation of the second aspect, the skeleton key points are arranged along the direction of gravity; skeleton key points arranged along the direction of gravity help improve the accuracy of the height measurement.

In a possible implementation of the second aspect, the target object is in a non-standing posture, where non-standing postures include sitting, lying and kneeling. When the target object is in a non-standing posture, the implementations of this application can also measure the height of the target object.

In a possible implementation of the second aspect, the determining module is specifically configured to: acquire the bone length information of the target object according to the three-dimensional coordinates of the at least two skeleton key points; delete bone length information satisfying a first preset condition, where the first preset condition includes the bone length falling outside a preset range, or the bone length difference between symmetrical parts being greater than or equal to a preset threshold; and determine the height data of the target object according to the remaining bone length information.

In a possible implementation of the second aspect, the apparatus further includes an output module, configured to: annotate the height data of the target object near the target object in the image and display it to the user; or broadcast the height data of the target object by voice.

In a possible implementation of the second aspect, the apparatus further includes an output module, configured to: if the skeleton key points of the target object do not satisfy a second preset condition, display detection failure information to the user, or prompt the user of the detection failure by voice, or prompt the user of the detection failure by vibration.
A third aspect of the embodiments of this application provides a terminal, including one or more processors and a memory, where the memory stores computer-readable instructions, and the one or more processors read the computer-readable instructions in the memory so that the terminal implements the method according to any one of the first aspect and its various possible implementations.

A fourth aspect of the embodiments of this application provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method according to any one of the first aspect and its various possible implementations.

A fifth aspect of the embodiments of this application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the method according to any one of the first aspect and its various possible implementations.

A sixth aspect of the embodiments of this application provides a chip including a processor. The processor is configured to read and execute a computer program stored in a memory to perform the method in any possible implementation of any of the above aspects. Optionally, the chip includes the memory, and the memory is connected to the processor through a circuit or wires. Further optionally, the chip also includes a communication interface connected to the processor. The communication interface is used to receive data and/or information to be processed; the processor acquires the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input/output interface.

For the technical effects brought by any implementation of the second to sixth aspects, reference may be made to the technical effects of the corresponding implementation of the first aspect, which are not repeated here.
It can be seen from the above technical solutions that the embodiments of this application have the following advantages:

In the height measurement method provided by the embodiments of this application, an image of the target object and the pose of the camera when the image is captured are acquired; skeleton detection can be performed on the image to obtain the pixel coordinates of at least two skeleton key points of the target object in the image; the pixel coordinates of the skeleton key points are then converted into three-dimensional space according to the camera pose to obtain the three-dimensional coordinates of the at least two skeleton key points; and finally the height data of the target object is determined according to the three-dimensional coordinates of the at least two skeleton key points. By converting the two-dimensional pixel coordinates of the skeleton key points into three-dimensional coordinates, this method obtains the height data of the target object directly without reference object conversion, which can avoid the measurement error caused by reference object conversion when the scene around the target object is complex, and can improve the accuracy of the height measurement result.

In addition, since the bone information indicated by the skeleton key points of the target object does not change regardless of the posture of the target object, the height measurement method provided by the embodiments of this application is applicable to height measurement of the target object in various postures.
Brief Description of the Drawings

Figure 1 is a schematic diagram of an embodiment of height measurement;

Figure 2a is a schematic diagram of an application scenario of the height measurement method in an embodiment of this application;

Figure 2b is a schematic diagram of an application scenario of the height measurement method in an embodiment of this application;

Figure 3 is a schematic diagram of an embodiment of the height measurement method in an embodiment of this application;

Figure 4 is a schematic diagram of another embodiment of the height measurement method in an embodiment of this application;

Figure 5 is a schematic diagram of converting two-dimensional skeleton key points into three-dimensional skeleton key points in an embodiment of this application;

Figure 6 is a schematic diagram of the height measurement method in a standing posture in an embodiment of this application;

Figure 7 is a schematic diagram of the height measurement method in a sitting posture in an embodiment of this application;

Figure 8 is a schematic diagram of an application scenario of the height measurement method in an embodiment of this application;

Figure 9a is a schematic diagram of the SLAM system point cloud in an embodiment of this application;

Figure 9b is a schematic diagram of a two-dimensional skeleton key point detection result in an embodiment of this application;

Figure 9c is a schematic diagram of height detection when measuring from different angles in an embodiment of this application;

Figure 10 is a schematic diagram of an embodiment of the height measurement apparatus in an embodiment of this application;

Figure 11a is a schematic diagram of another embodiment of the height measurement apparatus in an embodiment of this application;

Figure 11b is a schematic diagram of another embodiment of the height measurement apparatus in an embodiment of this application;

Figure 12 is a schematic diagram of an embodiment of the terminal in an embodiment of this application.
Detailed Description

Embodiments of this application provide a height measurement method for measuring the height of a target object in multiple postures, which can improve the accuracy of the height data.

For ease of understanding, some technical terms involved in the embodiments of this application are briefly introduced below:

1. Human skeleton key point detection: that is, pose estimation, which mainly detects key points of the human body, such as joints and facial features, and describes the human skeleton information through these key points. Skeleton key points are also called skeleton nodes or joint points.

2. Camera intrinsic parameters and camera extrinsic parameters:

Camera intrinsic parameters are parameters related to the characteristics of the camera itself, including the focal length of the camera, the pixel size, etc.; for an electronic device equipped with a camera, the camera intrinsic parameters are usually known.

Camera extrinsic parameters are parameters in the world coordinate system, including the position and rotation direction of the camera.

According to the camera intrinsic parameters and extrinsic parameters, the three-dimensional coordinates in the world coordinate system corresponding to a two-dimensional pixel in an image captured by the camera can be determined.

3. Camera pose:

The camera pose is the position and attitude of the camera in the world coordinate system when the camera captures an image; given the camera pose, the extrinsic parameters of the camera can be obtained. The camera pose has 6 degrees of freedom (DoF): 3 position-related degrees of freedom determine the position of the camera in three-dimensional space, and 3 rotation-related degrees of freedom determine the rotation attitude of the camera in three-dimensional space. The camera pose corresponds to the position and attitude of the camera in the world coordinate system at the moment the image is captured. For an image sequence acquired by continuous shooting and used to calculate the camera pose, there must be relative movement between the camera and the photographed target, including changes of relative position and attitude. Specifically, the photographed target may be stationary while the camera moves, the photographed target may move while the camera is stationary, or both the photographed target and the camera may move with a relative pose change between them.

The embodiments of this application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. Those of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.

The terms "first", "second", etc. in the specification, claims and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device including a series of steps or modules is not necessarily limited to the steps or modules clearly listed, but may include other steps or modules not clearly listed or inherent to the process, method, product or device. The naming or numbering of steps in this application does not mean that the steps in the method flow must be executed in the temporal or logical order indicated by the naming or numbering; the named or numbered process steps may be executed in a different order according to the technical purpose to be achieved, as long as the same or similar technical effects can be achieved.
In the height measurement method provided by the embodiments of this application, the measured target object may be a vertebrate; specifically, the embodiments of this application take a human being as an example.

The height measurement method provided by the embodiments of this application is applicable to a variety of height measurement scenarios, introduced below by example:

Scenario 1: In augmented reality (AR) or virtual reality (VR) applications, height measurement can be performed by a smart terminal device. For example, as shown in Figure 2a, a smartphone is used to scan the environment around the measured object (also called the target object or measured target, or simply the target); the camera pose is estimated by a simultaneous localization and mapping (SLAM) system, and three-dimensional (3D) point cloud data of the environment around the measured object is acquired; the two-dimensional (2D) skeleton key points in the image, that is, the pixel coordinates of the skeleton key points, are obtained by a skeleton detection algorithm and a face recognition algorithm; combined with the 3D point cloud data, the three-dimensional coordinates of at least two skeleton key points in three-dimensional space are obtained; the three-dimensional information of the skeleton key points is integrated, and the height data of one or more measured objects is output, realizing height measurement of multiple users in multiple postures. The height data can also be superimposed near the measured object in the image and output through the display screen of the smartphone. In the subsequent embodiments, the height measurement method is introduced taking Scenario 1 as an example.

Scenario 2: As shown in Figure 2b, an image acquisition device is fixedly installed, and the measured object walks past a predetermined position for image acquisition. Since the position of the camera in the world coordinate system is known, skeleton detection is performed on the captured image to obtain 2D skeleton key points; after these are converted into 3D skeleton key points, the height data of the measured object can be output through data integration and calculation.

The height measurement method is introduced in detail below; please refer to Figure 3, a schematic diagram of an embodiment of the height measurement method in an embodiment of this application.
301. Acquire an image of the target and the pose of the camera that captured the image.

In this application, the height measurement apparatus may be a terminal. The terminal can acquire an image of the target through an image acquisition device such as a camera; the camera may be configured as an ordinary monocular camera or as a binocular camera, which is not specifically limited here. The camera may be a component built into the terminal, or a device outside the terminal that transmits image data to the terminal through a communication connection. It should be noted that the intrinsic parameters of the camera are known.

The terminal also acquires the camera pose corresponding to the image. Optionally, the terminal captures at least two images of the target from different orientations with a monocular camera and calculates the camera pose by detecting pairs of corresponding feature points in the images. Alternatively, the camera pose is acquired by photographing the target with a binocular camera. An inertial measurement unit (IMU) is a device that measures the three-axis attitude angles (or angular rates) and the acceleration of an object. Optionally, if the terminal includes an IMU and a camera for capturing images of the target, the camera pose can be acquired from the IMU data collected while the camera captures the images. Optionally, the camera pose is calculated from at least two captured images of the target together with the IMU data collected at capture time; it can be understood that a camera pose acquired from multiple images of the target together with IMU data is more accurate. A sketch of one possible realization of this step follows below.
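As an illustration only, the following sketch estimates the relative camera pose from two images by feature matching and essential-matrix decomposition with OpenCV. It is one possible realization of this step, not the algorithm mandated by this application; note that a monocular pose recovered this way is determined only up to scale unless IMU data or other cues fix the scale.

```python
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    """Relative camera pose (R, t) between two views, t up to scale."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(img1, None)      # feature extraction
    k2, d2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)                # feature matching
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    E, _ = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)  # outlier rejection
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K)
    return R, t
```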
Optionally, the image of the target may include one or more objects to be measured.

302. Acquire the pixel coordinates of the skeleton key points of the target in the image.

Skeleton key points include skeletal joint points. Skeleton key point recognition can be performed on the image with any of various existing skeleton detection algorithms to acquire the pixel coordinates of at least two skeleton key points of the target in the image. The pixel coordinates can be used to represent the two-dimensional position information of a skeleton key point in the image; the pixel coordinates (u, v) indicate the position of the point in the image.

A skeleton detection algorithm can detect skeleton key points. Specifically, there are many skeleton key point detection algorithms, for example the RMPE (regional multi-person pose estimation) algorithm, the DeepCut algorithm, etc. The number of skeleton key points may be, for example, 14 or 21.

Optionally, if the image of the target includes multiple objects to be measured, the two-dimensional skeleton key point information of each object can be acquired separately. The two-dimensional skeleton key point information includes the pixel coordinates of each skeleton key point in the image, as well as the identifier of each skeleton key point.

Optionally, the target object may be in a standing posture. A standing posture means that all skeleton key points of the target object are arranged along the direction of gravity, or arranged longitudinally; skeleton key points arranged along the direction of gravity or longitudinally help improve the accuracy of the height measurement.

Optionally, the target object may be in a non-standing posture. A non-standing posture means that the pixel coordinates of some skeleton key points of the target object are not arranged along the direction of gravity or longitudinally; that is, in a non-standing posture the pixel coordinates of the skeleton key points are not all aligned on one vertical straight line. Non-standing postures include sitting, lying, kneeling and other postures. This solution can also measure height when the target object is in a non-standing posture.
303. Acquire the three-dimensional coordinates of the skeleton key points according to the camera pose and the pixel coordinates of the skeleton key points.

Since the intrinsic parameters of the camera are known, the pixel coordinates of the two-dimensional skeleton key points in the image can be converted into three-dimensional coordinates in the world coordinate system according to the camera pose, acquiring the three-dimensional coordinates of at least two skeleton key points. The three-dimensional coordinates are used to represent the three-dimensional position information of a skeleton key point in the world coordinate system, for example (x, y, z). In addition, since the three-dimensional coordinates of at least two skeleton key points are acquired, the identifier of each skeleton key point can also be acquired in order to distinguish different skeleton key points.

The three-dimensional coordinates of at least two skeleton key points can be used to represent the distance information between the at least two skeleton key points. For example, if the three-dimensional coordinates of a first skeleton key point are (x1, y1, z1) and those of a second skeleton key point are (x2, y2, z2), the distance between the first and second skeleton key points in the world coordinate system can be calculated. It can be understood that if the first and second skeleton key points are the two endpoints of the same bone, that is, associated skeleton key points, the length of this bone can be calculated based on the three-dimensional coordinates of the two skeleton key points. In other words, the distance information between at least two skeleton key points includes the bone length information, which can then be used to calculate the height of the target.

304. Determine the height data of the target according to the three-dimensional coordinates of the skeleton key points.

Bone lengths can be acquired from the three-dimensional coordinates of at least two skeleton key points; specifically, one bone length can be calculated from the three-dimensional coordinates of two associated skeleton key points. Optionally, at least two bone distances are determined from the three-dimensional coordinates of at least three skeleton key points; through the skeletal structure of the target, the bone length information is spliced and calculated, and the height data of the target can be obtained from the at least two bone distances. For example, a bone length can be calculated as the Euclidean distance in three-dimensional space between the 3D coordinates of the two joint points that constitute the bone.

To acquire the height data of the target, multiple bone lengths are usually needed. To distinguish different bone length information, the identifier of the bone corresponding to each bone length can also be acquired. The identifier of a bone may be the human torso type corresponding to the bone (such as "arm", "leg", etc.), used to indicate different bones. There is a correspondence between bone identifiers and skeleton key point identifiers; for example, the skeleton key point identified as the right shoulder and the skeleton key point identified as the right elbow can together form the bone identified as the right upper arm.

A bone splicing algorithm is used to obtain the height data from the bone lengths; there are many specific calculation methods, which are not limited here.

In the height measurement method provided by the embodiments of this application, the pixel coordinates of the skeleton key nodes of the target in the image are detected, the pixel coordinates of the skeleton key points are then converted into three-dimensional space according to the camera pose to obtain the three-dimensional coordinates of the skeleton key points, and finally the height data of the target is determined according to the three-dimensional coordinates of the at least two skeleton key points. By converting the two-dimensional pixel coordinates of the skeleton key points into three-dimensional coordinates, this method obtains the height data of the target directly without reference object conversion, which avoids the measurement error caused by reference object conversion when the scene around the target is complex and improves the accuracy of the height measurement result.
Please refer to Figure 4, a schematic diagram of another embodiment of the height measurement method in an embodiment of this application.

401. Acquire images of the target.

The terminal acquires at least two images of the target, captured by the camera in different poses.

Optionally, the IMU data at the time of capturing the at least two images can be acquired at the same time; since the camera pose differs between the captures, the IMU data can indicate the moving direction and moving distance of the camera.

It should be noted that the images may include one or more objects whose height is to be measured; for each object, at least two images of that object need to be acquired to realize height measurement.

402. Determine the camera poses of the image sequence.

The camera pose can be calculated from the at least two images of the target by detecting pairs of corresponding feature points in the images. Alternatively, the camera pose is acquired from the IMU data collected while the camera captures the images. Alternatively, the camera pose is calculated from the at least two images of the target together with the IMU data collected at capture time; it can be understood that a camera pose acquired from multiple images together with IMU data is more accurate.

The terminal can acquire the camera pose corresponding to any one of the at least two images of the target.

403. Acquire 3D point cloud information.

The terminal acquires 3D point cloud information, which includes the three-dimensional coordinates, in a coordinate system, of the visible parts of the target. Optionally, the coordinate system includes the world coordinate system. Optionally, the 3D point cloud information can be acquired by lidar depth imaging, computer stereo vision imaging, structured light, etc., which is not specifically limited here.

Exemplarily, the 3D point cloud information is acquired by computer stereo vision imaging: feature extraction and matching are performed on the at least two images of the target acquired in step 401 to obtain feature point pairs, and, based on a triangulation algorithm, the 3D point cloud corresponding to the pixels in the images of the target is acquired according to the camera poses determined in step 402 and the feature point pairs. A sketch of this route follows below.
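As a sketch of this triangulation route (one possible implementation with hypothetical inputs), matched pixels from two posed views can be lifted to 3D points as follows:

```python
import cv2
import numpy as np

def triangulate(K, R1, t1, R2, t2, pts1, pts2):
    """3D points from matched pixel pairs in two posed views.

    pts1, pts2: (N, 2) matched pixel coordinates; (R, t) are world-to-camera.
    """
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])     # 3x4 projection matrix, view 1
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])     # 3x4 projection matrix, view 2
    pts_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
    return (pts_h[:3] / pts_h[3]).T                # (N, 3) point cloud
```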
Exemplarily, the 3D point cloud information is acquired by lidar depth imaging: if the terminal contains a depth sensor, such as a laser sensor, the 3D point cloud information can be acquired directly. Depending on the specific configuration of the depth sensor, the output 3D point cloud information may be a dense or semi-dense 3D point cloud.

Optionally, the above two approaches can be combined to acquire the 3D point cloud information: when the 3D point cloud is calculated from the images of the target and the camera pose, the point cloud depth is provided directly by the depth map acquired by the depth sensor. This can improve the accuracy of the 3D point cloud; in addition, the camera pose can be optimized to be more accurate.

404. Perform face detection on the images of the target.

The images of the target may include one or more objects whose height is to be measured; performing face detection on the images can determine the face information of the one or more objects to be measured.

Optionally, if the images of the target include multiple pieces of face information, the terminal can also present the face detection results to the user, for example by presenting the face information of each target on the display screen, or by announcing the number of targets by voice.

405. Perform image segmentation according to the face information.

Based on the face detection result of step 404, the face information of one or more objects to be measured can be determined.

If the images of the target include multiple pieces of face information, image segmentation can be performed on the images to obtain the image parts of the multiple objects to be measured, which can be used respectively for the height measurement of those objects.

It should be noted that steps 404 to 405 are optional; they may or may not be executed, which is not limited here.
406. Perform skeleton detection on the images of the target to acquire the two-dimensional skeleton key point information of the target.

The two-dimensional skeleton key point information of the images of the target is acquired according to a skeleton key point detection algorithm; here, the two-dimensional skeleton key point information includes the pixel coordinates of the skeleton key points and the identifiers of the skeleton key points corresponding to those pixel coordinates.

A skeleton detection algorithm can detect human skeleton key points. Specifically, there are many skeleton key point detection algorithms, and the number of human skeleton key points may be, for example, 14 or 21. Exemplarily, taking 14 points as an example, Table 1 shows the meanings and numbers of the human skeleton key points; the skeleton detection algorithm can output the pixel coordinates of each human skeleton key point in the image, identified by a preset number. (A simple code representation of this numbering follows the table.)
Table 1: Human skeleton key points

1 / right shoulder    2 / right elbow    3 / right wrist    4 / left shoulder    5 / left elbow
6 / left wrist        7 / right hip      8 / right knee     9 / right ankle     10 / left hip
11 / left knee       12 / left ankle    13 / top of head   14 / neck            -
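For reference, the numbering of Table 1 could be carried in code as a simple mapping (illustrative only):

```python
# Identifiers of the 14 human skeleton key points per Table 1
KEYPOINT_NAMES = {
    1: "right shoulder", 2: "right elbow", 3: "right wrist",
    4: "left shoulder",  5: "left elbow",  6: "left wrist",
    7: "right hip",      8: "right knee",  9: "right ankle",
    10: "left hip",      11: "left knee",  12: "left ankle",
    13: "top of head",   14: "neck",
}
```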
Optionally, if an image includes multiple objects to be measured, the two-dimensional skeleton key point information of each object can be acquired through the skeleton key point detection algorithm. Optionally, if step 404 is executed, skeleton detection is performed on the image of the target to acquire the human skeleton key points of all objects to be measured in the image, and then the two-dimensional skeleton key point information corresponding to the face detection result of each object is determined. Alternatively, skeleton detection is performed separately on the images determined by image segmentation in step 405, acquiring the two-dimensional skeleton key point information corresponding to each object to be measured.

Optionally, if the image includes multiple objects to be measured, the information of all objects to be measured is displayed to the user, where this information includes at least one of the following: the image information of the objects to be measured, the two-dimensional skeleton key point information of the objects to be measured, and the face detection result information of the objects to be measured. A user instruction is then acquired, and, according to the user instruction, one or more of the at least two objects to be measured are determined as the targets for height measurement.

Optionally, the two-dimensional skeleton key point information of the target is verified against the face detection result. Specifically, since the head usually corresponds to a single node in the two-dimensional skeleton key point information, while the face information recognized in face detection can indicate the region from the jaw to the hairline, the pixel coordinates of the two-dimensional skeleton key point corresponding to the head can be verified through the face detection result, which can improve the accuracy of the height measurement result of this solution.

Optionally, if the two-dimensional skeleton key point information of a target does not satisfy the second preset condition, detection failure information is displayed to the user, or the user is prompted of the detection failure by voice or by vibration, etc., which is not specifically limited here.

Optionally, the second preset condition may be that no skeleton key points are detected; or that the number of detected skeleton key points is less than or equal to a preset threshold, such as 5, 6 or 7; or that the number of bones indicated by the detected skeleton key points is less than or equal to a preset threshold, such as 3 or 4; or that the types and number of bones indicated by the detected skeleton key points do not meet preset requirements, for example the bone types indicated by the skeleton key points do not include the bones corresponding to the upper arm, forearm, thigh and calf, or do not include the head bone, or the number of bones corresponding to the upper arm, forearm, thigh and calf indicated by the skeleton key points is less than or equal to 3, and so on. The specific content of the second preset condition is not limited here.

It should be noted that the execution order of step 404 and step 406 is not limited.

It should be noted that the execution order of steps 402 to 403 relative to steps 404 to 406 is not limited: they can be executed simultaneously, steps 402 to 403 can be executed before steps 404 to 406, or steps 404 to 406 can be executed before steps 402 to 403.
407. Acquire three-dimensional skeleton key point information through a collision detection algorithm according to the camera pose and the 3D point cloud information.

Based on the camera pose acquired in step 402 and the 3D point cloud information acquired in step 403, the transformed 3D skeleton key point coordinates corresponding to the 2D skeleton key points are acquired according to the collision detection (HitTest) algorithm. The three-dimensional skeleton key point information includes the three-dimensional coordinates of the skeleton key points and the identifiers of the skeleton key points corresponding to those three-dimensional coordinates.

The principle of the collision detection (HitTest) algorithm can be found in SLAM technology; please refer to Figure 5, a schematic diagram of converting two-dimensional skeleton key points into three-dimensional skeleton key points in an embodiment of this application. According to the camera pose and the 3D point cloud information, with the optical center of the camera as the starting point, a virtual ray is emitted in the direction of each detected 2D skeleton key point, and collision detection (HitTest) between the ray and the 3D point cloud yields the transformed 3D skeleton key point coordinates corresponding to that 2D skeleton key point. The specific method of collision detection belongs to the prior art and is not repeated here. It should be noted that if the ray obtains no collision result when cast against the 3D point cloud, a search can be performed within a certain range around the ray, and the final collision result is obtained by interpolating nearby 3D points. The final output is the 3D skeleton key point information corresponding to the 2D skeleton key points.

Optionally, the pixel coordinates of the two-dimensional skeleton key points in the image are converted directly into three-dimensional coordinates in the world coordinate system according to the image of the target and the camera pose corresponding to that image; the three-dimensional skeleton key point information includes the three-dimensional coordinates of the skeleton key points and the identifiers of the skeleton key points.

Since the 3D point cloud information is acquired from multiple images of the target by computer stereo vision imaging, or by lidar depth imaging or the like, the three-dimensional coordinates of the skeleton key points acquired through the collision detection algorithm according to the camera pose and the 3D point cloud information are more accurate than three-dimensional coordinates obtained by directly converting the two-dimensional coordinates of the skeleton key points through the camera pose. It can be understood that the denser the 3D point cloud, the more accurate the acquired three-dimensional coordinates of the skeleton key points.
408. Acquire bone length information according to the three-dimensional skeleton key point information.

Bone length information is acquired according to the three-dimensional skeleton key point information; the bone length information includes the identifier of each bone and the bone length.

Specifically, every two associated skeleton key points connect to form one bone, and the true length of each bone is obtained through the Euclidean distance in three-dimensional space between the 3D joint points. The identifier of a bone can be determined from the identifiers of its skeleton key points and is used to indicate the type of the bone. For example, the length of the left thigh bone can be obtained from the three-dimensional coordinates of the left hip node and the left knee node, and the length of the left calf bone can be obtained from the three-dimensional coordinates of the left knee node and the left ankle node. It should be noted that, since some skeleton key points may be missing from detection, the bone length information acquired from the three-dimensional skeleton key point information may include the length of only one bone or the lengths of multiple bones, which is not specifically limited here.

Optionally, if bone length information satisfies a first preset condition, that bone length information is pruned. The first preset condition is, for example, that the bone length exceeds a preset threshold range, in which case the corresponding bone length information is deleted. It can be understood that different types of bones have different bone length threshold ranges; for example, the length range of the thigh bone differs from that of the forearm. In addition, based on the specific category of the measured target, such as adults, children or vertebrates other than humans, the bone length threshold ranges for different types of measured targets can be set flexibly according to statistical information. The first preset condition may also be that the difference in bone length between symmetrical parts is greater than or equal to a preset threshold; for example, if the ratio of the left arm bone length to the right arm bone length is greater than or equal to 2, or less than or equal to 0.5, the corresponding arm bone length information is deleted.
409. Acquire the posture information of the target according to the three-dimensional skeleton key point information.

Human posture estimation is performed according to the valid bone length information acquired in step 408 to determine the posture information of the target. The posture information can be acquired by the RMPE (regional multi-person pose estimation) algorithm, an instance segmentation (Mask RCNN) algorithm, or the like, which is not limited in this application. The posture information can be used to indicate the posture of the human body, distinguishing standing, sitting, lying and other postures.

If some data is missing from the bone length information, the posture information is an incomplete posture, possibly because part of the target's torso is occluded in the image of the target, or because some data in the bone length information has been pruned.

It should be noted that the execution order of step 408 and step 409 is not limited.

410. Determine the height data of the target according to the posture information and the bone length information.

Preset weight parameters are determined according to the posture information of the target from step 409, and a weighted calculation is performed with the weight parameters and the bone length information to determine the height data of the target.
可选地,若目标的姿态信息为完整姿态,也就是全部骨骼长度信息均有效,根据公式(1)进行身高加权计算:
Figure PCTCN2021073455-appb-000001
In formula (1), n is the number of valid bones, L_i is the length of the i-th bone, α_i is the weighting coefficient of the i-th bone length, and β is a compensation parameter. Optionally, the weighting coefficients α_i of the bones under different postures can be adjusted dynamically, or the weighting coefficients corresponding to each bone under different postures can be stored in advance.
Optionally,
β = L_f1 + L_f2 = τ_1·L_1 + τ_2·(L_{n-1} + L_n)    (2)
where L_f1 is the compensation value for the distance between the face and the top of the head, optionally in the range of 2 cm to 3 cm, and L_f2 is the compensation value for the distance between the ankle node and the sole of the foot, optionally in the range of 3 cm to 5 cm. L_1 is the bone length corresponding to the head, L_{n-1} is the bone length corresponding to the thigh, L_n is the bone length corresponding to the lower leg, τ_1 is the compensation factor for the face-to-crown distance, and τ_2 is the compensation factor for the ankle-to-sole distance.
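Formulas (1) and (2) combine into one weighted sum, as in the sketch below; the bone identifiers and the default values of τ_1 and τ_2 are placeholders chosen for illustration (the patent only gives centimeter ranges for L_f1 and L_f2).

def estimate_height(lengths, alpha, tau1=0.15, tau2=0.05):
    """lengths: bone id -> length in meters; alpha: bone id -> weight.
    Bone ids "head", "left_thigh", "left_shank" follow the earlier sketches."""
    core = sum(alpha.get(b, 0.0) * L for b, L in lengths.items())  # sum of alpha_i * L_i
    l_f1 = tau1 * lengths.get("head", 0.0)                         # face-to-crown term
    l_f2 = tau2 * (lengths.get("left_thigh", 0.0)
                   + lengths.get("left_shank", 0.0))               # ankle-to-sole term
    return core + l_f1 + l_f2                                      # H = sum + beta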
The weighted computation of bone lengths in the height calculation is briefly introduced below. Refer to FIG. 6 and FIG. 7, which are schematic diagrams of height measurement for the standing posture and the sitting posture of a human body, respectively.
The bone length information obtained from the three-dimensional skeleton keypoint information corresponds to the dashed segments shown in FIG. 6 or FIG. 7; the lengths of the dashed segments represent the computed bone length information. The desired height data must be computed from the solid segments shown in FIG. 6. To convert dashed-segment lengths into solid-segment lengths, this solution computes with preset weighting coefficients; the lengths of the solid segments represent the actual heights obtained from the weighting coefficients and the bone lengths. For example, La is the length corresponding to the head and La' is the actual head height obtained by weighted computation; Lb is the length corresponding to the lower leg and Lb' is the actual lower-leg height obtained by weighted computation.
It should be noted that the weighting coefficients can be tuned based on empirical values. A neural network can also be used to train the weighting coefficients; common models include decision trees, BP (back propagation) neural networks, and the like, which is not limited in this application.
Optionally, if the posture information of the target is an incomplete posture, the bone weighting coefficients can be adjusted based on the valid bone length information before the height data is computed. When the obtained valid bone length information is incomplete, that is, the posture information of the target is an incomplete posture, there may be one or more valid bone lengths: if there is only one valid bone length, a weighting coefficient is determined for that bone; if there are multiple valid bone lengths, a weighting coefficient is determined for each of them, and the coefficient value corresponding to each valid bone may differ, the specific values not being limited here. Understandably, the error of the height data computed under an incomplete posture increases. Optionally, when the result is presented, the user can be notified that the current posture information is an incomplete posture, via on-screen display, sound, or vibration, which is not limited here.
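One possible way to adjust the weights for an incomplete posture, offered purely as a judgment-call sketch (the patent does not specify this renormalization): keep the coefficients of the bones that survived pruning and rescale them so that the valid bones stand in for the missing ones.

def adjust_weights(alpha, valid_lengths):
    kept = {b: w for b, w in alpha.items() if b in valid_lengths}
    total, kept_sum = sum(alpha.values()), sum(kept.values())
    scale = total / kept_sum if kept_sum else 0.0   # rescale onto valid bones
    return {b: w * scale for b, w in kept.items()}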
Optionally, after the height data of the target is obtained, the terminal can output the height data to the user in multiple ways, including on-screen display, sound, or vibration, which is not limited here.
Optionally, as shown in FIG. 8, when the measurement is complete, the measurement result is displayed on the screen near the image of the target in the form of a scale mark. If multiple subjects are measured simultaneously, the height data of each subject can be displayed near that subject in the image of the target.
The simulation results of the height measurement method are introduced below; refer to FIG. 9a-9c.
The surrounding environment is scanned, and the SLAM system computes the 3D point cloud corresponding to the measured subject; the 3D point cloud distribution is shown in FIG. 9a. When shooting facing the measured subject, the skeleton detection module performs 2D skeleton node detection (in this example the skeleton detection algorithm detects 15 key skeleton nodes); the 2D detection result is shown in FIG. 9b. The coordinate conversion module converts the 2D coordinates to 3D coordinates and computes the length of each 3D bone; the bone lengths computed at run time are shown in FIG. 9c. FIG. 9c shows the results of two measurements: when measuring from different distances and angles, the measured length of each bone fluctuates, so the data integration module must apply weighting to finally compute the height. The computation process is introduced below; the bone lengths of the two measurements are listed in Table 2, in centimeters (cm):
Table 2: Example bone length measurements and weighting coefficients (reproduced as an image in the source)
The weights and the height computation proceed as follows:
1) Because this height measurement is of a normal sitting posture with all bones complete, the bone lengths corresponding to the left/right shoulders, left/right elbows, left/right wrists, and hips do not participate in the height computation, so their weights are set to 0;
2) The head-top-to-neck, neck-to-hip, left/right knee-to-hip, and left/right ankle-to-knee bones are given different weights due to perspective (these weights depend on both the camera shooting angle and the camera-to-subject distance; the detailed computation process is omitted here);
3) The head compensation value is obtained by weighting the head-top-to-neck bone length; the ankle compensation value is obtained by weighting the average bone length of the left/right hip-knee-ankle chain;
4) The real height of the measured subject in this example is 172 cm. The heights computed after weighting in the two measurements are 175.7 cm and 171.3 cm, with error percentages of 2.15% and -0.42%, and an average measurement error of 1.28%.
The height measurement method provided in this application has been introduced above; the terminal that implements this height measurement method is introduced below. Refer to FIG. 10, a schematic diagram of an embodiment of the terminal in an embodiment of this application.
The terminal in this embodiment of this application may be any of various types of terminal devices, such as a mobile phone, a tablet, a laptop computer, or a wearable portable device, which is not specifically limited.
The terminal includes the following modules: an input module 1001, a SLAM system 1002, an automatic detection module 1003, a coordinate conversion module 1004, a data integration module 1005, and an output module 1006.
The input module 1001 obtains real-time two-dimensional (2D) images and IMU data;
The SLAM system 1002 performs pose estimation based on the 2D images and the IMU data to obtain the camera pose corresponding to the capture of each 2D image. In addition, it performs feature extraction, feature matching, outlier removal, and other processing on the 2D images and outputs inter-image feature matching pairs. Combined with the pose estimation result, the 3D point cloud generation module (corresponding to the triangulated map points in FIG. 10) uses triangulation and similar algorithms to compute the three-dimensional (3D) points corresponding to the 2D feature points, based on the estimated camera poses and the inter-image feature matching pairs. The optimization module (corresponding to the map point optimization and camera pose optimization in FIG. 10) takes the camera poses and the 3D point cloud data as input and jointly optimizes them. After these steps are performed, the SLAM system 1002 outputs real-time camera poses and 3D point cloud data for the other modules to use. Any existing SLAM algorithm can be used; this is not limited in this application.
The automatic detection module 1003 detects the 2D key nodes (that is, 2D skeleton keypoints) of each target from the real-time image data, using algorithms such as human body segmentation, skeleton detection, and face detection.
The coordinate conversion module 1004 converts the 2D key nodes into 3D key nodes (that is, 3D skeleton keypoints) based on the camera poses and the 3D point cloud data.
The data integration module 1005 concatenates the key nodes based on the 3D key node information to obtain the trunk information of the measured subject, and feeds the 3D trunk information into the posture detection module for posture detection; the compensation module superimposes the corresponding compensation according to the detected posture, finally producing the measurement result for the measured user.
The output module 1006 outputs the height information of the measured subjects.
Refer to FIG. 11a, a schematic diagram of another embodiment of the terminal in an embodiment of this application.
The terminal includes:
an obtaining module 1101, configured to obtain an image including a target object and a pose of a camera when the image is captured;
the obtaining module 1101 is further configured to obtain pixel coordinates of at least two skeleton keypoints of the target object in the image, where the pixel coordinates represent two-dimensional position information of the skeleton keypoints in the image;
the obtaining module 1101 is further configured to obtain three-dimensional coordinates of the skeleton keypoints based on the pose of the camera and the pixel coordinates of the skeleton keypoints, where the three-dimensional coordinates represent three-dimensional position information of the skeleton keypoints in the world coordinate system, and the three-dimensional coordinates of the at least two skeleton keypoints represent distance information between the at least two skeleton keypoints; and
a determining module 1102, configured to determine height data of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints.
Optionally, the obtaining module 1101 is further configured to obtain three-dimensional point cloud information of the target object;
the obtaining of the three-dimensional coordinates of the skeleton keypoints of the target object based on the pose of the camera and the pixel coordinates of the skeleton keypoints specifically includes:
obtaining the three-dimensional coordinates of the skeleton keypoints through a hit test algorithm based on the pixel coordinates of the skeleton keypoints, the pose of the camera, and the three-dimensional point cloud information.
Optionally, the obtaining module 1101 is specifically configured to:
obtain the three-dimensional point cloud information of the target object based on at least two images of the target object captured from different orientations.
Optionally, the obtaining module 1101 is specifically configured to:
obtain three-dimensional point cloud information of the target object collected by a depth sensor, where the depth sensor includes a binocular camera, a lidar, a millimeter-wave radar, or a time-of-flight sensor.
Optionally, the obtaining module 1101 is specifically configured to:
obtain at least two images of the target object captured from different orientations, where the at least two images of the target object captured from different orientations include the image; and
obtain the pose of the camera based on the at least two images of the target object captured from different orientations.
Optionally, the obtaining module 1101 is specifically configured to:
obtain at least two images of the target object captured from different orientations, where the at least two images of the target object captured from different orientations include the image of the target object;
obtain inertial measurement unit data of the camera corresponding to the at least two images of the target object captured from different orientations; and
determine the pose of the camera based on the at least two images of the target object captured from different orientations and the inertial measurement unit data.
Optionally, the determining module 1102 is specifically configured to:
obtain a bone length of the target object and posture information of the target object based on the three-dimensional coordinates of the skeleton keypoints;
determine a preset weight parameter of the bone length based on the posture information; and
determine the height data of the target object based on the bone length and the weight parameter.
Optionally, the bone length includes a head bone length and a leg bone length;
the determining module 1102 is specifically configured to:
determine a head height compensation value based on the head bone length and a preset head compensation parameter;
determine a foot height compensation value based on the leg bone length and a preset foot compensation parameter; and
determine the height data of the target object based on the bone length information, the weight parameter, the head height compensation value, and the foot height compensation value.
Optionally, the image includes at least two target objects;
the apparatus further includes a processing module 1103, configured to perform face detection on the image and determine, from the pixel coordinates of the skeleton keypoints based on an image segmentation algorithm, the pixel coordinates of the skeleton keypoints of each of the at least two target objects.
Optionally, the apparatus further includes:
an output module 1104, configured to display information of the at least two target objects to the user, where the information of the at least two target objects includes at least one of the following: image information of the at least two target objects, image information marked with the pixel coordinates of the skeleton keypoints of the at least two target objects, and face detection result information of the at least two target objects; and
the obtaining module 1101 is further configured to obtain a user instruction, where the user instruction instructs that height measurement be performed on one or more of the at least two target objects.
Optionally, the determining module 1102 is specifically configured to:
obtain bone length information of the target object based on the three-dimensional coordinates of the skeleton keypoints;
prune bone length information that satisfies a first preset condition, where the first preset condition includes bone length information whose bone length falls outside a preset range, or a bone length difference of a symmetric pair greater than or equal to a preset threshold range; and
determine the height data of the target object based on the pruned bone length information.
Optionally, the apparatus further includes an output module 1104, configured to:
annotate the height data of the target object near the target object in the image and display it to the user; or
announce the height data of the target object by voice.
Optionally, the apparatus further includes an output module 1104, configured to:
if the skeleton keypoints of the target object do not satisfy the second preset condition, display detection-failure information to the user, or prompt the user of the detection failure by voice, or prompt the user of the detection failure by vibration.
The terminal provided in this embodiment of this application can be used to measure height: the obtaining module obtains the pixel coordinates of the skeleton key nodes of the target object in the image and obtains the three-dimensional coordinates of the skeleton keypoints in three-dimensional space, and the determining module can determine the height data of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints. By converting the two-dimensional pixel coordinates of the skeleton keypoints into three-dimensional coordinates, this apparatus obtains the height data of the target object directly, without conversion through a reference object. This avoids the measurement error introduced by reference-object conversion when the scene around the target object is complex, and improves the accuracy of the height measurement result.
Refer to FIG. 11b, a schematic diagram of another embodiment of the terminal in an embodiment of this application.
The terminal of this application includes a sensor unit 1110, a computing unit 1120, a storage unit 1140, and an interaction unit 1130.
The sensor unit 1110 typically includes a vision sensor (such as a camera), configured to obtain 2D image information of the scene; an inertial sensor (IMU), configured to obtain motion information of the terminal, such as linear acceleration and angular velocity; and, optionally, a depth sensor/laser sensor, configured to obtain depth information of the scene;
the computing unit 1120 typically includes a CPU, a GPU, a cache, registers, and the like, and is mainly configured to run the operating system and process the algorithm modules involved in this application, such as the SLAM system, skeleton detection, and face recognition;
the storage unit 1140 mainly includes memory and external storage, and is mainly used for reading and writing local and temporary user data;
the interaction unit 1130 mainly includes a display screen, a touchpad, a speaker, a microphone, and the like, and is mainly configured to interact with the user, obtain user input, and present the algorithm results.
Refer to FIG. 12, a schematic diagram of an embodiment of the terminal in an embodiment of this application.
For ease of understanding, the structure of the terminal 100 provided in the embodiments of this application is described below by example. See FIG. 13; FIG. 13 is a schematic structural diagram of the terminal provided in an embodiment of this application.
As shown in FIG. 13, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of this application does not constitute a specific limitation on the terminal 100. In some other embodiments of this application, the terminal 100 may include more or fewer components than shown, or combine some components, or split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. The different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the terminal 100. The controller can generate operation control signals based on instruction operation codes and timing signals, controlling instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of this application are only schematic descriptions and do not constitute a structural limitation on the terminal 100. In some other embodiments of this application, the terminal 100 may also use interface connection manners different from those in the foregoing embodiment, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
In some feasible implementations, the terminal 100 can use the wireless communication function to communicate with other devices. For example, the terminal 100 can communicate with a second electronic device, establish a screen-casting connection with the second electronic device, and output screen-casting data to the second electronic device. The screen-casting data output by the terminal 100 may be audio and video data.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the terminal 100 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, the antennas can be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions applied to the terminal 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification, and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into an electromagnetic wave for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same device.
The modem processor may include a modulator and a demodulator. The modulator is configured to modulate the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is configured to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In some other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor 110. The wireless communication module 160 can also receive the signal to be transmitted from the processor 110, perform frequency modulation and amplification on it, and convert it into an electromagnetic wave for radiation through the antenna 2.
In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The terminal 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor. The GPU is configured to perform mathematical and geometric computation for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is configured to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light emitting diodes (QLED), or the like. In some embodiments, the terminal 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
In some feasible implementations, the display screen 194 may be configured to display the interfaces output by the system of the terminal 100. For the interfaces output by the terminal 100, refer to the related descriptions in subsequent embodiments.
The terminal 100 can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is configured to process the data fed back by the camera 193. For example, when a photo is taken, the shutter opens, light is transmitted through the lens onto the camera photosensitive element, the light signal is converted into an electrical signal, and the camera photosensitive element passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithmic optimization on the noise, brightness, and skin tone of the image, and can optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is configured to capture static images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and then passes the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is configured to process digital signals; in addition to digital image signals, it can also process other digital signals.
The video codec is configured to compress or decompress digital video. The terminal 100 can support one or more video codecs, so that the terminal 100 can play or record videos in multiple encoding formats, for example, moving picture experts group (MPEG) 1, MPEG-2, MPEG-3, and MPEG-4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transmission pattern between neurons in the human brain, it processes input information quickly and can also continuously self-learn. Applications such as intelligent cognition of the terminal 100 can be implemented through the NPU, for example image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capability of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. By running the instructions stored in the internal memory 121, the processor 110 executes the various functional applications and data processing of the terminal 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the applications required by at least one function (such as a sound playback function and an image playback function). The data storage area may store data created during the use of the terminal 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The terminal 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like. In some feasible implementations, the audio module 170 can be used to play the sound corresponding to a video. For example, when the display screen 194 displays a video playback picture, the audio module 170 outputs the sound of the video playback.
The audio module 170 is configured to convert digital audio information into an analog audio signal output, and is also configured to convert an analog audio input into a digital audio signal.
The speaker 170A, also called a "horn", is configured to convert an audio electrical signal into a sound signal.
The receiver 170B, also called an "earpiece", is configured to convert an audio electrical signal into a sound signal.
The microphone 170C, also called a "mike" or "mic", is configured to convert a sound signal into an electrical signal.
The headset jack 170D is configured to connect a wired headset. The headset jack 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is configured to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be provided on the display screen 194. The gyroscope sensor 180B may be configured to determine the motion posture of the terminal 100. The barometric pressure sensor 180C is configured to measure barometric pressure.
The acceleration sensor 180E can detect the magnitude of the acceleration of the terminal 100 in all directions (including three axes or six axes). When the terminal 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the terminal, and is applied to applications such as landscape/portrait switching and pedometers.
The distance sensor 180F is configured to measure distance.
The ambient light sensor 180L is configured to sense the brightness of ambient light.
The fingerprint sensor 180H is configured to collect fingerprints.
The temperature sensor 180J is configured to detect temperature.
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be provided on the display screen 194; the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch screen". The touch sensor 180K is configured to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation can be provided through the display screen 194. In some other embodiments, the touch sensor 180K may also be provided on the surface of the terminal 100, at a position different from that of the display screen 194.
The button 190 includes a power button, a volume button, and the like. The button 190 may be a mechanical button or a touch-sensitive button. The terminal 100 can receive button input and generate button signal input related to the user settings and function control of the terminal 100.
The motor 191 can generate a vibration prompt.
The indicator 192 may be an indicator light, and can be used to indicate the charging state and battery level changes, and can also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is configured to connect a SIM card.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described again here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are only schematic. For example, the division of the units is only a logical function division; in actual implementation there may be other division manners, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features therein can be equivalently replaced; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (37)

  1. A height measurement method, comprising:
    obtaining an image including a target object and a pose of a camera when the image is captured;
    obtaining pixel coordinates of at least two skeleton keypoints of the target object in the image, wherein the skeleton keypoints comprise skeleton joint points, and the pixel coordinates represent two-dimensional position information of the skeleton keypoints in the image;
    obtaining three-dimensional coordinates of the at least two skeleton keypoints based on the pose of the camera and the pixel coordinates of the at least two skeleton keypoints, wherein the three-dimensional coordinates represent three-dimensional position information of the skeleton keypoints in a coordinate system, and the three-dimensional coordinates of the at least two skeleton keypoints represent distance information between the at least two skeleton keypoints; and
    determining height data of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints.
  2. The method according to claim 1, wherein the determining height data of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints specifically comprises:
    obtaining pixel coordinates of at least three skeleton keypoints of the target object in the image;
    obtaining three-dimensional coordinates of the at least three skeleton keypoints based on the pose of the camera and the pixel coordinates of the at least three skeleton keypoints, wherein the three-dimensional coordinates represent three-dimensional position information of the skeleton keypoints in the coordinate system, and the three-dimensional coordinates of the at least three skeleton keypoints represent distance information between the at least three skeleton keypoints; and
    determining at least two bone distances based on the three-dimensional coordinates of the at least three skeleton keypoints, and determining the height data of the target object based on the at least two bone distances.
  3. The method according to claim 1 or 2, wherein
    the coordinate system comprises a world coordinate system.
  4. The method according to any one of claims 1 to 3, wherein
    the method further comprises:
    obtaining three-dimensional point cloud information of the target object; and
    the obtaining three-dimensional coordinates of the at least two skeleton keypoints of the target object based on the pose of the camera and the pixel coordinates of the skeleton keypoints specifically comprises:
    obtaining the three-dimensional coordinates of the at least two skeleton keypoints through a hit test algorithm based on the pixel coordinates of the skeleton keypoints, the pose of the camera, and the three-dimensional point cloud information.
  5. The method according to claim 4, wherein the obtaining three-dimensional point cloud information of the target object specifically comprises:
    obtaining the three-dimensional point cloud information of the target object based on at least two images of the target object captured from different orientations.
  6. The method according to claim 4, wherein the obtaining three-dimensional point cloud information of the target object specifically comprises:
    obtaining three-dimensional point cloud information of the target object collected by a depth sensor, wherein the depth sensor comprises a binocular camera, a lidar, a millimeter-wave radar, or a time-of-flight sensor.
  7. The method according to any one of claims 1 to 6, wherein
    the obtaining an image of a target object and a pose of a camera when the image is captured specifically comprises:
    obtaining at least two images of the target object captured from different orientations, wherein the at least two images of the target object captured from different orientations comprise the image; and
    obtaining the pose of the camera based on the at least two images of the target object captured from different orientations.
  8. The method according to any one of claims 1 to 6, wherein
    the obtaining an image of a target object and a pose of a camera when the image is captured specifically comprises:
    obtaining at least two images of the target object captured from different orientations, wherein the at least two images of the target object captured from different orientations comprise the image of the target object;
    obtaining inertial measurement unit data of the camera corresponding to the at least two images of the target object captured from different orientations; and
    determining the pose of the camera based on the at least two images of the target object captured from different orientations and the inertial measurement unit data.
  9. The method according to any one of claims 1 to 8, wherein
    the determining height data of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints specifically comprises:
    obtaining a bone length of the target object and posture information of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints;
    determining a preset weight parameter of the bone length based on the posture information; and
    determining the height data of the target object based on the bone length and the weight parameter.
  10. The method according to claim 9, wherein
    the bone length comprises a head bone length and a leg bone length; and
    the determining the height data of the target object based on the bone length and the weight parameter specifically comprises:
    determining a head height compensation value based on the head bone length and a preset head compensation parameter;
    determining a foot height compensation value based on the leg bone length and a preset foot compensation parameter; and
    determining the height data of the target object based on the bone length information, the weight parameter, the head height compensation value, and the foot height compensation value.
  11. The method according to any one of claims 1 to 10, wherein
    the image comprises at least two target objects; and
    the method further comprises:
    performing face detection on the image, and determining, from the pixel coordinates of the skeleton keypoints based on an image segmentation algorithm, pixel coordinates of skeleton keypoints of each of the at least two target objects.
  12. The method according to claim 11, wherein
    the method further comprises:
    displaying information of the at least two target objects to a user, wherein the information of the at least two target objects comprises at least one of the following: image information of the at least two target objects, image information marked with the pixel coordinates of the skeleton keypoints of the at least two target objects, and face detection result information of the at least two target objects; and
    obtaining a user instruction, wherein the user instruction instructs that height measurement be performed on one or more of the at least two target objects.
  13. The method according to any one of claims 1 to 12, wherein the skeleton keypoints are arranged along the direction of gravity.
  14. The method according to any one of claims 1 to 13, wherein the target object is in a non-standing posture.
  15. The method according to any one of claims 1 to 14, wherein
    the determining height data of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints specifically comprises:
    obtaining bone length information of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints;
    pruning bone length information that satisfies a first preset condition, wherein the first preset condition comprises bone length information whose bone length falls outside a preset range, or a bone length difference of a symmetric pair greater than or equal to a preset threshold range; and
    determining the height data of the target object based on the pruned bone length information.
  16. The method according to any one of claims 1 to 15, wherein
    the method further comprises:
    annotating the height data of the target object near the target object in the image and displaying it to a user; or
    announcing the height data of the target object by voice.
  17. The method according to any one of claims 1 to 16, wherein
    the method further comprises:
    if the skeleton keypoints of the target object do not satisfy a second preset condition, displaying detection-failure information to a user, or prompting the user of the detection failure by voice, or prompting the user of the detection failure by vibration.
  18. A height measurement apparatus, comprising:
    an obtaining module, configured to obtain an image including a target object and a pose of a camera when the image is captured;
    the obtaining module is further configured to obtain pixel coordinates of at least two skeleton keypoints of the target object in the image, wherein the skeleton keypoints comprise skeleton joint points, and the pixel coordinates represent two-dimensional position information of the skeleton keypoints in the image;
    the obtaining module is further configured to obtain three-dimensional coordinates of the at least two skeleton keypoints based on the pose of the camera and the pixel coordinates of the skeleton keypoints, wherein the three-dimensional coordinates represent three-dimensional position information of the skeleton keypoints in a coordinate system, and the three-dimensional coordinates of the at least two skeleton keypoints represent distance information between the at least two skeleton keypoints; and
    a determining module, configured to determine height data of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints.
  19. The apparatus according to claim 18, wherein the obtaining module is specifically configured to:
    obtain pixel coordinates of at least three skeleton keypoints of the target object in the image; and
    obtain three-dimensional coordinates of the at least three skeleton keypoints based on the pose of the camera and the pixel coordinates of the at least three skeleton keypoints, wherein the three-dimensional coordinates represent three-dimensional position information of the skeleton keypoints in the coordinate system, and the three-dimensional coordinates of the at least three skeleton keypoints represent distance information between the at least three skeleton keypoints; and
    the determining module is specifically configured to:
    determine at least two bone distances based on the three-dimensional coordinates of the at least three skeleton keypoints, and determine the height data of the target object based on the at least two bone distances.
  20. The apparatus according to claim 18 or 19, wherein
    the coordinate system comprises a world coordinate system.
  21. The apparatus according to any one of claims 18 to 20, wherein
    the obtaining module is further configured to obtain three-dimensional point cloud information of the target object; and
    the obtaining of the three-dimensional coordinates of the at least two skeleton keypoints of the target object based on the pose of the camera and the pixel coordinates of the skeleton keypoints specifically comprises:
    obtaining the three-dimensional coordinates of the at least two skeleton keypoints through a hit test algorithm based on the pixel coordinates of the skeleton keypoints, the pose of the camera, and the three-dimensional point cloud information.
  22. The apparatus according to claim 21, wherein the obtaining module is specifically configured to:
    obtain the three-dimensional point cloud information of the target object based on at least two images of the target object captured from different orientations.
  23. The apparatus according to claim 21, wherein the obtaining module is specifically configured to:
    obtain three-dimensional point cloud information of the target object collected by a depth sensor, wherein the depth sensor comprises a binocular camera, a lidar, a millimeter-wave radar, or a time-of-flight sensor.
  24. The apparatus according to any one of claims 18 to 23, wherein the obtaining module is specifically configured to:
    obtain at least two images of the target object captured from different orientations, wherein the at least two images of the target object captured from different orientations comprise the image; and
    obtain the pose of the camera based on the at least two images of the target object captured from different orientations.
  25. The apparatus according to any one of claims 18 to 23, wherein the obtaining module is specifically configured to:
    obtain at least two images of the target object captured from different orientations, wherein the at least two images of the target object captured from different orientations comprise the image of the target object;
    obtain inertial measurement unit data of the camera corresponding to the at least two images of the target object captured from different orientations; and
    determine the pose of the camera based on the at least two images of the target object captured from different orientations and the inertial measurement unit data.
  26. The apparatus according to any one of claims 18 to 25, wherein the determining module is specifically configured to:
    obtain a bone length of the target object and posture information of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints;
    determine a preset weight parameter of the bone length based on the posture information; and
    determine the height data of the target object based on the bone length and the weight parameter.
  27. The apparatus according to claim 26, wherein
    the bone length comprises a head bone length and a leg bone length; and
    the determining module is specifically configured to:
    determine a head height compensation value based on the head bone length and a preset head compensation parameter;
    determine a foot height compensation value based on the leg bone length and a preset foot compensation parameter; and
    determine the height data of the target object based on the bone length information, the weight parameter, the head height compensation value, and the foot height compensation value.
  28. The apparatus according to any one of claims 18 to 27, wherein
    the image comprises at least two target objects; and
    the apparatus further comprises:
    a processing module, configured to perform face detection on the image and determine, from the pixel coordinates of the skeleton keypoints based on an image segmentation algorithm, pixel coordinates of skeleton keypoints of each of the at least two target objects.
  29. The apparatus according to claim 28, wherein
    the apparatus further comprises:
    an output module, configured to display information of the at least two target objects to a user, wherein the information of the at least two target objects comprises at least one of the following: image information of the at least two target objects, image information marked with the pixel coordinates of the skeleton keypoints of the at least two target objects, and face detection result information of the at least two target objects; and
    the obtaining module is further configured to obtain a user instruction, wherein the user instruction instructs that height measurement be performed on one or more of the at least two target objects.
  30. The apparatus according to any one of claims 18 to 29, wherein the skeleton keypoints are arranged along the direction of gravity.
  31. The apparatus according to any one of claims 18 to 30, wherein the target object is in a non-standing posture.
  32. The apparatus according to any one of claims 18 to 31, wherein
    the determining module is specifically configured to:
    obtain bone length information of the target object based on the three-dimensional coordinates of the at least two skeleton keypoints;
    prune bone length information that satisfies a first preset condition, wherein the first preset condition comprises bone length information whose bone length falls outside a preset range, or a bone length difference of a symmetric pair greater than or equal to a preset threshold range; and
    determine the height data of the target object based on the pruned bone length information.
  33. The apparatus according to any one of claims 18 to 32, wherein
    the apparatus further comprises an output module, configured to:
    annotate the height data of the target object near the target object in the image and display it to a user; or announce the height data of the target object by voice.
  34. The apparatus according to any one of claims 18 to 33, wherein
    the apparatus further comprises an output module, configured to:
    if the skeleton keypoints of the target object do not satisfy a second preset condition, display detection-failure information to a user, or prompt the user of the detection failure by voice, or prompt the user of the detection failure by vibration.
  35. A terminal, comprising one or more processors and a memory, wherein
    the memory stores computer-readable instructions; and
    the one or more processors are configured to read the computer-readable instructions so that the terminal implements the method according to any one of claims 1 to 17.
  36. A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 17.
  37. A computer-readable storage medium, comprising computer-readable instructions, wherein when the computer-readable instructions run on a computer, the computer is caused to perform the method according to any one of claims 1 to 17.
PCT/CN2021/073455 2020-07-15 2021-01-23 Height measurement method, height measurement apparatus, and terminal WO2022012019A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020237004401A KR20230035382A (ko) 2020-07-15 2021-01-23 Height measurement method and apparatus, and terminal
JP2023501759A JP2023534664A (ja) 2020-07-15 2021-01-23 Height measurement method and apparatus, and terminal
US18/154,508 US20230152084A1 (en) 2020-07-15 2023-01-13 Height Measurement Method and Apparatus, and Terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010679662.1 2020-07-15
CN202010679662.1A CN114022532A (zh) Height measurement method, height measurement apparatus, and terminal

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/154,508 Continuation US20230152084A1 (en) 2020-07-15 2023-01-13 Height Measurement Method and Apparatus, and Terminal

Publications (1)

Publication Number Publication Date
WO2022012019A1 true WO2022012019A1 (zh) 2022-01-20

Family

ID=79556083

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073455 WO2022012019A1 (zh) 2020-07-15 2021-01-23 身高测量方法、身高测量装置和终端

Country Status (5)

Country Link
US (1) US20230152084A1 (zh)
JP (1) JP2023534664A (zh)
KR (1) KR20230035382A (zh)
CN (1) CN114022532A (zh)
WO (1) WO2022012019A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115633956A (zh) * 2022-11-08 2023-01-24 华南理工大学 Automatic infant height measurement method, system, apparatus, and storage medium
CN117315792B (zh) * 2023-11-28 2024-03-05 湘潭荣耀智能科技有限公司 Real-time regulation and control system based on recumbent-posture anthropometry
CN117434570B (zh) * 2023-12-20 2024-02-27 绘见科技(深圳)有限公司 Visual coordinate measurement method, measurement device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100312143A1 (en) * 2009-06-03 2010-12-09 MINIMEDREAM CO., Ltd. Human body measurement system and information provision method using the same
CN102657532A (zh) * 2012-05-04 2012-09-12 深圳泰山在线科技有限公司 Height measurement method and apparatus based on human posture recognition
CN106361345A (zh) * 2016-11-29 2017-02-01 公安部第三研究所 System and method for measuring human height in video images based on camera calibration
CN106780619A (zh) * 2016-11-25 2017-05-31 青岛大学 Human body dimension measurement method based on a Kinect depth camera
CN107256565A (zh) * 2017-05-19 2017-10-17 安徽信息工程学院 Method and system for measuring main human body shape parameters based on Kinect
CN110717391A (zh) * 2019-09-05 2020-01-21 武汉亘星智能技术有限公司 Height measurement method, system, apparatus, and medium based on video images


Also Published As

Publication number Publication date
CN114022532A (zh) 2022-02-08
US20230152084A1 (en) 2023-05-18
KR20230035382A (ko) 2023-03-13
JP2023534664A (ja) 2023-08-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21842410

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023501759

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20237004401

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21842410

Country of ref document: EP

Kind code of ref document: A1