WO2023004682A1 - Height measurement method and apparatus, and storage medium

Info

Publication number
WO2023004682A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
face
height
target object
electronic device
Prior art date
Application number
PCT/CN2021/109248
Other languages
French (fr)
Chinese (zh)
Inventor
焦磊磊
马超群
张旭
段超
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN202180006425.1A (publication CN115885316A)
Priority to PCT/CN2021/109248 (publication WO2023004682A1)
Publication of WO2023004682A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/60: Analysis of geometric attributes
    • G06T7/62: Analysis of geometric attributes of area, perimeter, diameter or volume

Definitions

  • The present application relates to the field of image processing, and in particular to a height detection method, apparatus, and storage medium.
  • Traditional height measurement usually requires the measurer to manually operate a professional instrument, such as a dedicated height measuring instrument, which not only makes measurement inefficient but is also inconvenient to carry and unsuitable for personal use.
  • An embodiment of the present application provides a height detection method. The method includes: performing semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determining the ground information in the multiple video frames; performing face detection on the multiple video frames, and determining the face area; determining the first face pose of the target object in the multiple video frames according to the face area and a preset three-dimensional face model; and determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
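  • For illustration only, the sketch below shows how the four steps described above could be wired together in code; all function names, argument shapes, and the plane representation are assumptions made for this sketch, not details given by the patent.

    # Illustrative sketch only: a possible wiring of the steps described above.
    # The injected callables (detect_ground_plane, detect_face, estimate_face_pose,
    # compute_height) and the data shapes are assumptions, not the patent's implementation.
    import numpy as np

    def estimate_height(frames, device_poses,
                        detect_ground_plane, detect_face,
                        estimate_face_pose, compute_height):
        """frames: list of RGB images; device_poses: per-frame camera poses (4x4)."""
        # Step S310: semantic plane detection -> ground plane (a, b, c, d) in world coords
        ground = detect_ground_plane(frames, device_poses)
        if ground is None:
            return None  # caller may prompt the user to shoot the ground

        heights = []
        for frame, T_wc in zip(frames, device_poses):
            # Step S320: face detection -> face area (bounding box)
            face_box = detect_face(frame)
            if face_box is None:
                continue
            # Step S330: face area + preset 3D face model -> first face pose (camera frame)
            face_pose_cam = estimate_face_pose(frame, face_box)
            # Step S340: ground + face pose + device pose -> height for this frame
            heights.append(compute_height(ground, face_pose_cam, T_wc))

        # Per-frame estimates could be post-processed (e.g. filtered) into the
        # reported height; here we simply take the median.
        return float(np.median(heights)) if heights else None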
  • The method further includes at least one of the following: if the ground information is not detected within a preset period of time, prompting the user to shoot the ground; if the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompting the user to adjust the device pose; if the first face pose does not meet the second preset condition, prompting the user to adjust the device pose and/or change the face pose of the target object; or if the face area does not meet the third preset condition, prompting the user to adjust the device pose.
  • In this way, when the ground information is not detected within a preset period of time, when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, or when the first face pose does not meet the second preset condition, the user is prompted, for example, to shoot the ground, to adjust the device pose, or to change the face pose of the target object, so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
  • In a second possible implementation of the height detection method, determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining the second height of the target object according to the ground information, the first face pose, and the device pose; and performing post-processing on the second height to obtain the first height, where the post-processing includes Kalman filtering.
  • In this way, the second height of the target object can be determined according to the ground information, the first face pose, and the device pose, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object, thereby improving the accuracy of height detection.
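  • As a minimal sketch of the Kalman-filtering post-processing mentioned above, the following one-dimensional filter smooths per-frame height estimates into a stable value; the noise parameters are illustrative assumptions, not values from the patent.

    import numpy as np

    def kalman_smooth_heights(raw_heights, process_var=1e-4, meas_var=4e-4):
        """One-dimensional Kalman filter over per-frame height estimates (metres).

        Assumes the true height is constant; process_var and meas_var are
        illustrative noise levels, not values given by the patent.
        """
        x, p = raw_heights[0], 1.0       # initial state estimate and variance
        filtered = [x]
        for z in raw_heights[1:]:
            p = p + process_var          # predict (the state itself is constant)
            k = p / (p + meas_var)       # Kalman gain
            x = x + k * (z - x)          # update with the new measurement z
            p = (1.0 - k) * p
            filtered.append(x)
        return filtered

    # Example: noisy per-frame estimates are smoothed toward a stable value.
    print(kalman_smooth_heights([1.72, 1.69, 1.74, 1.71, 1.70])[-1])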
  • The method further includes: displaying the first height on a display interface of the electronic device.
  • In this way, the first height of the target object can be displayed on the display interface of the electronic device through animation, text, augmented reality (AR), and the like, thereby improving the user experience.
  • Here, the ground information is located in the world coordinate system and the first face pose is located in the camera coordinate system. Determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose; performing coordinate transformation on the second face pose according to the device pose to obtain a third face pose of the target object, where the third face pose is located in the world coordinate system; and determining the first height of the target object according to the third face pose and the ground information.
  • In this way, the embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and determine the first height of the target object according to the third face pose and the ground information, so that the first height of the target object is calculated in the world coordinate system and the accuracy of height detection is improved.
  • Adjusting the first face pose according to a preset interpupillary distance reference value to obtain the second face pose includes: determining a face size transformation coefficient according to the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; and adjusting the first face pose according to the face size transformation coefficient to obtain the second face pose of the target object.
  • In this way, the face size transformation coefficient is determined from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient to obtain the second face pose of the target object, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
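  • A minimal sketch of this face size transformation, assuming the pose position is represented as a translation of the fitted model in the camera frame; the 63 mm reference value and the function signature are illustrative assumptions.

    import numpy as np

    REFERENCE_IPD_M = 0.063  # illustrative adult interpupillary distance reference (~63 mm)

    def rescale_face_pose(translation_cam, model_ipd_m, reference_ipd_m=REFERENCE_IPD_M):
        """Scale the first face pose to real-world size using the IPD reference.

        translation_cam: position of the fitted 3D face model in the camera frame,
        expressed at the (arbitrary) scale of the average face model.
        model_ipd_m: interpupillary distance measured on the fitted model at that scale.
        Returns the rescaled position (second face pose); the rotation is unchanged.
        """
        scale = reference_ipd_m / model_ipd_m   # face size transformation coefficient
        return np.asarray(translation_cam, dtype=float) * scale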
  • Determining the first height of the target object according to the third face pose and the ground information includes: determining the position of the top of the head of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
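  • As a sketch of this last step, assuming the ground is represented as a plane equation in world coordinates and the top of the head as a 3D point, the first height can be computed as a point-to-plane distance:

    import numpy as np

    def height_above_ground(head_top_world, ground_plane):
        """Distance from the head-top point to the ground plane, both in world coordinates.

        head_top_world: (x, y, z) of the top of the head, e.g. derived from the third
        face pose plus a model offset from a facial landmark to the top of the head.
        ground_plane: (a, b, c, d) with a*x + b*y + c*z + d = 0 describing the ground.
        """
        a, b, c, d = ground_plane
        x, y, z = head_top_world
        return abs(a * x + b * y + c * z + d) / np.linalg.norm([a, b, c])

    # Example: ground is the plane y = 0, head top 1.72 m above it.
    print(height_above_ground((0.3, 1.72, 2.0), (0.0, 1.0, 0.0, 0.0)))  # 1.72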
  • An embodiment of the present application provides a height detection apparatus applied to an electronic device, including an image acquisition component configured to capture multiple video frames, and a processing component configured to: perform semantic plane detection on the multiple video frames and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine a face area; determine the first face pose of the target object in the multiple video frames according to the face area and a preset three-dimensional face model; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • The processing component is further configured to perform at least one of the following: when the ground information is not detected within a preset period of time, prompt the user to shoot the ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompt the user to adjust the device pose; when the first face pose does not meet the second preset condition, prompt the user to adjust the device pose and/or change the face pose of the target object; or when the face area does not meet the third preset condition, prompt the user to adjust the device pose.
  • In this way, when the ground information is not detected within a preset period of time, when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, or when the first face pose does not meet the second preset condition, the user is prompted, for example, to shoot the ground, to adjust the device pose, or to change the face pose of the target object, so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
  • Determining the first height of the target object includes: determining the second height of the target object according to the ground information, the first face pose, and the device pose; and performing post-processing on the second height to obtain the first height, where the post-processing includes Kalman filtering.
  • In this way, the second height of the target object can be determined according to the ground information, the first face pose, and the device pose, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object, thereby improving the accuracy of height detection.
  • The processing component is further configured to display the first height on the display interface of the electronic device.
  • In this way, the first height of the target object can be displayed on the display interface of the electronic device through animation, text, augmented reality (AR), and the like, thereby improving the user experience.
  • Here, the ground information is located in the world coordinate system and the first face pose is located in the camera coordinate system. Determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose; performing coordinate transformation on the second face pose according to the device pose to obtain a third face pose of the target object, where the third face pose is located in the world coordinate system; and determining the first height of the target object according to the third face pose and the ground information.
  • In this way, the embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and determine the first height of the target object according to the third face pose and the ground information, so that the first height of the target object is calculated in the world coordinate system and the accuracy of height detection is improved.
  • Adjusting the first face pose according to a preset interpupillary distance reference value to obtain the second face pose includes: determining a face size transformation coefficient according to the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; and adjusting the first face pose according to the face size transformation coefficient to obtain the second face pose of the target object.
  • In this way, the face size transformation coefficient is determined from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient to obtain the second face pose of the target object, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
  • Determining the first height of the target object according to the third face pose and the ground information includes: determining the position of the top of the head of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
  • An embodiment of the present application provides a height measurement apparatus, including: an image acquisition component configured to acquire multiple video frames; a processor; and a memory for storing processor-executable instructions, where the processor is configured to, when executing the instructions, implement the height detection method of the above first aspect or of one or more of the multiple possible implementations of the first aspect.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • Embodiments of the present application further provide a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the height detection method of the above first aspect or of one or more of the multiple possible implementations of the first aspect.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • Embodiments of the present application further provide a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where, when the computer-readable code runs in an electronic device, the processor in the electronic device executes the height detection method of the first aspect or of one or more of the multiple possible implementations of the first aspect.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Fig. 2 shows a block diagram of a software structure of an electronic device according to an embodiment of the present application.
  • Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application.
  • Fig. 4 shows a schematic diagram of a detection process of ground information according to an embodiment of the present application.
  • Fig. 5 shows a schematic diagram of a process of determining a first face pose of a target object according to an embodiment of the present application.
  • Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application.
  • Fig. 7 shows a schematic diagram of a processing procedure of height detection according to an embodiment of the present application.
  • Fig. 8 shows a block diagram of a height detection device according to an embodiment of the present application.
  • In one related technical solution, a binocular camera is used to capture a scene image, the image coordinates of the head point of the human target in the scene image are obtained, the depth information of the head point generated by the binocular camera is obtained according to those image coordinates, the coordinates of the head point in the camera coordinate system are then calculated from the image coordinates and the depth information, and the height of the human target is measured according to the coordinates of the head point in the camera coordinate system and the installation height, pitch angle, and tilt angle of the binocular camera.
  • This technical solution not only requires a binocular camera (that is, it depends on specific hardware), but also requires a fixed camera pose and a known camera installation height, which restricts the usage scenarios.
  • In addition, this technical solution must capture an image of the complete human body to measure height, which is a further limitation.
  • In another related technical solution, a dense semantic map can be generated based on simultaneous localization and mapping (SLAM) technology, plane semantic detection is then realized through the dense semantic map, the height of an object is automatically identified from the internal relationships between semantics, the focus target is projected based on ground extraction and segmentation to calculate the object's length and width, and the bounding box size (length, width, and height) of the object is finally obtained. Since the human body is one such generalized object, the height of a human body can be measured with this technical solution.
  • In yet another related technical solution, a face classifier and a face-to-height model are first trained separately; the image of the human target to be measured is then input into the face classifier for face detection to obtain the face image of the human target, and the face image is input into the face-to-height model to obtain the height of the human target.
  • The core of this technical solution is the face-to-height model. A face-to-height model obtained through machine learning not only has poor interpretability but also relies heavily on training data, and because the relationship between face and height may differ across different populations, such a model is difficult to generalize and the accuracy of its measurement results is not high.
  • In a further related technical solution, height measurement is performed by manually operating an augmented reality (AR) ruler.
  • Specifically, plane detection and SLAM technology can be used to obtain the spatial equation of the ground; the surveyor then needs to place a virtual anchor point at the foot of the human target (that is, the measurement object), pull a virtual AR ruler from bottom to top and stop at the top of the head, and then obtain the length of the AR ruler, that is, the height of the human target, in the three-dimensional (3D) space coordinate system established by SLAM.
  • The present application provides a height detection method, which can be applied to electronic devices.
  • The height detection method of the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames to determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In this way, detecting the height of the target object does not rely on professional equipment (such as binocular cameras or depth cameras); the face pose of the target object can be determined through face recognition and three-dimensional face reconstruction, and the height of the target object can then be determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • the electronic devices described in the embodiments of the present application may be touch-screen or non-touch-screen.
  • Touch-screen electronic devices can be controlled by clicking and sliding on the display screen with fingers, stylus, etc.
  • Non-touch-screen electronic devices can be connected to input devices such as a mouse, a keyboard, or a touch panel, and controlled through these input devices.
  • Fig. 1 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
  • The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • the processor can generate an operation control signal according to the instruction opcode and the timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 may be a cache memory.
  • The memory may store instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
  • processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
  • the processor 110 may be connected to modules such as a touch sensor, an audio module, a wireless communication module, a display, and a camera through at least one of the above interfaces.
  • the interface connection relationship between the modules shown in the embodiment of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the electronic device 100 may implement a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like.
  • the electronic device 100 may include one or more display screens 194 .
  • the electronic device 100 can use the camera 193, ISP, video codec, GPU, display screen 194, application processor AP, neural network processor NPU, etc. to realize functions such as photographing and video recording, that is, related functions such as image and video collection.
  • the camera 193 can be used to collect color image data of the subject. In some embodiments, the camera 193 can also be used to collect depth data of the subject. That is to say, the camera in the electronic device 100 may be a common camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera or a depth camera. The present application does not limit the specific type of the camera 193 .
  • the ISP can be used to process the color image data collected by the camera 193 .
  • When an image is captured, light is transmitted through the lens to the photosensitive element of the camera, which converts the optical signal into an electrical signal and transmits it to the ISP for processing, where it is converted into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the electronic device 100 may include one or more cameras 193 .
  • the electronic device 100 may include one front camera and at least one rear camera.
  • the front camera can usually be used to collect the color image data of the photographer facing the display screen 194, and the rear camera can be used to collect the color image data of the object (such as people, scenery, etc.) facing the photographer.
  • the CPU, GPU, or NPU in the processor 110 may process multiple video frames collected by the camera 193 .
  • The processor 110 can perform semantic plane detection on multiple video frames collected by the image acquisition component (i.e., the camera 193) of the electronic device 100 and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • the first height of the target object may also be displayed on the display screen 194 of the electronic device 100 .
  • the gyro sensor 180B in the electronic device 100 can be used to determine the motion posture of the electronic device 100 .
  • The angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and controls the reverse movement of the lens to offset the shake of the electronic device 100 to achieve anti-shake.
  • the gyroscope sensor 180B can also be used in scenarios such as navigation and somatosensory games.
  • the acceleration sensor 180E in the electronic device 100 can detect the acceleration of the electronic device 100 in various directions (generally three axes x, y and z). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • components such as the gyro sensor 180B and the acceleration sensor 180E of the electronic device 100 may constitute an inertial measurement unit (IMU) for measuring the device pose of the electronic device 100 .
  • the touch sensor 180K in the electronic device 100 is also called a "touch device”.
  • the touch sensor 180K can be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the position of the display screen 194 .
  • the keys 190 in the electronic device 100 may include a power key, a volume key and the like.
  • the key 190 can be a mechanical key or a touch key.
  • the electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100 .
  • For example, a camera APP may provide buttons for starting and ending photo or video capture for the user to operate.
  • the motor 191 in the electronic device 100 can generate a vibration alert.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 191 may also correspond to different vibration feedback effects for touch operations acting on different areas of the display screen 194 .
  • Touch operations in different application scenarios (for example, time reminders, receiving messages, alarm clocks, games, photographing, and video recording) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the embodiment of the present application takes the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100 .
  • FIG. 2 shows a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into five layers, which are application program layer, application program framework layer, Android runtime (Android runtime, ART) and native C/C++ library, hardware abstraction layer (Hardware Abstract Layer, HAL) and the kernel layer.
  • the application layer can consist of a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, resource managers, notification managers, activity managers, input managers, and so on.
  • the window manager provides window management service (Window Manager Service, WMS).
  • WMS can be used for window management, window animation management, surface management and as a transfer station for input systems.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • This data can include videos, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • The notification manager may also prompt the user in other ways, for example, by displaying text information in the status bar, issuing a prompt sound, vibrating the electronic device, or flashing the indicator light.
  • The activity manager can provide an activity management service (Activity Manager Service, AMS). AMS can be used to start, switch, and schedule system components (such as activities, services, content providers, and broadcast receivers) and to manage and schedule application processes.
  • the input manager can provide input management service (Input Manager Service, IMS), and IMS can be used to manage the input of the system, such as touch screen input, key input, sensor input, etc.
  • IMS fetches events from input device nodes, and distributes events to appropriate windows through interaction with WMS.
  • The Android runtime includes the core library and the Android runtime (ART).
  • The Android runtime is responsible for converting source code into machine code.
  • The Android runtime mainly uses ahead-of-time (AOT) compilation technology and just-in-time (JIT) compilation technology.
  • the core library is mainly used to provide basic Java class library functions, such as basic data structure, mathematics, IO, tools, database, network and other libraries.
  • the core library provides APIs for users to develop Android applications.
  • A native C/C++ library can include multiple functional modules, for example: a surface manager, a media framework (Media Framework), libc, OpenGL ES, SQLite, Webkit, etc.
  • the surface manager is used to manage the display subsystem, and provides the fusion of 2D and 3D layers for multiple applications.
  • the media framework supports playback and recording of various commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • OpenGL ES provides the drawing and manipulation of 2D graphics and 3D graphics in applications.
  • SQLite provides a lightweight relational database for applications of the electronic device 100 .
  • the hardware abstraction layer runs in user space, encapsulates the kernel layer driver, and provides a call interface to the upper layer.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • For example, when height detection is to be performed, the user can touch the height APP icon on the display screen of the electronic device; when the touch sensor 180K receives the touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • The kernel layer processes the touch operation into a raw input event (including touch coordinates, a time stamp of the touch operation, and other information). Raw input events are stored at the kernel layer.
  • The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Take as an example that the touch operation is a tap operation and the control corresponding to the tap is the control of the height APP icon.
  • The height APP calls the interface of the application framework layer to start the height APP, and then starts the camera driver by calling the kernel layer.
  • the camera 193 collects multiple video frames, that is, collects video streams through the camera 193 .
  • multiple video frames may include the target object whose height is to be detected.
  • the electronic device 100 may perform related processing such as ground detection and face detection on the plurality of video frames through the processor 110, so as to determine the height of the target object.
  • Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application.
  • the height detection method includes: step S310 , performing semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determining ground information in the multiple video frames.
  • the image acquisition component may be a camera of an electronic device, and the camera may be an ordinary camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera, a depth camera, etc.
  • The multiple video frames collected by the image acquisition component are color (red, green, blue, RGB) video frames. Since the multiple video frames form a video stream, it can also be said that the image acquisition component captures an RGB video stream.
  • the multiple video frames captured by the image acquisition component may include depth data in addition to RGB image data. It should be noted that, the present application does not limit the specific type of the image acquisition component.
  • RGB video streams can be collected by the image acquisition part of the electronic device, and plane detection, semantic segmentation and other processing can be performed on the multiple collected video frames to determine the ground information in the multiple video frames.
  • the ground information may be expressed by a plane equation in space, or by other means, which is not limited in this application.
  • plane detection may first be performed on the multiple video frames collected by the image acquisition component, and position information of multiple planes in the multiple video frames may be determined.
  • position information of multiple planes in multiple video frames may be determined by using a SLAM technology.
  • three-dimensional information can be extracted from multiple video frames collected by the image acquisition component to obtain sparse point cloud data, and at the same time determine the device pose when the electronic device collects each video frame; and then according to the device when the electronic device collects each video frame Pose, through the plane fitting algorithm, performs plane fitting on the sparse point cloud data, and obtains the position information of multiple planes in multiple video frames.
  • the position information of each plane can be expressed by the plane equation in space.
  • Sparse point cloud data is obtained by extracting the three-dimensional information in the multiple video frames, and plane fitting is performed on the sparse point cloud data according to the device pose when the electronic device collects each video frame to obtain the position information of multiple planes. This can not only improve processing efficiency but also improve the accuracy of the position information of each plane.
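  • The patent does not name a particular plane fitting algorithm; as one hedged example, a RANSAC-style fit over the sparse point cloud (already expressed in the world frame via the device poses) could look like the sketch below, with illustrative thresholds.

    import numpy as np

    def fit_plane_ransac(points, iters=200, inlier_thresh=0.02, rng=None):
        """Fit a plane a*x + b*y + c*z + d = 0 to sparse 3D points with RANSAC.

        A generic sketch of the plane-fitting step; the iteration count and inlier
        threshold are illustrative, not values specified by the patent.
        """
        rng = rng or np.random.default_rng(0)
        pts = np.asarray(points, dtype=float)
        best_plane, best_inliers = None, 0
        for _ in range(iters):
            p0, p1, p2 = pts[rng.choice(len(pts), 3, replace=False)]
            normal = np.cross(p1 - p0, p2 - p0)
            if np.linalg.norm(normal) < 1e-9:
                continue  # degenerate (collinear) sample
            normal /= np.linalg.norm(normal)
            d = -normal.dot(p0)
            dist = np.abs(pts @ normal + d)       # point-to-plane distances
            inliers = int((dist < inlier_thresh).sum())
            if inliers > best_inliers:
                best_inliers, best_plane = inliers, (*normal, d)
        return best_plane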
  • Semantic segmentation can also be performed on the multiple video frames collected by the image acquisition component to obtain the semantic segmentation result of each video frame. Specifically, for any video frame among the multiple video frames, semantic recognition can be performed on the video frame to identify the categories of objects in the video frame, such as the ground, tables, and walls; each pixel in the video frame is then marked according to the identified object categories to obtain the semantic segmentation result of the video frame.
  • Then, according to the position information of the multiple planes and the semantic segmentation results, semantic recognition can be performed on the multiple planes in the multiple video frames to obtain multiple pieces of semantic plane information, such as plane information for desktops, walls, and the ground, and the ground information is then selected from the multiple pieces of semantic plane information.
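  • As an illustrative sketch of this selection step (the label id, mask format, and scoring rule are assumptions, not the patent's implementation), the ground plane could be chosen as the fitted plane best supported by pixels that the semantic segmentation labels as ground:

    import numpy as np

    def pick_ground_plane(planes, plane_masks, semantic_mask, ground_label=1):
        """Choose, among fitted planes, the one best supported by 'ground' pixels.

        planes: list of (a, b, c, d) plane equations.
        plane_masks: per-plane boolean masks marking pixels belonging to each plane.
        semantic_mask: per-pixel class labels from semantic segmentation.
        ground_label: label id used for the ground class (an assumption here).
        """
        ground_pixels = (semantic_mask == ground_label)
        scores = [np.logical_and(m, ground_pixels).sum() for m in plane_masks]
        best = int(np.argmax(scores))
        return planes[best] if scores[best] > 0 else None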
  • Fig. 4 shows a schematic diagram of a detection process of ground information according to an embodiment of the present application. As shown in Fig. 4, it is assumed that the height detection method of the embodiment of the present application is realized by the height APP on the electronic device (such as a mobile phone). Image acquisition is performed to obtain multiple video frames 410 (i.e., an RGB video stream); three-dimensional information extraction is then performed on the multiple video frames 410 to obtain sparse point cloud data 420, and the device pose 430 when the electronic device collects the multiple video frames 410 is determined; and according to the device pose 430, plane fitting is performed on the sparse point cloud data 420 through a plane fitting algorithm to obtain the position information 440 of multiple planes in the multiple video frames 410.
  • Semantic segmentation 450 can also be performed on the multiple video frames 410 to obtain a semantic segmentation result 460; then, according to the position information 440 of the multiple planes and the semantic segmentation result 460, semantic recognition is performed on the multiple planes in the multiple video frames 410 to obtain multiple pieces of semantic plane information 470, and the ground information 480 is selected from the multiple pieces of semantic plane information 470, where the ground information 480 can be represented by a plane equation in space.
  • the detection process of the ground information is exemplarily described above only by taking multiple video frames (ie, RGB video streams) collected by the image collection component as input.
  • information such as the depth data collected by the image collection component and the device pose of the electronic device collected by the inertial measurement unit IMU can also be used as input at the same time, so as to improve the accuracy of ground detection.
  • In this way, the electronic device can automatically perceive the captured scene, obtain multiple pieces of semantic plane information, and automatically recognize the ground information, which not only avoids manual operations by the user, such as manually selecting the ground, but also improves the accuracy of ground detection.
  • If the ground information is not detected within a preset period of time, the user may be prompted to shoot the ground.
  • For example, prompt information such as "please shoot the ground" or "ground not detected" can be announced to the user through a voice broadcast, or shown to the user through text, animation, and the like on the display interface of the height APP, so that the user can adjust the shooting content in time, thereby improving the efficiency of ground detection.
  • Step S320: performing face detection on the multiple video frames to determine the face area. While the ground information in the multiple video frames is being determined, face detection can be performed on the multiple video frames by means of feature extraction, key point detection, and the like. When a complete human face is detected, a face area can be determined from the multiple video frames, and the object corresponding to the face area is determined as the target object. There may be one or more face areas, and one or more target objects, which is not limited in this application.
  • After the face area is determined, it can be judged whether the face area meets the third preset condition, where the third preset condition may be that the face area is located within a preset area of the video frame where it is located; for example, the preset area can be set as the central area of that video frame. It should be noted that those skilled in the art can set the preset area of the video frame where the face area is located according to the actual situation, and this application does not limit this.
  • If the face area does not meet the third preset condition, the user may be prompted to adjust the device pose of the electronic device through a voice broadcast, a text display, an animation display, or the like, so that the face area meets the third preset condition, that is, so that the face area is located within the preset area of the video frame where it is located, thereby improving the accuracy of height detection.
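  • A minimal sketch of such a third-preset-condition check, assuming the face area is a pixel bounding box and the preset area is the central region of the frame; the 20% margin is an illustrative assumption:

    def face_in_preset_area(face_box, frame_size, margin=0.2):
        """Check whether the face area lies within a central preset area of the frame.

        face_box: (x_min, y_min, x_max, y_max) in pixels; frame_size: (width, height).
        margin: fraction of the frame treated as border; 0.2 keeps the central 60%.
        """
        w, h = frame_size
        x0, y0, x1, y1 = face_box
        return (x0 >= margin * w and y0 >= margin * h
                and x1 <= (1 - margin) * w and y1 <= (1 - margin) * h)

    # Example: a centred face passes; a face near the frame edge does not.
    print(face_in_preset_area((500, 300, 700, 560), (1280, 720)),
          face_in_preset_area((20, 10, 200, 260), (1280, 720)))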
  • Step S330: determining the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model.
  • In a possible implementation, the three-dimensional face model of the target object can be established through a pre-trained neural network according to the face area and the preset three-dimensional face model (i.e., an average-face 3D model). That is, the face area and the preset 3D face model can be input into a pre-trained convolutional neural network (CNN) for registration to obtain the 3D face model of the target object.
  • Then, according to the parameters of the preset three-dimensional face model (for example, constraints on the structure of the human face such as the interpupillary distance and the distance from the tip of the nose to the top of the head), the position information and rotation information of the three-dimensional face model of the target object relative to the image acquisition component of the electronic device are determined, and this position information and rotation information are determined as the first face pose of the target object.
  • the rotation information may be represented by pitch angle, roll angle and yaw angle.
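  • The patent obtains this pose from the CNN-registered 3D face model and the structural constraints of the preset model; purely as an illustration of recovering position information and rotation information relative to the camera, the sketch below uses a standard PnP solve over corresponding 3D model landmarks and detected 2D landmarks (OpenCV), which is a stand-in technique, not the patent's method.

    import numpy as np
    import cv2  # OpenCV, used only as a stand-in illustration here

    def face_pose_from_landmarks(model_points_3d, image_points_2d, camera_matrix):
        """Estimate position and rotation of a 3D face model relative to the camera.

        model_points_3d: landmark coordinates on the (average) 3D face model, in metres.
        image_points_2d: the same landmarks detected in the face area, in pixels.
        camera_matrix: 3x3 intrinsics of the image acquisition component.
        Returns (rotation matrix, translation vector) in the camera coordinate system.
        """
        ok, rvec, tvec = cv2.solvePnP(
            np.asarray(model_points_3d, dtype=np.float64),
            np.asarray(image_points_2d, dtype=np.float64),
            camera_matrix, distCoeffs=None)
        if not ok:
            return None
        R, _ = cv2.Rodrigues(rvec)       # rotation encodes pitch, roll, and yaw of the face
        return R, tvec.reshape(3)        # tvec is the position relative to the camera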
  • Fig. 5 shows a schematic diagram of a process of determining a first face pose of a target object according to an embodiment of the present application.
  • As shown in Fig. 5, face detection 520 can be performed on the multiple video frames 510 collected by the image acquisition component of the electronic device to determine the face area 530; the face area 530 and the preset three-dimensional face model 540 are input into the pre-trained convolutional neural network CNN 550 for registration to obtain the three-dimensional face model 560 of the target object; and according to the parameters of the preset three-dimensional face model 540, the position information and rotation information of the three-dimensional face model of the target object relative to the image acquisition component of the electronic device are determined, and the position information and rotation information are determined as the first face pose 570 of the target object.
  • In this way, the 3D face model of the target object can be determined according to the face area and the preset 3D face model, and the first face pose of the target object can then be determined according to the parameters of the preset 3D face model. Using 3D face reconstruction technology to determine the first face pose of the target object can not only improve processing efficiency but also improve the accuracy of the first face pose, thereby improving the accuracy of height detection.
  • In a possible implementation, the neural network used to generate the 3D face model of the target object (such as the convolutional neural network CNN 550) can be pre-trained according to multiple sample face areas and the preset three-dimensional face model.
  • During training, a sample face area and the preset three-dimensional face model can be input into the neural network for registration to obtain a three-dimensional model of the sample face; the three-dimensional model of the sample face is then reverse rendered, that is, projected into a two-dimensional space to obtain a reverse-rendered image; the network loss of the neural network is determined according to the difference between each reverse-rendered image and the corresponding sample face area; and the network parameters of the neural network are adjusted according to the network loss.
  • Rendering reverse render
  • When a preset training end condition is met, the training ends and the trained neural network is obtained. The training end condition may be, for example, that the number of training rounds of the neural network reaches a preset threshold, that the network loss of the neural network converges within a certain range, or that the neural network passes verification on a verification set.
  • Those skilled in the art can set the training end condition of the neural network according to the actual situation, which is not limited in this application.
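A minimal training-loop sketch of the reverse-rendering supervision described above is given below. The model, the differentiable `reverse_render` function, and the L1 photometric loss are all assumptions for illustration, not components defined by this application.

```python
# Hedged sketch of one training step with reverse-rendering supervision.
import torch.nn.functional as F

def train_step(model, optimizer, sample_face_batch, mean_face_model, reverse_render):
    """sample_face_batch: (N, 3, H, W) cropped sample face regions."""
    optimizer.zero_grad()
    # Register the mean 3D face model to each sample face region.
    face_3d_params = model(sample_face_batch, mean_face_model)
    # Reverse rendering: project the predicted 3D face back into 2D image space.
    rendered = reverse_render(face_3d_params)          # (N, 3, H, W)
    # Network loss: difference between each reverse-rendered image and its sample face
    # region (an L1 photometric difference is used here as one possible choice).
    loss = F.l1_loss(rendered, sample_face_batch)
    loss.backward()
    optimizer.step()                                    # adjust the network parameters
    return loss.item()
```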
  • In a possible implementation, after the first face pose of the target object is determined, it can be judged whether the first face pose satisfies the second preset condition. The second preset condition is that the pitch angle in the first face pose is within the preset second angle interval, the roll angle in the first face pose is within the preset third angle interval, and the yaw angle in the first face pose is within the preset fourth angle interval.
  • the second angle interval, the third angle interval and the fourth angle interval may be the same or different. It should be noted that those skilled in the art can set the specific values of the second angle interval, the third angle interval and the fourth angle interval according to the actual situation, which is not limited in the present application.
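For illustration only, the second preset condition could be checked along the following lines; the angle intervals used here are arbitrary placeholder values, not values specified by this application.

```python
# Simple sketch of the second preset condition check described above.
PITCH_INTERVAL = (-15.0, 15.0)   # second angle interval, degrees (illustrative)
ROLL_INTERVAL = (-15.0, 15.0)    # third angle interval, degrees (illustrative)
YAW_INTERVAL = (-20.0, 20.0)     # fourth angle interval, degrees (illustrative)

def face_pose_acceptable(pitch_deg: float, roll_deg: float, yaw_deg: float) -> bool:
    return (PITCH_INTERVAL[0] <= pitch_deg <= PITCH_INTERVAL[1]
            and ROLL_INTERVAL[0] <= roll_deg <= ROLL_INTERVAL[1]
            and YAW_INTERVAL[0] <= yaw_deg <= YAW_INTERVAL[1])
```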
  • In a possible implementation, when the first face pose does not satisfy the second preset condition, the user may be prompted, through voice broadcast, text display, animation display, etc., to adjust the device pose of the electronic device and/or to change the face pose of the target object, so that the first face pose of the target object satisfies the second preset condition. In this way, the face of the target object faces the image acquisition component of the electronic device, that is, the face in the video frames captured by the image acquisition component is the frontal face of the target object, which improves the accuracy of height detection.
  • Step S340 Determine a first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • the device pose of the electronic device is the pose when the electronic device captures the video frame where the face area is located.
  • In a possible implementation, the pitch angle of the electronic device indicated by the device pose of the electronic device satisfies a first preset condition. The first preset condition is that the pitch angle of the electronic device indicated in the device pose is within a preset first angle interval.
  • When the first preset condition is not satisfied, the user may be prompted to adjust the device pose of the electronic device through voice broadcast, text display, animation display, etc., to avoid excessive upward or downward shooting during video frame acquisition, thereby improving the accuracy of height detection.
  • To determine the first height of the target object, the coordinate systems in which the ground information and the first face pose of the target object are located can be determined first. The ground information is located in the world coordinate system.
  • In a possible implementation, the Y axis of the world coordinate system can be set to the vertical direction of the real world. Since physical quantities such as distance and object size in the world coordinate system are the same as in the real world, a connection between the virtual world coordinate system and the real world can be established, so that the size of an object calculated in the world coordinate system is the actual size of the object in the real world.
  • the first face pose of the target object is the face pose of the target object relative to the image acquisition component of the electronic device, which is located in the camera coordinate system.
  • In the camera coordinate system, the image acquisition component of the electronic device is located at the origin; that is, from the perspective of the 3D face model of the target object, the position of the image acquisition component of the electronic device is fixed.
  • Therefore, when determining the first height of the target object, face size adjustment and coordinate system transformation are required.
  • In a possible implementation, when determining the first height of the target object, the first face pose of the target object can be adjusted according to the preset interpupillary distance reference value to obtain the second face pose of the target object, wherein the face size indicated by the first face pose is the same as the face size of the target object in the face area, the face size indicated by the second face pose is the actual face size of the target object, and the second face pose is located in the camera coordinate system. That is to say, the first face pose of the target object can be adjusted in the camera coordinate system so that the face size indicated by the adjusted second face pose is the actual face size of the target object.
  • In a possible implementation, the interpupillary distance value in the first face pose of the target object can be determined, and a face size transformation coefficient can be determined according to the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; the face size transformation coefficient is then used to adjust the first face pose to obtain the second face pose of the target object.
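A minimal sketch of this face size adjustment, assuming a pinhole camera model and a commonly cited adult average interpupillary distance of about 63 mm as the reference value (an assumption, not a value defined by this application):

```python
# Sketch of the face size adjustment described above.
IPD_REFERENCE_M = 0.063  # preset interpupillary distance reference value, metres (assumed)

def scale_face_pose(translation_cam, ipd_in_first_pose_m):
    """Scale the first face pose so the indicated face size matches the actual size.

    translation_cam: 3-vector, position of the face model in the camera coordinate system.
    ipd_in_first_pose_m: interpupillary distance measured on the registered 3D face model.
    """
    scale = IPD_REFERENCE_M / ipd_in_first_pose_m   # face size transformation coefficient
    # Under a pinhole model, a face that is `scale` times larger must be `scale` times
    # farther away to produce the same image, so the translation scales accordingly;
    # the rotation is unaffected by uniform scaling.
    return [scale * t for t in translation_cam], scale
```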
  • In a possible implementation, coordinate transformation can be performed on the second face pose according to the device pose of the electronic device to obtain the third face pose of the target object, wherein the third face pose is located in the world coordinate system.
  • In a possible implementation, the coordinate transformation of the second face pose P_C can be performed by the following formula (1) to obtain the third face pose P_w of the target object:

    P_w = T · P_C    (1)

  • Here, T represents the rigid body transformation matrix determined according to the device pose (R, t) of the electronic device, where R represents the rotation matrix in the device pose of the electronic device and t represents the translation vector in the device pose of the electronic device; in homogeneous form, T = [R, t; 0, 1].
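A small sketch of the coordinate transformation in formula (1), assuming the device pose (R, t) provided by the SLAM system is the camera-to-world pose:

```python
# Transform a point expressed in the camera coordinate system into the world
# coordinate system using the rigid body transformation T built from (R, t).
import numpy as np

def camera_to_world(point_cam, R_device, t_device):
    """point_cam: 3-vector in camera coordinates; returns the point in world coordinates."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R_device)    # rotation part of the device pose
    T[:3, 3] = np.asarray(t_device)     # translation part of the device pose
    p = np.append(np.asarray(point_cam, dtype=float), 1.0)   # homogeneous coordinates
    return (T @ p)[:3]
```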
  • the first height of the target object can be determined according to the third face pose and ground information.
  • In a possible implementation, the position of the top of the target object's head may be determined according to the third face pose, and then the first height of the target object may be determined according to the head-top position and the ground information.
  • For example, the position of the tip of the target object's nose can be determined according to the third face pose; then, according to the nose-tip position and the preset ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance, the head-top position of the target object is determined, and the first height of the target object is determined according to the head-top position of the target object and the ground information.
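A hedged sketch of this head-top extrapolation and ground-distance computation is shown below; the facial ratio, the plane representation n·x + d = 0, and the use of the world vertical as the extrapolation direction are simplifying assumptions for illustration.

```python
# Illustrative height computation: extrapolate the head top from the nose tip with a
# preset facial ratio, then take the distance from the head top to the ground plane.
import numpy as np

NOSE_TO_HEAD_RATIO = 1.6   # hypothetical preset ratio of nose-to-head-top over nose-to-chin

def head_top_from_nose(nose_tip_w, chin_w):
    """Both inputs are 3D points in the world coordinate system."""
    nose_to_chin = np.linalg.norm(np.asarray(chin_w) - np.asarray(nose_tip_w))
    # Extrapolate along the world vertical for simplicity; a head-axis direction
    # derived from the third face pose could be used instead.
    up = np.array([0.0, 1.0, 0.0])
    return np.asarray(nose_tip_w) + up * nose_to_chin * NOSE_TO_HEAD_RATIO

def height_above_ground(head_top_w, plane_normal, plane_d):
    """Ground plane: plane_normal . x + plane_d = 0, with plane_normal a unit vector."""
    n = np.asarray(plane_normal, dtype=float)
    return abs(float(np.dot(n, head_top_w) + plane_d))
```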
  • In a possible implementation, when there are multiple face regions corresponding to the target object, for each face region, in a manner similar to the above, a second height of the target object can be determined according to the ground information in the multiple video frames, the first face pose of the target object, and the device pose of the electronic device when it captured the video frame in which that face region is located; the multiple second heights are then post-processed, for example by Kalman filtering and averaging, to obtain the first height of the target object. In this way, the accuracy of height detection can be improved.
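For illustration, the post-processing of several second-height estimates could look like the following one-dimensional constant-state Kalman filter; the noise parameters are placeholder values.

```python
# Minimal sketch: fuse several second-height estimates into one first height with a
# 1-D Kalman filter that models the height as a constant value.
def fuse_heights(height_measurements, process_var=1e-6, measurement_var=4e-4):
    estimate = height_measurements[0]
    variance = 1.0                          # large initial uncertainty
    for z in height_measurements[1:]:
        variance += process_var             # predict (height assumed constant)
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)   # update with the new measurement
        variance *= (1.0 - gain)
    return estimate
```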
  • the first height of the target object may also be displayed on a display interface of the electronic device.
  • the first height of the target object may be displayed on the display interface of the electronic device through animation, text, augmented reality (augmented reality, AR) and other means.
  • the first height of the target object can be displayed on the display interface of the height APP.
  • the display interface of the height APP may include a real-time image interface of video frames collected by the image acquisition part of the electronic device.
  • Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application.
  • As shown in Fig. 6, the user detects the height of the target object 630 through the height APP on the electronic device 600, and the display interface 610 of the height APP displays the video frames collected in real time by the image acquisition component (not shown) of the electronic device 600. After the height APP detects the height of the target object 630, the height can be displayed, through the augmented reality icon 620, at a preset position above the head of the target object 630 in the display interface 610 of the height APP, and the displayed information can be "Height: 175CM".
  • FIG. 6 only uses one target object as an example to illustrate the manner of displaying the height. It should be noted that the heights of multiple target objects may also be displayed in the above manner. Those skilled in the art can also set the display manner and display position of the height of the target object according to the actual situation, which is not limited in the present application.
  • In this way, semantic plane detection can be performed on multiple video frames collected by the image acquisition component of the electronic device to determine the ground information in the multiple video frames; face detection is performed on the multiple video frames at the same time to determine the face area, and the first face pose of the target object in the multiple video frames is determined according to the face area and the preset three-dimensional face model; then the first height of the target object is determined according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras); the face pose of the target object can be determined through face recognition and face 3D reconstruction technology, and the height of the target object is then determined from the face pose, the device pose, and the ground information, without manually positioning the target object and without capturing a complete human body image of the target object. The operation is convenient and the accuracy is high.
  • Fig. 7 shows a schematic diagram of a processing procedure of height detection according to an embodiment of the present application.
  • the user detects the height of the target object through the height APP running on the electronic device.
  • First, step S701 is executed: the height APP collects multiple video frames (i.e., a video stream) through the image acquisition component of the electronic device.
  • the image acquisition component may continuously acquire video streams.
  • In step S702, it can be judged whether SLAM initialization is successful. If SLAM initialization is not successful, the user is prompted to move the electronic device, and step S701 is re-executed; if SLAM initialization is successful, step S703 is executed to perform semantic plane detection on the multiple video frames, and in step S704 it is judged whether ground information is detected within a preset time period.
  • If the ground information is not detected within the preset time period, the user is prompted to take pictures of the ground, and step S701 continues to be executed; if the ground information is detected, face detection is performed on the multiple video frames to determine the face area, and in step S706 it is judged whether the face area satisfies the third preset condition. The third preset condition is that the face area is located in a preset area of the video frame in which it is located.
  • If the face area satisfies the third preset condition, step S707 is executed to determine, according to the face area and the preset three-dimensional face model, the first face pose of the target object in the multiple video frames, and in step S708 it is judged whether the first face pose satisfies the second preset condition. The second preset condition is that the pitch angle in the first face pose is within the preset second angle interval, the roll angle in the first face pose is within the preset third angle interval, and the yaw angle in the first face pose is within the preset fourth angle interval.
  • If the first face pose satisfies the second preset condition, step S709 is executed to determine whether the pitch angle of the electronic device indicated in the device pose of the electronic device satisfies the first preset condition. The first preset condition is that the pitch angle of the electronic device indicated in the device pose is within the preset first angle interval.
  • If the first preset condition is satisfied, step S710 is executed to determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device; then step S711 is executed to display the first height of the target object on the display interface of the height APP through augmented reality (AR).
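The flow of Fig. 7 can be summarised by the following condensed sketch; every helper it calls (slam_ready, detect_ground, detect_face, and so on) is a placeholder standing in for the corresponding step described above rather than an API defined by this application.

```python
# Condensed, illustrative rendering of the Fig. 7 processing loop.
def measure_height_loop(app):
    while True:
        frames = app.capture_frames()                       # S701
        if not app.slam_ready():                            # S702
            app.prompt("Move the device to initialize SLAM")
            continue
        ground = app.detect_ground(frames)                  # S703 / S704
        if ground is None:
            app.prompt("Point the camera at the ground")
            continue
        face = app.detect_face(frames)
        if face is None or not app.face_area_ok(face):      # S706
            app.prompt("Adjust the device so the face is centred")
            continue
        pose = app.estimate_face_pose(face)                 # S707
        if not app.face_pose_ok(pose):                      # S708
            app.prompt("Face the camera directly")
            continue
        if not app.device_pitch_ok():                       # S709
            app.prompt("Hold the device more level")
            continue
        height = app.compute_height(ground, pose)           # S710
        app.display_height_ar(height)                       # S711
        return height
```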
  • In this way, the height detection method of the embodiment of the present application can automatically identify the ground information and automatically detect the height of the target object through SLAM technology and semantic segmentation, without manual operation (such as manual clicking or marking of the target object), and can detect the heights of multiple target objects simultaneously, thereby simplifying the height detection process and improving height detection efficiency.
  • the embodiment of the present application acquires three-dimensional information through SLAM technology, which can also avoid contact between the electronic device and the human body of the target object, which is safe and reliable.
  • the height detection method of the embodiment of the present application performs height detection based on multiple video frames collected by a common camera (such as a monocular camera), without the need for professional equipment such as a depth camera, which reduces equipment dependence.
  • For example, the user can realize height detection with a handheld device (such as a mobile phone, smart watch, etc.).
  • In addition, the embodiment of the present application obtains the face pose of the target object through face recognition and face three-dimensional reconstruction technology, which is fast and accurate; this not only improves the accuracy of height detection but is also suitable for scenarios such as target object movement and shooting angle changes.
  • Fig. 8 shows a block diagram of a height detection device according to an embodiment of the present application.
  • As shown in Fig. 8, the height detection device is applied to an electronic device and includes: an image acquisition component 810, configured to capture multiple video frames; and a processing component 820, configured to: perform semantic plane detection on the multiple video frames to determine the ground information in the multiple video frames; perform face detection on the multiple video frames to determine the face area; determine, according to the face image of the face area and the preset three-dimensional face model, the first face pose of the target object in the multiple video frames; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In a possible implementation, the processing component is further configured to perform at least one of the following: if the ground information is not detected within a preset period of time, prompt the user to take pictures of the ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompt the user to adjust the device pose; when the first face pose does not meet the second preset condition, prompt the user to adjust the device pose and/or change the face pose of the target object; or when the face area does not meet a third preset condition, prompt the user to adjust the device pose.
  • In a possible implementation, the determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining a second height of the target object according to the ground information, the first face pose, and the device pose; and performing post-processing on the second height to obtain the first height, the post-processing including Kalman filtering.
  • the processing component is further configured to: display the first height on a display interface of the electronic device.
  • An embodiment of the present application provides a height detection device, including: an image acquisition component for acquiring multiple video frames; a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to implement the above method when executing the instructions.
  • An embodiment of the present application provides a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is realized.
  • An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium bearing computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital video disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
  • Computer readable program instructions or codes described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, local area network, wide area network, and/or wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using an Internet service provider to connect via the Internet).
  • Electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), can execute computer-readable program instructions, thereby realizing various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that when the instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus for realizing the functions/actions specified in one or more blocks of the flowchart and/or block diagram is produced.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause a computer, programmable data processing apparatus, and/or other device to work in a specific way, so that the computer-readable medium storing the instructions includes an article of manufacture comprising instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that includes one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with hardware (such as circuits or an application-specific integrated circuit (ASIC)), or with a combination of hardware and software (such as firmware).

Abstract

The present application relates to a height measurement method and apparatus, and a storage medium. The method comprises: performing semantic plane detection on a plurality of video frames, which are collected by an image collection component of an electronic device, and determining ground information in the plurality of video frames; performing facial detection on the plurality of video frames, so as to determine a facial area; determining a first facial pose of a target object in the plurality of video frames according to the facial area and a preset facial three-dimensional model; and determining a first height of the target object according to the ground information, the first facial pose and a device pose of the electronic device. The height measurement in the embodiments of the present application does not rely on a professional device, and the height measurement can be automatically performed on a target object without the need of manual positioning. The operation is convenient, and the accuracy is high.

Description

身高检测方法、装置及存储介质 Height detection method, device and storage medium
技术领域 Technical Field
本申请涉及图像处理技术领域,尤其涉及一种身高检测方法、装置及存储介质。The present application relates to the technical field of image processing, in particular to a height detection method, device and storage medium.
背景技术Background technique
传统的身高测量方法,通常需要测量人员手动操作专业仪器,例如身高测量仪等,不仅测量效率低,而且仪器携带不方便,不适合个人使用。The traditional height measurement method usually requires the measurer to manually operate a professional instrument, such as a height measuring instrument, which not only has low measurement efficiency, but also is inconvenient to carry and is not suitable for personal use.
随着图像处理技术的发展,可通过双目相机、深度相机等专业设备拍摄待测量的人体目标的深度图像,并通过对深度图像的处理,测量出人体目标的身高。但是,该方式对双目相机、深度相机等专业设备依赖较大,而且还需要拍摄完整的人体图像(即人体目标的全身图像),具有一定的局限性。With the development of image processing technology, professional equipment such as binocular cameras and depth cameras can be used to capture the depth image of the human target to be measured, and the height of the human target can be measured by processing the depth image. However, this method relies heavily on professional equipment such as binocular cameras and depth cameras, and also needs to capture a complete human body image (ie, a full-body image of a human target), which has certain limitations.
发明内容Contents of the invention
有鉴于此,提出了一种身高检测方法、装置及存储介质。In view of this, a height detection method, device and storage medium are proposed.
第一方面,本申请的实施例提供了一种身高检测方法,所述方法包括:对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定所述多个视频帧中的地面信息;对所述多个视频帧进行人脸检测,确定人脸区域;根据所述人脸区域及预设的人脸三维模型,确定所述多个视频帧中目标对象的第一人脸位姿;根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高。In the first aspect, the embodiment of the present application provides a height detection method, the method includes: performing semantic plane detection on multiple video frames collected by the image acquisition part of the electronic device, and determining the ground height in the multiple video frames Information; face detection is performed on the multiple video frames, and the face area is determined; according to the face area and the preset three-dimensional model of the face, determine the first face position of the target object in the multiple video frames pose; determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
根据第一方面,在所述身高检测方法的第一种可能的实现方式中,所述方法还包括如下至少一项:在预设时段内未检测到所述地面信息的情况下,提示用户拍摄地面;在所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件的情况下,提示用户调整所述设备位姿;在所述第一人脸位姿不满足第二预设条件的情况下,提示用户调整所述设备位姿和/或改变所述目标对象的人脸位姿;或在所述人脸区域不满足第三预设条件的情况下,提示用户调整所述设备位姿。According to the first aspect, in the first possible implementation of the height detection method, the method further includes at least one of the following: if the ground information is not detected within a preset period of time, prompting the user to take a picture ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompt the user to adjust the device pose; if the first face pose does not meet the second In the case of preset conditions, prompt the user to adjust the device pose and/or change the face pose of the target object; or in the case of the face area does not meet the third preset condition, prompt the user to adjust The device pose.
本申请的实施例,能够在预设时段内未检测到所述地面信息、所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件、所述第一人脸位姿不满足第二预设条 件、所述人脸区域不满足第三预设条件中的至少一种情况下,对用户进行提示,例如,提示用户拍摄地面、调整设备位姿、改变目标对象的人脸位姿等,使得用户进行相应调整,从而提高身高检测的准确性。In the embodiment of the present application, it is possible to detect that the ground information, the pitch angle of the electronic device indicated by the device pose, does not meet the first preset condition, and the first human face pose within a preset period of time. When at least one of the second preset condition and the third preset condition is not met in the face area, the user is prompted, for example, the user is prompted to take pictures of the ground, adjust the pose of the device, and change the person of the target object. Face pose, etc., so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
根据第一方面或第一方面的第一种可能的实现方式,在所述身高检测方法的第二可能的实现方式中,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据所述地面信息、所述第一人脸位姿及所述设备位姿,确定所述目标对象的第二身高;对所述第二身高进行后处理,得到所述第一身高,所述后处理包括卡尔曼滤波。According to the first aspect or the first possible implementation of the first aspect, in the second possible implementation of the height detection method, according to the ground information, the first face pose and the The device pose of the electronic device, and determining the first height of the target object includes: determining the second height of the target object according to the ground information, the first human face pose, and the device pose ; Performing post-processing on the second height to obtain the first height, the post-processing includes Kalman filtering.
本申请的实施例,能够根据地面信息、第一人脸位姿及设备位姿,确定目标对象的第二身高,并对第二身高进行卡尔曼滤波等后处理,得到目标对象的第一身高,从而能够提高身高检测的准确性。In the embodiment of the present application, the second height of the target object can be determined according to the ground information, the first face pose, and the pose of the device, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object , so as to improve the accuracy of height detection.
根据第一方面或第一方面的第一种可能的实现方式或第一方面的第二种可能的实现方式,在所述身高检测方法的第三可能的实现方式中,所述方法还包括:在所述电子设备的显示界面中显示所述第一身高。According to the first aspect or the first possible implementation of the first aspect or the second possible implementation of the first aspect, in a third possible implementation of the height detection method, the method further includes: Displaying the first height on a display interface of the electronic device.
本申请的实施例,确定出目标对象的第一身高后,能够在电子设备的显示界面中通过动画、文本、增强现实(augmented reality,AR)等方式显示目标对象的第一身高,从而能够提高用户体验。In the embodiment of the present application, after the first height of the target object is determined, the first height of the target object can be displayed in the display interface of the electronic device through animation, text, augmented reality (augmented reality, AR), etc., thereby improving the height of the target object. user experience.
根据第一方面,在所述身高检测方法的第四种可能的实现方式中,所述地面信息位于世界坐标系下,所述第一人脸位姿位于相机坐标系下,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿;根据所述设备位姿,对所述第二人脸位姿进行坐标变换,得到所述目标对象的第三人脸位姿,所述第三人脸位姿位于世界坐标系下;根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高。According to the first aspect, in the fourth possible implementation of the height detection method, the ground information is located in the world coordinate system, the first face pose is located in the camera coordinate system, and the The ground information, the first human face pose and the device pose of the electronic device, and determining the first height of the target object include: adjusting the first human face pose according to a preset interpupillary distance reference value , to obtain the second face pose; according to the device pose, perform coordinate transformation on the second face pose to obtain the third face pose of the target object, the third face pose The pose is located in the world coordinate system; according to the third face pose and the ground information, determine the first height of the target object.
本申请的实施例,能够对位于相机坐标系下的第一人脸位姿进行调整及坐标变换,得到位于世界坐标系下的第三人脸位姿,并根据第三人脸位姿及地面信息,确定目标对象的第一身高,从而能够在世界坐标系下计算目标对象的第一身高,提高身高检测的准确性。The embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and according to the third face pose and the ground Information to determine the first height of the target object, so that the first height of the target object can be calculated in the world coordinate system, and the accuracy of height detection can be improved.
根据第一方面的第四种可能的实现方式,在所述身高检测方法的第五种可能的实现方式中,所述根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿,包括:根据预设的瞳距参考值及所述第一人脸位姿中的瞳距值,确定人脸尺寸变换系数;根据所述人脸尺寸变换系数,对所述第一人脸位姿进行调整,得到目标对象的第二人脸位姿。According to the fourth possible implementation manner of the first aspect, in the fifth possible implementation manner of the height detection method, the first human face pose is adjusted according to a preset interpupillary distance reference value to Obtaining the second face pose includes: determining a face size transformation coefficient according to a preset interpupillary distance reference value and the interpupillary distance value in the first face pose; The first face pose is adjusted to obtain the second face pose of the target object.
本申请的实施例,通过瞳距参考值及第一人脸位姿中的瞳距距值,确定人脸尺寸变换系数,并根据人脸尺寸变换系数,对第一人脸位姿进行调整,得到目标对象的第二人脸位姿,从而能够得到相机坐标系目标对象的人脸实际尺寸及位姿。In the embodiment of the present application, the face size transformation coefficient is determined through the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient, The second face pose of the target object is obtained, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
根据第一方面的第四种可能的实现方式,在所述身高检测方法的第六种可能的实现方式中,所述根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高,包括:根据所述第三人脸位姿,确定所述目标对象的头顶位置;根据所述头顶位 置及所述地面信息,确定所述目标对象的第一身高。According to the fourth possible implementation of the first aspect, in the sixth possible implementation of the height detection method, the target is determined according to the third face pose and the ground information The first height of the object includes: determining the position of the top of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
本申请的实施例,通过确定目标对象的头顶位置,并根据头顶位置及地面信息确定目标对象的第一身高,简单快速且能够提高身高检测的准确性。In the embodiment of the present application, by determining the position of the top of the target object, and determining the first height of the target object according to the position of the top of the head and ground information, it is simple and fast and can improve the accuracy of height detection.
第二方面,本申请的实施例提供了一种身高检测装置,所述身高检测装置应用于电子设备,包括图像采集部件,用于采集多个视频帧;处理部件,被配置为:对所述多个视频帧进行语义平面检测,确定所述多个视频帧中的地面信息;对所述多个视频帧进行人脸检测,确定人脸区域;根据所述人脸区域的人脸图像及预设的人脸三维模型,确定所述多个视频帧中目标对象的第一人脸位姿;根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高。In a second aspect, an embodiment of the present application provides a height detection device, the height detection device is applied to electronic equipment, and includes an image acquisition component for capturing multiple video frames; a processing component configured to: A plurality of video frames is carried out semantic plane detection, and the ground information in the plurality of video frames is determined; Face detection is carried out to the plurality of video frames, and a human face area is determined; The three-dimensional model of the human face is set, and the first human face pose of the target object in the plurality of video frames is determined; according to the ground information, the first human face pose and the device pose of the electronic device, determine The first height of the target object.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
根据第二方面,在所述身高检测装置的第一种可能的实现方式中,所述处理部件还被配置为如下至少一项:在预设时段内未检测到所述地面信息的情况下,提示用户拍摄地面;在所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件的情况下,提示用户调整所述设备位姿;在所述第一人脸位姿不满足第二预设条件的情况下,提示用户调整所述设备位姿和/或改变所述目标对象的人脸位姿;或在所述人脸区域不满足第三预设条件的情况下,提示用户调整所述设备位姿。According to the second aspect, in the first possible implementation manner of the height detection device, the processing component is further configured to at least one of the following: when the ground information is not detected within a preset period of time, Prompting the user to take pictures of the ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompting the user to adjust the device pose; When the second preset condition is met, prompt the user to adjust the pose of the device and/or change the face pose of the target object; or if the face area does not meet the third preset condition, The user is prompted to adjust the device pose.
本申请的实施例,能够在预设时段内未检测到所述地面信息、所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件、所述第一人脸位姿不满足第二预设条件或所述人脸区域不满足第三预设条件中的至少一种情况下,对用户进行提示,例如,提示用户拍摄地面、调整设备位姿、改变目标对象的人脸位姿等,使得用户进行相应调整,从而提高身高检测的准确性。In the embodiment of the present application, it is possible to detect that the ground information, the pitch angle of the electronic device indicated by the device pose, does not meet the first preset condition, and the first human face pose within a preset period of time. When at least one of the second preset condition is not met or the face area does not meet the third preset condition, prompt the user, for example, prompt the user to take pictures of the ground, adjust the pose of the device, change the person of the target object Face pose, etc., so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
根据第二方面或第二方面的第一种可能的实现方式,在所述身高检测装置的第二种可能的实现方式中,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据所述地面信息、所述第一人脸位姿及所述设备位姿,确定所述目标对象的第二身高;对所述第二身高进行后处理,得到所述第一身高,所述后处理包括卡尔曼滤波。According to the second aspect or the first possible implementation of the second aspect, in the second possible implementation of the height detection device, according to the ground information, the first face pose and The device pose of the electronic device, determining the first height of the target object includes: determining the second height of the target object according to the ground information, the first face pose, and the device pose. height; performing post-processing on the second height to obtain the first height, and the post-processing includes Kalman filtering.
本申请的实施例,能够根据地面信息、第一人脸位姿及设备位姿,确定目标对象的第二身高,并对第二身高进行卡尔曼滤波等后处理,得到目标对象的第一身高,从而能够提高身高检测的准确性。In the embodiment of the present application, the second height of the target object can be determined according to the ground information, the first face pose, and the pose of the device, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object , so as to improve the accuracy of height detection.
根据第二方面或第二方面的第一种可能的实现方式或第二方面的第二种可能的实现方式,在所述身高检测装置的第三种可能的实现方式中,所述处理部件还被配置为:在所述电子设备的显示界面显示中所述第一身高。According to the second aspect or the first possible implementation of the second aspect or the second possible implementation of the second aspect, in a third possible implementation of the height detection device, the processing component further It is configured to: display the first height on the display interface of the electronic device.
本申请的实施例,确定出目标对象的第一身高后,能够在电子设备的显示界面中通过动画、文本、增强现实(augmented reality,AR)等方式显示目标对象的第一身高,从而能够提高用户体验。In the embodiment of the present application, after the first height of the target object is determined, the first height of the target object can be displayed in the display interface of the electronic device through animation, text, augmented reality (augmented reality, AR), etc., thereby improving the height of the target object. user experience.
根据第二方面,在所述身高检测装置的第四种可能的实现方式中,所述地面信息位于世界坐标系下,所述第一人脸位姿位于相机坐标系下,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿;根据所述设备位姿,对所述第二人脸位姿进行坐标变换,得到所述目标对象的第三人脸位姿,所述第三人脸位姿位于世界坐标系下;根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高。According to the second aspect, in the fourth possible implementation of the height detection device, the ground information is located in the world coordinate system, the first face pose is located in the camera coordinate system, and the The ground information, the first human face pose and the device pose of the electronic device, and determining the first height of the target object include: adjusting the first human face pose according to a preset interpupillary distance reference value , to obtain the second face pose; according to the device pose, perform coordinate transformation on the second face pose to obtain the third face pose of the target object, the third face pose The pose is located in the world coordinate system; according to the third face pose and the ground information, determine the first height of the target object.
本申请的实施例,能够对位于相机坐标系下的第一人脸位姿进行调整及坐标变换,得到位于世界坐标系下的第三人脸位姿,并根据第三人脸位姿及地面信息,确定目标对象的第一身高,从而能够在世界坐标系下计算目标对象的第一身高,提高身高检测的准确性。The embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and according to the third face pose and the ground Information to determine the first height of the target object, so that the first height of the target object can be calculated in the world coordinate system, and the accuracy of height detection can be improved.
根据第二方面的第四种可能的实现方式,在所述身高检测装置的第五种可能的实现方式中,所述根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿,包括:根据预设的瞳距参考值及所述第一人脸位姿中的瞳距值,确定人脸尺寸变换系数;根据所述人脸尺寸变换系数,对所述第一人脸位姿进行调整,得到目标对象的第二人脸位姿。According to the fourth possible implementation manner of the second aspect, in the fifth possible implementation manner of the height detection device, the first human face pose is adjusted according to a preset interpupillary distance reference value to Obtaining the second face pose includes: determining a face size transformation coefficient according to a preset interpupillary distance reference value and the interpupillary distance value in the first face pose; The first face pose is adjusted to obtain the second face pose of the target object.
本申请的实施例,通过瞳距参考值及第一人脸位姿中的瞳距距值,确定人脸尺寸变换系数,并根据人脸尺寸变换系数,对第一人脸位姿进行调整,得到目标对象的第二人脸位姿,从而能够得到相机坐标系目标对象的人脸实际尺寸及位姿。In the embodiment of the present application, the face size transformation coefficient is determined through the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient, The second face pose of the target object is obtained, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
根据第二方面的第四种可能的实现方式,在所述身高检测装置的第六种可能的实现方式中,所述根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高,包括:根据所述第三人脸位姿,确定所述目标对象的头顶位置;根据所述头顶位置及所述地面信息,确定所述目标对象的第一身高。According to the fourth possible implementation of the second aspect, in the sixth possible implementation of the height detection device, the target is determined according to the third face pose and the ground information The first height of the object includes: determining the position of the top of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
本申请的实施例,通过确定目标对象的头顶位置,并根据头顶位置及地面信息确定目标对象的第一身高,简单快速且能够提高身高检测的准确性。In the embodiment of the present application, by determining the position of the top of the target object, and determining the first height of the target object according to the position of the top of the head and ground information, it is simple and fast and can improve the accuracy of height detection.
第三方面,本申请的实施例提供了一种身高测量装置,包括:图像采集部件,用于采集多个视频帧;处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的身高检测方法。In a third aspect, an embodiment of the present application provides a height measurement device, including: an image acquisition component, configured to acquire a plurality of video frames; a processor; a memory for storing processor-executable instructions; wherein, the processing The device is configured to implement the above-mentioned first aspect or one or more of the height detection methods in multiple possible implementation manners of the first aspect when executing the instructions.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍 摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
第四方面,本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的身高检测方法。In the fourth aspect, the embodiments of the present application provide a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above-mentioned first aspect or the first aspect can be realized One or several of the various possible implementations of the height detection method.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
第五方面,本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的身高检测方法。In the fifth aspect, the embodiments of the present application provide a computer program product, including computer readable code, or a non-volatile computer readable storage medium bearing computer readable code, when the computer readable code is stored in an electronic When running in the device, the processor in the electronic device executes the height detection method of the first aspect or one or more of the multiple possible implementations of the first aspect.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
本申请的这些和其他方面在以下(多个)实施例的描述中会更加简明易懂。These and other aspects of the present application will be made more apparent in the following description of the embodiment(s).
附图说明Description of drawings
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本申请的示例性实施例、特征和方面,并且用于解释本申请的原理。The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the specification, serve to explain the principles of the application.
图1示出根据本申请一实施例的电子设备的结构示意图。Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
图2示出根据本申请一实施例的电子设备的软件结构框图。Fig. 2 shows a block diagram of a software structure of an electronic device according to an embodiment of the present application.
图3示出根据本申请一实施例的身高检测方法的流程图。Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application.
图4示出根据本申请一实施例的地面信息的检测过程的示意图。Fig. 4 shows a schematic diagram of a detection process of ground information according to an embodiment of the present application.
图5示出根据本申请一实施例的目标对象的第一人脸位姿的确定过程的示意图。Fig. 5 shows a schematic diagram of a process of determining a first face pose of a target object according to an embodiment of the present application.
图6示出根据本申请一实施例的目标对象的身高显示的示意图。Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application.
图7示出根据本申请一实施例的身高检测的处理过程的示意图。Fig. 7 shows a schematic diagram of a processing procedure of height detection according to an embodiment of the present application.
图8示出根据本申请一实施例的身高检测装置的框图。Fig. 8 shows a block diagram of a height detection device according to an embodiment of the present application.
具体实施方式Detailed ways
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。Various exemplary embodiments, features, and aspects of the present application will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures indicate functionally identical or similar elements. While various aspects of the embodiments are shown in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or better than other embodiments.
另外,为了更好的说明本申请,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本申请同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本申请的主旨。In addition, in order to better illustrate the present application, numerous specific details are given in the following specific implementation manners. It will be understood by those skilled in the art that the present application may be practiced without certain of the specific details. In some instances, methods, means, components and circuits well known to those skilled in the art have not been described in detail in order to highlight the gist of the present application.
在相关技术中,对人体的身高进行测量时,通常需要使用双目相机、深度相机等专业设备,对设备依赖较大,而且需要拍摄完整的人体图像,具有一定的局限性。例如,在一些技术方案中,利用双目相机拍摄场景图像,获取场景图像中人体目标的人头尖点的图像坐标,并根据人头尖点的图像坐标获取双目相机生成的人头尖点对应的深度信息,进而利用人头尖点的图像坐标和深度信息,计算人头尖点在摄像机坐标系下的坐标,并根据人头尖点在摄像机坐标系下的坐标和双目相机的安装高度、俯仰角、倾斜角,测量人体目标的身高。In related technologies, when measuring the height of a human body, it is usually necessary to use professional equipment such as binocular cameras and depth cameras, which is highly dependent on the equipment and needs to capture a complete human body image, which has certain limitations. For example, in some technical solutions, use a binocular camera to capture a scene image, obtain the image coordinates of the human head point of the human target in the scene image, and obtain the corresponding depth of the human head point generated by the binocular camera according to the image coordinates of the human head point information, and then use the image coordinates and depth information of the cusp of the human head to calculate the coordinates of the cusp of the human head in the camera coordinate system, and according to the coordinates of the cusp of the human head in the camera coordinate system and the installation height, pitch angle, and tilt of the binocular camera angle, measure the height of the human target.
该技术方案不仅需要配备双目相机(即对设备存在依赖),同时还需要固定相机位姿及已知相机安装高度,对使用场景有一定限制。此外,该技术方案还必须将完整的人体拍摄下来才能实现身高测量,具有一定的局限性。This technical solution not only needs to be equipped with a binocular camera (that is, it is dependent on the device), but also needs to fix the camera pose and known camera installation height, which has certain restrictions on the use scene. In addition, this technical solution must take pictures of a complete human body to achieve height measurement, which has certain limitations.
As another example, in some technical solutions, a dense semantic map can be generated based on semantic simultaneous localization and mapping (SLAM) technology; plane semantic detection is then performed on the dense semantic map, the object height is automatically identified according to the intrinsic relationships between semantics, and the object length and width are calculated by projection after ground extraction and focal-target segmentation, finally yielding the bounding box size (length, width, and height) of the object. Since the human body is one kind of generalized object, human height can be measured with this technical solution.
However, generating a dense semantic map based on semantic SLAM technology relies on a depth camera (that is, it depends on specific equipment). Moreover, this technical solution works well for objects whose surfaces are parallel to the ground, whereas the human body has a complex shape with no obvious plane at the top of the head, so the measurement accuracy is not high. In addition, this technical solution needs to capture the whole appearance of the target object, including its top; otherwise the complete outline of the object cannot be reconstructed. For human height measurement, this requires the photographer to shoot the human target from a higher angle, which is inconvenient and imposes certain limitations.
With the development of artificial intelligence (AI), some technical solutions use a face-height model obtained through machine learning to measure human height. For example, a face classifier and a face-height model may first be trained separately; the image of the human target to be measured is then input into the face classifier for face detection to obtain the face image of the human target, and the face image is input into the face-height model for processing to obtain the height of the human target.
However, the core of this technical solution is the face-height model. A face-height model obtained through machine learning not only has poor interpretability and relies heavily on training data, but also, because the relationship between face and height may differ among different populations, is difficult to generalize, so the accuracy of the measurement results is not high.
In some other technical solutions, height measurement is performed by manually operating an augmented reality (AR) ruler. For example, plane detection and SLAM technology can be used to obtain the spatial equation of the ground; the surveyor then needs to position a virtual anchor point at the feet of the human target (that is, the measurement object), pull a virtual AR ruler from bottom to top, and stop when it reaches the top of the head. The length of the AR ruler, that is, the height of the human target, is then obtained through the three-dimensional (3D) spatial coordinate system established by SLAM.
However, this technical solution requires manual participation and has low measurement efficiency. Moreover, the virtual anchor point is clicked on a two-dimensional (2D) image and projected onto the 3D plane by ray projection; object occlusion, manual operation errors, and other factors can cause the virtual anchor point to appear to be positioned at the feet of the human target while actually deviating significantly, that is, the virtual anchor point is positioned inaccurately, resulting in inaccurate height measurement results.
To solve the above technical problems, the present application provides a height detection method, which can be applied to an electronic device. The height detection method of the embodiments of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine ground information in the multiple video frames; at the same time, face detection is performed on the multiple video frames to determine a face region; the first face pose of the target object in the multiple video frames is determined according to the face region and a preset three-dimensional face model; and the first height of the target object is determined according to the ground information, the first face pose, and the device pose of the electronic device.
Detecting the height of the target object in this way not only avoids dependence on professional equipment (such as binocular cameras or depth cameras), but also determines the face pose of the target object through face recognition and 3D face technology, and then determines the height of the target object from the face pose, the device pose, and the ground information. There is no need to manually locate the target object, nor to capture a complete image of the target object's body; the operation is convenient and the accuracy is high.
The electronic device described in the embodiments of the present application may or may not have a touch screen. A touch-screen electronic device can be controlled by clicking and sliding on the display screen with a finger, a stylus, or the like; a non-touch-screen electronic device can be connected to input devices such as a mouse, a keyboard, or a touch panel and controlled through those input devices.
Fig. 1 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device. The embodiment of the present application does not specifically limit the type of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, may combine some components, may split some components, or may have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent components or may be integrated into one or more processors. The processor can generate operation control signals according to instruction opcodes and timing signals, completing the control of instruction fetching and execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 may be a cache. This memory may store instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs to use those instructions or data again, they can be called directly from this memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others. The processor 110 may be connected to modules such as the touch sensor, the audio module, the wireless communication module, the display, and the camera through at least one of the above interfaces.
It can be understood that the interface connection relationships between the modules illustrated in the embodiment of the present application are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiment, or a combination of multiple interface connection manners.
The electronic device 100 may implement a display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or more display screens 194.
The electronic device 100 can implement photographing and video recording functions, that is, image and video collection and related functions, through the camera 193, the ISP, the video codec, the GPU, the display screen 194, the application processor AP, the neural-network processing unit NPU, and the like.
The camera 193 can be used to collect color image data of a subject. In some embodiments, the camera 193 can also be used to collect depth data of the subject. That is to say, the camera in the electronic device 100 may be an ordinary camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera or a depth camera. The present application does not limit the specific type of the camera 193.
The ISP can be used to process the color image data collected by the camera 193. For example, when a photograph is taken, the shutter is opened, light is transmitted through the lens onto the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithmic optimization of the image noise, brightness, and skin tone, and can optimize parameters such as the exposure and color temperature of the shooting scene.
In some embodiments, the electronic device 100 may include one or more cameras 193. Specifically, the electronic device 100 may include one front camera and at least one rear camera. The front camera can typically be used to collect color image data of the photographer facing the display screen 194, and the rear camera can be used to collect color image data of the subject (such as a person or scenery) that the photographer is facing.
In some embodiments, the CPU, GPU, or NPU in the processor 110 may process multiple video frames collected by the camera 193. Specifically, the processor 110 may perform detection on the multiple video frames collected by the image acquisition component of the electronic device 100 (that is, the camera 193) to determine the ground information in the multiple video frames; at the same time, face detection is performed on the multiple video frames to determine a face region, and the first face pose of the target object in the multiple video frames is determined according to the face region and a preset three-dimensional face model; the first height of the target object is then determined according to the ground information, the first face pose, and the device pose of the electronic device. In some embodiments, the first height of the target object may also be displayed on the display screen 194 of the electronic device 100.
The gyroscope sensor 180B in the electronic device 100 can be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (namely the x, y, and z axes) can be determined by the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyroscope sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to that angle, and controls the lens to move in the reverse direction to offset the shake of the electronic device 100, thereby achieving anti-shake. The gyroscope sensor 180B can also be used in scenarios such as navigation and motion-sensing games.
The acceleration sensor 180E in the electronic device 100 can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along the three axes x, y, and z). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
In some embodiments, components of the electronic device 100 such as the gyroscope sensor 180B and the acceleration sensor 180E may constitute an inertial measurement unit (IMU), which is used to measure the device pose of the electronic device 100.
The touch sensor 180K in the electronic device 100 is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 together form a touch screen, also called a "touchscreen". The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a position different from that of the display screen 194.
The buttons 190 in the electronic device 100 may include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons or touch buttons. The electronic device 100 can receive button input and generate key signal input related to user settings and function control of the electronic device 100. For example, when taking photos or recording video through the camera application (APP) of the electronic device 100, the camera APP may provide buttons such as start photographing/recording and end recording for the user to operate.
The motor 191 in the electronic device 100 can generate a vibration alert. The motor 191 can be used for incoming-call vibration alerts and for touch vibration feedback. For example, touch operations applied to different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, games, photographing, and video recording) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
Fig. 2 shows a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (ART) and native C/C++ libraries, the hardware abstraction layer (HAL), and the kernel layer.
The application layer may include a series of application packages.
As shown in Fig. 2, the application packages may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in Fig. 2, the application framework layer may include a window manager, a content provider, a view system, a resource manager, a notification manager, an activity manager, an input manager, and the like.
The window manager provides the Window Manager Service (WMS). The WMS can be used for window management, window animation management, and surface management, and serves as a relay station for the input system.
Content providers are used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes an SMS notification icon may include a view for displaying text and a view for displaying pictures.
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, and so on. The notification manager can also present notifications in the system's top status bar in the form of a chart or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window, for example prompting text information in the status bar, emitting a prompt sound, vibrating the electronic device, or flashing an indicator light.
The activity manager can provide the Activity Manager Service (AMS). The AMS can be used for starting, switching, and scheduling system components (such as activities, services, content providers, and broadcast receivers), as well as for managing and scheduling application processes.
The input manager can provide the Input Manager Service (IMS). The IMS can be used to manage the input of the system, such as touch screen input, key input, and sensor input. The IMS fetches events from input device nodes and, through interaction with the WMS, dispatches the events to the appropriate windows.
The Android runtime includes the core libraries and the Android runtime. The Android runtime is responsible for converting source code into machine code, mainly using ahead-of-time (AOT) compilation and just-in-time (JIT) compilation.
The core libraries are mainly used to provide basic Java class library functions, such as basic data structures, mathematics, IO, tools, database, and network libraries. The core libraries provide APIs for users to develop Android applications.
The native C/C++ libraries may include multiple functional modules, for example the surface manager, the Media Framework, libc, OpenGL ES, SQLite, and Webkit. The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications. The media framework supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. OpenGL ES provides drawing and manipulation of 2D and 3D graphics in applications. SQLite provides a lightweight relational database for the applications of the electronic device 100.
The hardware abstraction layer runs in user space, encapsulates the kernel-layer drivers, and provides a call interface to the upper layers.
The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
The workflow of the software and hardware of the electronic device 100 is exemplarily described below in conjunction with the height detection scenario of the embodiments of the present application.
Assume that height detection is implemented through an application on the electronic device, the Height APP. To perform height detection, the user can touch the Height APP icon on the display screen of the electronic device. When the touch sensor 180K receives the touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation being a tap and the control corresponding to the tap being the Height APP icon as an example, the Height APP calls the interface of the application framework layer to start the Height APP, then starts the camera driver by calling the kernel layer, and collects multiple video frames through the camera 193, that is, collects a video stream through the camera 193. The multiple video frames may include the target object whose height is to be detected.
After collecting the multiple video frames, the electronic device 100 can perform related processing such as ground detection and face detection on the multiple video frames through the processor 110, thereby determining the height of the target object.
Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application. As shown in Fig. 3, the height detection method includes: step S310, performing semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determining the ground information in the multiple video frames.
The image acquisition component may be a camera of the electronic device. This camera may be an ordinary camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera or a depth camera.
When the image acquisition component is an ordinary camera, the multiple video frames collected by the image acquisition component are color (red green blue, RGB) video frames. Since multiple video frames form a video stream, it can also be considered that the image acquisition component collects an RGB video stream. When the image acquisition component is a professional camera, the multiple video frames collected by the image acquisition component may include depth data in addition to RGB image data. It should be noted that the present application does not limit the specific type of the image acquisition component.
Multiple video frames (that is, an RGB video stream) can be collected by the image acquisition component of the electronic device, and processing such as plane detection and semantic segmentation can be performed on the collected video frames to determine the ground information in them. The ground information may be represented by a plane equation in space or in other ways, which is not limited in this application.
In a possible implementation, when determining the ground information in the multiple video frames, plane detection may first be performed on the multiple video frames collected by the image acquisition component to determine the position information of multiple planes in the multiple video frames. Optionally, the position information of the multiple planes in the multiple video frames may be determined by SLAM technology.
For example, three-dimensional information can be extracted from the multiple video frames collected by the image acquisition component to obtain sparse point cloud data, while the device pose of the electronic device at the time each video frame was collected is determined; then, according to the device pose at the time each video frame was collected, plane fitting is performed on the sparse point cloud data through a plane fitting algorithm to obtain the position information of multiple planes in the multiple video frames. The position information of each plane can be represented by a plane equation in space.
By extracting the three-dimensional information in the multiple video frames to obtain sparse point cloud data, and performing plane fitting on the sparse point cloud data according to the device pose at the time each video frame was collected, the position information of multiple planes in the multiple video frames is obtained; this not only improves processing efficiency but also improves the accuracy of the position information of each plane.
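The embodiment does not prescribe a particular plane fitting algorithm. Purely as a minimal sketch, assuming the sparse point cloud has already been expressed in a common coordinate system as an N×3 NumPy array, a RANSAC-style dominant-plane fit (the function names below are illustrative and not from any specific library) might look like the following:

```python
import numpy as np

def point_plane_distance(points, plane):
    """Distance of each point to the plane a*x + b*y + c*z + d = 0 (unit normal assumed)."""
    normal, d = np.asarray(plane[:3]), plane[3]
    return np.abs(points @ normal + d)

def fit_plane_ransac(points, iters=200, inlier_thresh=0.02, seed=0):
    """RANSAC fit of a dominant plane (a, b, c, d) to an N x 3 sparse point cloud."""
    rng = np.random.default_rng(seed)
    best_plane, best_inliers = None, 0
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-8:                      # nearly collinear sample, skip it
            continue
        normal /= norm
        plane = np.append(normal, -normal @ p1)
        inliers = int(np.sum(point_plane_distance(points, plane) < inlier_thresh))
        if inliers > best_inliers:
            best_plane, best_inliers = plane, inliers
    return best_plane
```

In practice such a fit would be run per candidate plane region and refined by least squares over the inlier points; the sketch only illustrates the general idea of fitting plane equations to sparse points.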
While determining the position information of the multiple planes in the multiple video frames, semantic segmentation can also be performed on the multiple video frames collected by the image acquisition component to obtain the semantic segmentation result of each video frame. Specifically, for any one of the multiple video frames, semantic recognition can be performed on the video frame to identify the categories of the objects in the video frame, such as ground, table, and wall, and each pixel in the video frame is labeled according to the identified object categories to obtain the semantic segmentation result of the video frame.
Then, according to the position information of the multiple planes in the multiple video frames and the semantic segmentation result of each video frame, semantic recognition can be performed on the multiple planes to obtain multiple pieces of semantic plane information, such as tabletop, wall, and ground plane information, after which the ground information is selected from the multiple pieces of semantic plane information.
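One simple way to combine the two results, sketched here under the assumption that every 3D point carries the semantic label of the pixel it was triangulated from, is to label each fitted plane by a majority vote over its inlier points and then pick the plane voted as ground (the helper names and the "ground" label string are illustrative only):

```python
import numpy as np
from collections import Counter

def label_plane(plane, points, point_labels, inlier_thresh=0.02):
    """Give a fitted plane (a, b, c, d) the majority semantic label of its inlier points."""
    dist = np.abs(points @ np.asarray(plane[:3]) + plane[3])
    labels = [point_labels[i] for i in np.flatnonzero(dist < inlier_thresh)]
    return Counter(labels).most_common(1)[0][0] if labels else None

def select_ground_plane(planes, points, point_labels, ground_label="ground"):
    """Return the first detected plane whose inlier points are mostly ground pixels."""
    for plane in planes:
        if label_plane(plane, points, point_labels) == ground_label:
            return plane
    return None
```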
Fig. 4 shows a schematic diagram of a ground information detection process according to an embodiment of the present application. As shown in Fig. 4, assume that the height detection method of the embodiment of the present application is implemented through the Height APP on an electronic device (for example, a mobile phone). After the user opens the Height APP, the Height APP can perform image collection through the image acquisition component (for example, the camera) of the electronic device to obtain multiple video frames 410 (that is, an RGB video stream); three-dimensional information extraction is then performed on the multiple video frames 410 to obtain sparse point cloud data 420, while the device pose 430 of the electronic device at the time the multiple video frames 410 were collected is determined; and according to the device pose 430, plane fitting is performed on the sparse point cloud data 420 through a plane fitting algorithm to obtain the position information 440 of multiple planes in the multiple video frames 410.
While determining the position information 440 of the multiple planes in the multiple video frames 410, semantic segmentation 450 can also be performed on the multiple video frames 410 to obtain a semantic segmentation result 460; then, according to the position information 440 of the multiple planes and the semantic segmentation result 460, semantic recognition is performed on the multiple planes in the multiple video frames 410 to obtain multiple pieces of semantic plane information 470, and the ground information 480 is selected from the multiple pieces of semantic plane information 470, where the ground information 480 can be represented by a plane equation in space.
The above exemplarily describes the ground information detection process taking only the multiple video frames collected by the image acquisition component (that is, an RGB video stream) as input. In some embodiments, information such as the depth data collected by the image acquisition component and the device pose of the electronic device collected by the inertial measurement unit (IMU) can also be used as additional inputs to improve the accuracy of ground detection.
Through the plane detection and semantic segmentation of the multiple video frames collected by the image acquisition component, the electronic device can automatically perceive the captured scene, obtain multiple pieces of semantic plane information, and then automatically identify the ground information. This not only avoids manual user operations, such as the user manually tapping to select the ground, but also improves the accuracy of ground detection.
In a possible implementation, if no ground information is detected in the multiple video frames within a preset period (for example, 5 s or 10 s), the user may be prompted to capture the ground. For example, prompts such as "Please aim the camera at the ground" or "Ground not detected" may be announced to the user by voice broadcast, or displayed to the user as text or animation on the display interface of the Height APP, so that the user can adjust the shooting content in time, thereby improving the efficiency of ground detection.
It should be noted that those skilled in the art can set the content and manner of the prompt information used when no ground information is detected in the multiple video frames according to the actual situation, and this application does not limit either of them.
Step S320, performing face detection on the multiple video frames to determine a face region. While determining the ground information in the multiple video frames, face detection can be performed on the multiple video frames by means of feature extraction, key point detection, and the like. When a complete face is detected, the face region can be determined from the multiple video frames, and the object corresponding to the face region is determined as the target object. There may be one or more face regions, and one or more target objects; neither is limited in this application.
In a possible implementation, after the face region is determined from the multiple video frames, it can be judged whether the face region satisfies a third preset condition, where the third preset condition is that the face region is located within a preset area of the video frame in which it appears. The preset area may be the central area of that video frame; for example, the central area may be set as the region of the video frame that is centered on its center point and covers half of its area. It should be noted that those skilled in the art can set the preset area of the video frame where the face region is located according to the actual situation, and this application does not limit it.
When the face region does not satisfy the third preset condition, the user may be prompted, by voice broadcast, text display, animation display, or the like, to adjust the device pose of the electronic device so that the face region satisfies the third preset condition, that is, so that the face region is located within the preset area of its video frame, thereby improving the accuracy of height detection.
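As one possible reading of the third preset condition, assuming the face region is given as a pixel bounding box (x, y, w, h) and the central area is a rectangle centered on the frame center and covering half of the frame area, the check could be sketched as follows (a simplified illustration, not the only way to define the preset area):

```python
def face_in_center_region(face_box, frame_w, frame_h, area_ratio=0.5):
    """Check whether a face bounding box lies inside a centered region of the frame.

    face_box: (x, y, w, h) in pixels. The centered region covers `area_ratio` of the
    frame area, with each side scaled by sqrt(area_ratio)."""
    scale = area_ratio ** 0.5
    region_w, region_h = frame_w * scale, frame_h * scale
    left = (frame_w - region_w) / 2
    top = (frame_h - region_h) / 2
    x, y, w, h = face_box
    return (x >= left and y >= top and
            x + w <= left + region_w and y + h <= top + region_h)
```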
Step S330, determining the first face pose of the target object in the multiple video frames according to the face region and a preset three-dimensional face model. When determining the first face pose of the target object in the multiple video frames, a three-dimensional face model of the target object can first be built through a pre-trained neural network according to the face region and the preset three-dimensional face model (that is, an average-face three-dimensional model). For example, the face region and the preset three-dimensional face model can be input into a pre-trained convolutional neural network (CNN) for registration to obtain the three-dimensional face model of the target object.
Then, according to the parameters of the preset three-dimensional face model, for example structural constraints of the face such as the interpupillary distance and the distance from the nose tip to the top of the head, the position information and rotation information of the target object's three-dimensional face model relative to the image acquisition component of the electronic device can be determined, and this position information and rotation information is determined as the first face pose of the target object. The rotation information can be represented by a pitch angle, a roll angle, and a yaw angle.
Fig. 5 shows a schematic diagram of the process of determining the first face pose of the target object according to an embodiment of the present application. As shown in Fig. 5, face detection 520 can be performed on multiple video frames 510 collected by the image acquisition component of the electronic device to determine a face region 530; the face region 530 and the preset three-dimensional face model 540 are input into a pre-trained convolutional neural network CNN 550 for registration to obtain the three-dimensional face model 560 of the target object; according to the parameters of the preset three-dimensional face model 540, the position information and rotation information of the target object's three-dimensional face model relative to the image acquisition component of the electronic device are determined, and this position information and rotation information is determined as the first face pose 570 of the target object.
In this way, the three-dimensional face model of the target object can be determined according to the face region and the preset three-dimensional face model, and the first face pose of the target object is then determined according to the parameters of the preset three-dimensional face model. Using 3D face reconstruction technology to determine the first face pose of the target object not only improves processing efficiency, but also improves the accuracy of the first face pose, thereby improving the accuracy of height detection.
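The embodiment derives the position and rotation information from the registered three-dimensional face model and the structural parameters of the preset model. Purely as a general illustration of recovering a face pose relative to a camera, and not asserted to be the computation used in this embodiment, a standard perspective-n-point solve over corresponding 3D model landmarks and their 2D detections could be sketched with OpenCV (the landmark arrays and camera intrinsics are assumed inputs):

```python
import numpy as np
import cv2

def estimate_face_pose(model_points_3d, image_points_2d, camera_matrix):
    """Estimate rotation (pitch/yaw/roll, degrees) and translation of a face model
    relative to the camera from matched 3D model landmarks and 2D detections."""
    dist_coeffs = np.zeros((4, 1))           # assume an undistorted image
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    rot_mat, _ = cv2.Rodrigues(rvec)         # rotation vector -> 3x3 rotation matrix
    # Euler angles (in degrees) recovered from the 3x4 pose matrix
    euler = cv2.decomposeProjectionMatrix(np.hstack((rot_mat, tvec)))[-1]
    pitch, yaw, roll = euler.flatten()
    return (pitch, yaw, roll), tvec
```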
In a possible implementation, the neural network used to generate the three-dimensional face model of the target object (for example the convolutional neural network CNN 550) can be trained in advance according to multiple sample face regions and the preset three-dimensional face model. For example, for any sample face region, the sample face region and the preset three-dimensional face model can be input into the neural network for registration to obtain a sample three-dimensional face model; the sample three-dimensional face model is then reverse rendered, that is, projected into two-dimensional space, to obtain a reverse-rendered image; the network loss of the neural network is determined according to the differences between the reverse-rendered images and the corresponding sample face regions; and the network parameters of the neural network are adjusted according to the network loss.
When the neural network satisfies a preset training end condition, the training ends and the trained neural network is obtained. The training end condition may be, for example, that the number of training epochs of the neural network reaches a preset epoch threshold, that the network loss of the neural network converges within a certain interval, or that the neural network passes verification on a validation set. Those skilled in the art can set the training end condition of the neural network according to the actual situation, which is not limited in this application.
In a possible implementation, after the first face pose of the target object is determined, it can be judged whether the first face pose satisfies a second preset condition. The second preset condition is that the pitch angle in the first face pose is within a preset second angle interval, the roll angle in the first face pose is within a preset third angle interval, and the yaw angle in the first face pose is within a preset fourth angle interval. The second, third, and fourth angle intervals may be the same or different. It should be noted that those skilled in the art can set the specific values of the second, third, and fourth angle intervals according to the actual situation, which is not limited in this application.
When the first face pose of the target object does not satisfy the second preset condition, the user may be prompted, by voice broadcast, text display, animation display, or the like, to adjust the device pose of the electronic device and/or to change the face pose of the target object so that the first face pose satisfies the second preset condition, that is, so that the face of the target object faces the image acquisition component of the electronic device. In other words, the face in the video frames collected by the image acquisition component can be made a frontal face of the target object, thereby improving the accuracy of height detection.
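A trivial sketch of this angle check, with the three intervals expressed in degrees and the ±15° defaults chosen purely for illustration (the patent does not specify particular values), could be:

```python
def pose_within_limits(pitch, roll, yaw,
                       pitch_range=(-15.0, 15.0),
                       roll_range=(-15.0, 15.0),
                       yaw_range=(-15.0, 15.0)):
    """Second preset condition: each rotation angle (degrees) falls in its preset interval."""
    return (pitch_range[0] <= pitch <= pitch_range[1] and
            roll_range[0] <= roll <= roll_range[1] and
            yaw_range[0] <= yaw <= yaw_range[1])
```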
Step S340, determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. The device pose of the electronic device is the pose of the electronic device at the time the video frame containing the face region was collected.
In a possible implementation, before determining the first height of the target object, it can be judged whether the pitch angle of the electronic device indicated by its device pose satisfies a first preset condition. The first preset condition is that the pitch angle of the electronic device indicated by the device pose is within a preset first angle interval.
When the pitch angle of the electronic device indicated by its device pose does not satisfy the first preset condition, the user may be prompted, by voice broadcast, text display, animation display, or the like, to adjust the device pose of the electronic device, so as to avoid excessive upward or downward tilt when collecting the video frames, thereby improving the accuracy of height detection.
When determining the first height of the target object, the coordinate systems of the ground information and of the first face pose of the target object can first be determined. When SLAM technology is used, the ground information is expressed in the world coordinate system. According to the data collected by the inertial measurement unit IMU, the Y axis of the world coordinate system can be aligned with the vertical direction of the real world. Since physical quantities such as distances and object sizes in this world coordinate system are the same as in the real world, the virtual world coordinate system is thereby linked to the real world, so an object size computed in the world coordinate system is the actual size of the object in the real world.
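How the IMU data is used to align the Y axis with the real-world vertical is not detailed here. A simplified sketch, assuming the device is held roughly still so that the averaged accelerometer reading points opposite to gravity, might be:

```python
import numpy as np

def gravity_aligned_up(accel_samples):
    """Estimate the world 'up' direction in the device frame from accelerometer samples
    taken while the device is (approximately) static.

    When static, the accelerometer measures the specific force opposing gravity, so the
    averaged, normalized reading points upward. Real SLAM pipelines fuse gyroscope and
    accelerometer data continuously; this is only an illustration of the idea."""
    mean = np.mean(np.asarray(accel_samples, dtype=float), axis=0)
    return mean / np.linalg.norm(mean)
```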
而目标对象的第一人脸位姿为目标对象相对于电子设备的图像采集部件的人脸位姿,其位于相机坐标系下。在相机坐标系下,电子设备的图像采集部件位于坐标原点,从目标对象的人脸三维模型的角度看,电子设备的图像采集部件的位置是固定的,而且由于相机坐标系下的对象与现实世界的对象没有尺寸对比,相机坐标系下的目标对象的人脸尺寸与现实世界中目标对象的人脸实际尺寸存在一定的缩放比例。因此,需要进行人脸尺寸调整及坐标系变换。The first face pose of the target object is the face pose of the target object relative to the image acquisition component of the electronic device, which is located in the camera coordinate system. In the camera coordinate system, the image acquisition part of the electronic device is located at the origin of the coordinates. From the perspective of the 3D face model of the target object, the position of the image acquisition part of the electronic device is fixed. There is no size comparison of objects in the world, and there is a certain scaling ratio between the face size of the target object in the camera coordinate system and the actual face size of the target object in the real world. Therefore, face size adjustment and coordinate system transformation are required.
In a possible implementation, when determining the first height of the target object, the first face pose of the target object is first adjusted according to a preset interpupillary distance reference value to obtain a second face pose of the target object, where the face size indicated by the first face pose is the same as the face size of the target object in the face area, the face size indicated by the second face pose is the actual face size of the target object, and the second face pose is expressed in the camera coordinate system. In other words, the first face pose of the target object can be adjusted in the camera coordinate system so that the face size indicated by the adjusted second face pose equals the actual face size of the target object.
For example, the interpupillary distance value in the first face pose of the target object can be determined; a face size scale factor is then computed from the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; finally, the first face pose is adjusted by the face size scale factor to obtain the second face pose of the target object.
Determining the face size scale factor from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and adjusting the first face pose accordingly to obtain the second face pose, yields the actual face size and pose of the target object in the camera coordinate system.
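By way of illustration, a minimal Python sketch of this scaling step might look as follows; the 63 mm reference value, the eye-landmark inputs, and the choice to scale only the translation of the pose are assumptions made for the example, not details given by the embodiment.

```python
import numpy as np

IPD_REFERENCE_M = 0.063  # assumed interpupillary distance reference value, in meters

def scale_face_pose(first_pose_translation, left_eye, right_eye):
    """Scale the first face pose so the fitted model's interpupillary distance
    matches the reference value, yielding the second face pose
    (still in camera coordinates).

    first_pose_translation: 3-vector translation of the first face pose.
    left_eye, right_eye:    3-vector eye landmarks of the fitted face model.
    """
    ipd_model = np.linalg.norm(np.asarray(right_eye) - np.asarray(left_eye))
    scale = IPD_REFERENCE_M / ipd_model                      # face size scale factor
    return np.asarray(first_pose_translation) * scale, scale
```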
After the second face pose of the target object is obtained, a coordinate transformation can be applied to the second face pose according to the device pose of the electronic device to obtain a third face pose of the target object, where the third face pose of the target object is expressed in the world coordinate system.
In a possible implementation, the second face pose P_C can be coordinate-transformed by the following formula (1) to obtain the third face pose P_w of the target object:
P_w = T⁻¹ · P_C    (1)
In formula (1), T represents the rigid body transformation matrix determined from the device pose (R, t) of the electronic device:
T = [ R  t ]
    [ 0  1 ]
where R represents the rotation matrix in the device pose of the electronic device and t represents the translation vector in the device pose of the electronic device.
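A hedged illustration of formula (1), assuming the poses are represented as 4x4 homogeneous matrices (a representation chosen for the sketch, not specified above):

```python
import numpy as np

def to_world(P_c, R, t):
    """Apply formula (1): P_w = T^-1 * P_C.

    P_c: 4x4 homogeneous matrix of the second face pose (camera coordinates).
    R:   3x3 rotation matrix from the device pose.
    t:   3-vector translation from the device pose.
    """
    T = np.eye(4)
    T[:3, :3] = R          # rigid body transformation built from (R, t)
    T[:3, 3] = t
    return np.linalg.inv(T) @ P_c
```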
Then, the first height of the target object can be determined according to the third face pose and the ground information. In a possible implementation, the head-top position of the target object is determined from the third face pose, and the first height of the target object is then determined from the head-top position and the ground information. For example, suppose the head-top position of the target object determined from the third face pose is (x1, y1, z1) and the ground information is represented by the plane equation F = f(x, y, z) in space. The first distance L1 from (x1, y1, z1) to the ground F = f(x, y, z) can be computed along the Y-axis direction, and this first distance L1 is taken as the first height L of the target object, that is, L = L1.
Determining the head-top position of the target object and then determining the first height from the head-top position and the ground information is simple and fast, and improves the accuracy of height detection.
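As an illustrative sketch, assuming the ground information is stored as plane coefficients a, b, c, d of a*x + b*y + c*z + d = 0 (the text above only writes F = f(x, y, z)), the Y-axis distance from the head-top position to the ground could be computed as follows:

```python
def height_above_ground(head_top, a, b, c, d):
    """Y-axis distance from head_top = (x, y, z) to the ground plane
    a*x + b*y + c*z + d = 0; positive when the point is above the ground."""
    x, y, z = head_top
    if b == 0:
        raise ValueError("ground plane is parallel to the Y axis")
    y_ground = -(a * x + c * z + d) / b   # Y coordinate of the ground at (x, z)
    return y - y_ground
```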
In a possible implementation, the nose-tip position of the target object can be determined from the third face pose; the head-top position of the target object is then determined from the nose-tip position and a preset ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance; and the first height of the target object is determined from the head-top position and the ground information.
For example, suppose the nose-tip position of the target object determined from the third face pose is (x2, y2, z2) and the ground information is represented by the plane equation F = f(x, y, z) in space. The head-top position (x3, y3, z3) of the target object can be determined from the nose-tip position (x2, y2, z2) and the preset ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance. The second distance L2 from (x3, y3, z3) to the ground F = f(x, y, z) is then computed along the Y-axis direction, and this second distance L2 is taken as the first height L of the target object, that is, L = L2.
Determining the head-top position of the target object from the nose-tip position and the ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance, and then determining the first height from the head-top position and the ground information, can improve the accuracy of height detection.
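A possible sketch of this step, assuming the third face pose also provides a chin landmark and that the head top lies on the straight line from the chin through the nose tip; the ratio value stands in for the preset anthropometric constant and is not given by the text:

```python
import numpy as np

def head_top_from_nose(nose_tip, chin, ratio):
    """Estimate the head-top position from the nose-tip position.

    nose_tip, chin: 3-vectors in world coordinates taken from the third face pose.
    ratio: preset (nose-tip-to-head-top) / (nose-tip-to-chin) proportion.
    """
    nose_tip, chin = np.asarray(nose_tip, float), np.asarray(chin, float)
    up = nose_tip - chin                      # approximate "up" direction of the head
    up /= np.linalg.norm(up)
    d_nose_chin = np.linalg.norm(nose_tip - chin)
    return nose_tip + up * d_nose_chin * ratio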
In a possible implementation, when there are multiple face areas corresponding to the target object, a second height of the target object can be determined for each face area in a manner similar to the above, from the ground information in the multiple video frames, the first face pose of the target object, and the device pose of the electronic device when the video frame containing that face area was captured. Post-processing such as Kalman filtering or averaging is then applied to the multiple second heights to obtain the first height of the target object. In this way, the accuracy of height detection can be improved.
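For illustration, a minimal sketch of the fusion step; the one-dimensional Kalman filter over a constant height state and its noise variances are assumptions for the example, and plain averaging of the second heights is the simpler alternative mentioned above:

```python
def fuse_heights(second_heights, process_var=1e-4, meas_var=4e-4):
    """Fuse per-face-area height estimates into the first height.

    Implements a one-dimensional Kalman filter over a constant height state;
    plain averaging is the simpler alternative: sum(h) / len(h).
    """
    estimate, variance = second_heights[0], 1.0
    for z in second_heights[1:]:
        variance += process_var                   # predict
        gain = variance / (variance + meas_var)   # Kalman gain
        estimate += gain * (z - estimate)         # update with the new measurement
        variance *= (1.0 - gain)
    return estimate
```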
In a possible implementation, after the first height of the target object is determined, the first height may also be displayed on the display interface of the electronic device, for example by means of animation, text, or augmented reality (AR). When height detection is implemented by a height app, the first height of the target object can be displayed on the display interface of the height app, which may include a real-time image interface showing the video frames collected by the image acquisition component of the electronic device.
Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application. As shown in Fig. 6, the user detects the height of the target object 630 through the height app on the electronic device 600, and the display interface 610 of the height app shows the video frames collected in real time by the image acquisition component (not shown) of the electronic device 600. When the height app detects the height of the target object 630, the height can be displayed at a preset position above the head of the target object 630 in the display interface 610 by means of an augmented reality icon 620; the displayed information may be "Height: 175 cm".
Fig. 6 above uses a single target object merely as an example of how the height is displayed. It should be noted that the heights of multiple target objects can also be displayed in the above manner. Those skilled in the art may also set the display manner and display position of the height of the target object according to the actual situation, which is not limited in the present application.
According to the height detection method of the embodiment of the present application, semantic plane detection can be performed on multiple video frames collected by the image acquisition component of the electronic device to determine the ground information in the multiple video frames; face detection is performed on the multiple video frames to determine the face area, and the first face pose of the target object in the multiple video frames is determined according to the face area and a preset three-dimensional face model; the first height of the target object is then determined according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face technology, and the height is determined from the face pose, the device pose, and the ground information, without manually locating the target object and without capturing a complete human body image, making the method convenient to operate and highly accurate.
Fig. 7 shows a schematic diagram of a height detection processing procedure according to an embodiment of the present application. As shown in Fig. 7, assume that the user detects the height of the target object through a height app running on the electronic device. When the user opens the height app, step S701 is executed: the height app collects multiple video frames (i.e., a video stream) through the image acquisition component of the electronic device. Optionally, the image acquisition component may continue to collect the video stream throughout the height detection process.
Where semantic plane detection is implemented with SLAM, step S702 determines whether SLAM initialization has succeeded. If not, the user is prompted to move the electronic device and step S701 is executed again. If SLAM initialization has succeeded, step S703 performs semantic plane detection on the multiple video frames, and step S704 determines whether ground information has been detected within a preset period.
If no ground information is detected within the preset period, the user is prompted to point the camera at the ground and step S701 continues. If ground information is detected within the preset period, step S705 performs face detection on the multiple video frames to determine the face area, and step S706 determines whether the face area satisfies the third preset condition, namely that the face area lies within a preset region of the video frame containing it.
If the face area does not satisfy the third preset condition, the user is prompted to adjust the device pose of the electronic device and step S701 is executed again. If the face area satisfies the third preset condition, step S707 determines the first face pose of the target object in the multiple video frames from the face area and the preset three-dimensional face model, and step S708 determines whether the first face pose satisfies the second preset condition, namely that the pitch angle of the first face pose lies within a preset second angle interval, its roll angle lies within a preset third angle interval, and its yaw angle lies within a preset fourth angle interval.
If the first face pose does not satisfy the second preset condition, the user is prompted to adjust the device pose of the electronic device and/or change the face pose of the target object, and step S701 is executed again. If the first face pose satisfies the second preset condition, step S709 determines whether the pitch angle of the electronic device indicated in the device pose satisfies the first preset condition, namely that this pitch angle lies within a preset first angle interval.
If the pitch angle of the electronic device indicated in the device pose does not satisfy the first preset condition, the user is prompted to adjust the device pose and step S701 is executed again. If it satisfies the first preset condition, step S710 determines the first height of the target object from the ground information, the first face pose, and the device pose of the electronic device; step S711 then displays the first height of the target object on the display interface of the height app, for example by means of augmented reality (AR).
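The S701-S711 flow can be summarized, purely as a sketch, in the following pseudo-style Python; every helper name on the hypothetical app object stands in for a module described above and is not a real API:

```python
def height_detection_pass(app):
    """One pass of steps S701-S711; all helpers on `app` are hypothetical."""
    while True:
        frames = app.capture_video_frames()                  # S701
        if not app.slam_initialized():                       # S702
            app.prompt("Move the device"); continue
        ground = app.detect_ground_plane(frames)             # S703/S704
        if ground is None:
            app.prompt("Point the camera at the ground"); continue
        face_area = app.detect_face(frames)                  # S705
        if not app.face_in_preset_region(face_area):         # S706
            app.prompt("Adjust the device pose"); continue
        face_pose = app.fit_face_pose(face_area)             # S707
        if not app.face_angles_within_limits(face_pose):     # S708
            app.prompt("Adjust the device or face pose"); continue
        if not app.device_pitch_within_limits():             # S709
            app.prompt("Adjust the device pose"); continue
        height = app.estimate_height(ground, face_pose)      # S710
        app.display_height_ar(height)                        # S711
        return height
```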
The height detection method of the embodiment of the present application, through SLAM and semantic segmentation, can automatically identify ground information and automatically detect the height of the target object without manual operation (such as manual point selection or marking of the target object), and can detect the heights of multiple target objects simultaneously, thereby simplifying the height detection process and improving its efficiency. In addition, by acquiring three-dimensional information through SLAM, the embodiments of the present application avoid physical contact between the electronic device and the body of the target object, which is safe and reliable.
The height detection method of the embodiment of the present application performs height detection based on multiple video frames collected by an ordinary camera (for example, a monocular camera), without requiring professional equipment such as a depth camera, which reduces dependence on dedicated hardware; in some embodiments, the user can perform height detection with a handheld device (for example, a mobile phone or smart watch). At the same time, the embodiments of the present application obtain the face pose of the target object through face recognition and three-dimensional face reconstruction, which is fast and accurate; this not only improves the accuracy of height detection but also suits scenarios in which the target object moves or the shooting angle changes.
Fig. 8 shows a block diagram of a height detection apparatus according to an embodiment of the present application. As shown in Fig. 8, the height detection apparatus is applied to an electronic device and includes: an image acquisition component 810, configured to collect a plurality of video frames; and a processing component 820, configured to: perform semantic plane detection on the plurality of video frames and determine the ground information in them; perform face detection on the plurality of video frames and determine the face area; determine the first face pose of the target object in the plurality of video frames according to the face image of the face area and a preset three-dimensional face model; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
In a possible implementation, the processing component is further configured to perform at least one of the following: if the ground information is not detected within a preset period, prompt the user to photograph the ground; if the pitch angle of the electronic device indicated by the device pose does not satisfy the first preset condition, prompt the user to adjust the device pose; if the first face pose does not satisfy the second preset condition, prompt the user to adjust the device pose and/or change the face pose of the target object; or, if the face area does not satisfy the third preset condition, prompt the user to adjust the device pose.
In a possible implementation, determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining a second height of the target object according to the ground information, the first face pose, and the device pose; and post-processing the second height to obtain the first height, the post-processing including Kalman filtering.
In a possible implementation, the processing component is further configured to display the first height on a display interface of the electronic device.
An embodiment of the present application provides a height detection apparatus, including: an image acquisition component for collecting a plurality of video frames; a processor; and a memory for storing processor-executable instructions, where the processor is configured to implement the above method when executing the instructions.
An embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented.
An embodiment of the present application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
The computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions, thereby implementing various aspects of the present application.
Aspects of the present application are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, causing a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures show the architecture, functionality, and operation of possible implementations of apparatuses, systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by hardware that performs the corresponding function or action (such as a circuit or an ASIC (application-specific integrated circuit)), or by a combination of hardware and software, such as firmware.
Although the present invention has been described herein in conjunction with various embodiments, those skilled in the art can, in practicing the claimed invention, understand and effect other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The embodiments of the present application have been described above; the foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies available in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

  1. A height detection method, characterized by comprising:
    performing semantic plane detection on a plurality of video frames collected by an image acquisition component of an electronic device, and determining ground information in the plurality of video frames;
    performing face detection on the plurality of video frames, and determining a face area;
    determining a first face pose of a target object in the plurality of video frames according to the face area and a preset three-dimensional face model;
    determining a first height of the target object according to the ground information, the first face pose, and a device pose of the electronic device.
  2. The method according to claim 1, characterized in that the method further comprises at least one of the following:
    prompting a user to photograph the ground if the ground information is not detected within a preset period;
    prompting the user to adjust the device pose if a pitch angle of the electronic device indicated by the device pose does not satisfy a first preset condition;
    prompting the user to adjust the device pose and/or change a face pose of the target object if the first face pose does not satisfy a second preset condition; or
    prompting the user to adjust the device pose if the face area does not satisfy a third preset condition.
  3. The method according to claim 1 or 2, characterized in that determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device comprises:
    determining a second height of the target object according to the ground information, the first face pose, and the device pose;
    post-processing the second height to obtain the first height, the post-processing comprising Kalman filtering.
  4. The method according to any one of claims 1-3, characterized in that the method further comprises:
    displaying the first height on a display interface of the electronic device.
  5. A height detection apparatus, characterized in that the apparatus is applied to an electronic device and comprises:
    an image acquisition component, configured to collect a plurality of video frames;
    a processing component, configured to:
    perform semantic plane detection on the plurality of video frames and determine ground information in the plurality of video frames;
    perform face detection on the plurality of video frames and determine a face area;
    determine a first face pose of a target object in the plurality of video frames according to a face image of the face area and a preset three-dimensional face model;
    determine a first height of the target object according to the ground information, the first face pose, and a device pose of the electronic device.
  6. The apparatus according to claim 5, characterized in that the processing component is further configured to perform at least one of the following:
    prompting a user to photograph the ground if the ground information is not detected within a preset period;
    prompting the user to adjust the device pose if a pitch angle of the electronic device indicated by the device pose does not satisfy a first preset condition;
    prompting the user to adjust the device pose and/or change a face pose of the target object if the first face pose does not satisfy a second preset condition; or
    prompting the user to adjust the device pose if the face area does not satisfy a third preset condition.
  7. The apparatus according to claim 5 or 6, characterized in that determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device comprises:
    determining a second height of the target object according to the ground information, the first face pose, and the device pose;
    post-processing the second height to obtain the first height, the post-processing comprising Kalman filtering.
  8. The apparatus according to any one of claims 5-7, characterized in that the processing component is further configured to:
    display the first height on a display interface of the electronic device.
  9. A height detection apparatus, characterized by comprising:
    an image acquisition component, configured to collect a plurality of video frames;
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to implement the method according to any one of claims 1-4 when executing the instructions.
  10. A non-volatile computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-4.
  11. A computer program product, comprising computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1-4.
PCT/CN2021/109248 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium WO2023004682A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180006425.1A CN115885316A (en) 2021-07-29 2021-07-29 Height detection method, device and storage medium
PCT/CN2021/109248 WO2023004682A1 (en) 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109248 WO2023004682A1 (en) 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2023004682A1 true WO2023004682A1 (en) 2023-02-02

Family

ID=85086031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109248 WO2023004682A1 (en) 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN115885316A (en)
WO (1) WO2023004682A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104236462A (en) * 2013-06-14 2014-12-24 北京千里时空科技有限公司 Method for extracting height and distance of object in video image
CN105286871A (en) * 2015-11-27 2016-02-03 西安交通大学 Video processing-based body height measurement method
CN106361345A (en) * 2016-11-29 2017-02-01 公安部第三研究所 System and method for measuring height of human body in video image based on camera calibration
CN110136190A (en) * 2019-03-26 2019-08-16 华为技术有限公司 A kind of distance measuring method and electronic equipment
CN111012353A (en) * 2019-12-06 2020-04-17 西南交通大学 Height detection method based on face key point recognition

Also Published As

Publication number Publication date
CN115885316A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
US20210233310A1 (en) Automated three dimensional model generation
CN113810587B (en) Image processing method and device
JP7058760B2 (en) Image processing methods and their devices, terminals and computer programs
US11615592B2 (en) Side-by-side character animation from realtime 3D body motion capture
CN112334869A (en) Electronic device and control method thereof
US11810316B2 (en) 3D reconstruction using wide-angle imaging devices
US11468673B2 (en) Augmented reality system using structured light
US11688136B2 (en) 3D object model reconstruction from 2D images
US10748000B2 (en) Method, electronic device, and recording medium for notifying of surrounding situation information
US11887322B2 (en) Depth estimation using biometric data
US20230267687A1 (en) 3d object model reconstruction from 2d images
US20230224574A1 (en) Photographing method and apparatus
CN115699096A (en) Tracking augmented reality device
WO2022261856A1 (en) Image processing method and apparatus, and storage medium
WO2023004682A1 (en) Height measurement method and apparatus, and storage medium
WO2021244040A1 (en) Facial expression editing method and electronic device
US20240096031A1 (en) Graphical assistance with tasks using an ar wearable device
US20240073402A1 (en) Multi-perspective augmented reality experience
US11922096B1 (en) Voice controlled UIs for AR wearable devices
CN115880348B (en) Face depth determining method, electronic equipment and storage medium
US20240126502A1 (en) Voice controlled uis for ar wearable devices
KR20240005953A (en) Reduce startup time for augmented reality experiences

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21951290

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE