WO2023004682A1 - Height measurement method and apparatus, and storage medium

Info

Publication number
WO2023004682A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
face
height
target object
electronic device
Prior art date
Application number
PCT/CN2021/109248
Other languages
French (fr)
Chinese (zh)
Inventor
焦磊磊
马超群
张旭
段超
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN202180006425.1A (publication CN115885316A)
Priority to PCT/CN2021/109248 (publication WO2023004682A1)
Publication of WO2023004682A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/60: Analysis of geometric attributes
    • G06T7/62: Analysis of geometric attributes of area, perimeter, diameter or volume

Definitions

  • The present application relates to the field of image processing, and in particular to a height detection method, apparatus, and storage medium.
  • Traditional height measurement usually requires the measurer to manually operate a professional instrument, such as a dedicated height measuring instrument, which not only makes measurement inefficient but is also inconvenient to carry and unsuitable for personal use.
  • An embodiment of the present application provides a height detection method. The method includes: performing semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determining the ground information in the multiple video frames; performing face detection on the multiple video frames, and determining the face area; determining the first face pose of the target object in the multiple video frames according to the face area and a preset three-dimensional face model; and determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
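  • For illustration only, the sketch below shows how the four steps described above could be wired together in code; all function names, argument shapes, and the plane representation are assumptions made for this sketch, not details given by the patent.

    # Illustrative sketch only: a possible wiring of the steps described above.
    # The injected callables (detect_ground_plane, detect_face, estimate_face_pose,
    # compute_height) and the data shapes are assumptions, not the patent's implementation.
    import numpy as np

    def estimate_height(frames, device_poses,
                        detect_ground_plane, detect_face,
                        estimate_face_pose, compute_height):
        """frames: list of RGB images; device_poses: per-frame camera poses (4x4)."""
        # Step S310: semantic plane detection -> ground plane (a, b, c, d) in world coords
        ground = detect_ground_plane(frames, device_poses)
        if ground is None:
            return None  # caller may prompt the user to shoot the ground

        heights = []
        for frame, T_wc in zip(frames, device_poses):
            # Step S320: face detection -> face area (bounding box)
            face_box = detect_face(frame)
            if face_box is None:
                continue
            # Step S330: face area + preset 3D face model -> first face pose (camera frame)
            face_pose_cam = estimate_face_pose(frame, face_box)
            # Step S340: ground + face pose + device pose -> height for this frame
            heights.append(compute_height(ground, face_pose_cam, T_wc))

        # Per-frame estimates could be post-processed (e.g. filtered) into the
        # reported height; here we simply take the median.
        return float(np.median(heights)) if heights else None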
  • The method further includes at least one of the following: if the ground information is not detected within a preset period of time, prompting the user to shoot the ground; if the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompting the user to adjust the device pose; if the first face pose does not meet the second preset condition, prompting the user to adjust the device pose and/or change the face pose of the target object; or if the face area does not meet the third preset condition, prompting the user to adjust the device pose.
  • In this way, when the ground information is not detected within a preset period of time, when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, or when the first face pose does not meet the second preset condition, the user is prompted, for example, to shoot the ground, to adjust the device pose, or to change the face pose of the target object, so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
  • In a second possible implementation of the height detection method, determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining the second height of the target object according to the ground information, the first face pose, and the device pose; and performing post-processing on the second height to obtain the first height, where the post-processing includes Kalman filtering.
  • In this way, the second height of the target object can be determined according to the ground information, the first face pose, and the device pose, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object, thereby improving the accuracy of height detection.
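  • As a minimal sketch of the Kalman-filtering post-processing mentioned above, the following one-dimensional filter smooths per-frame height estimates into a stable value; the noise parameters are illustrative assumptions, not values from the patent.

    import numpy as np

    def kalman_smooth_heights(raw_heights, process_var=1e-4, meas_var=4e-4):
        """One-dimensional Kalman filter over per-frame height estimates (metres).

        Assumes the true height is constant; process_var and meas_var are
        illustrative noise levels, not values given by the patent.
        """
        x, p = raw_heights[0], 1.0       # initial state estimate and variance
        filtered = [x]
        for z in raw_heights[1:]:
            p = p + process_var          # predict (the state itself is constant)
            k = p / (p + meas_var)       # Kalman gain
            x = x + k * (z - x)          # update with the new measurement z
            p = (1.0 - k) * p
            filtered.append(x)
        return filtered

    # Example: noisy per-frame estimates are smoothed toward a stable value.
    print(kalman_smooth_heights([1.72, 1.69, 1.74, 1.71, 1.70])[-1])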
  • The method further includes: displaying the first height on a display interface of the electronic device.
  • In this way, the first height of the target object can be displayed on the display interface of the electronic device through animation, text, augmented reality (AR), and the like, thereby improving the user experience.
  • Here, the ground information is located in the world coordinate system and the first face pose is located in the camera coordinate system. Determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose; performing coordinate transformation on the second face pose according to the device pose to obtain a third face pose of the target object, where the third face pose is located in the world coordinate system; and determining the first height of the target object according to the third face pose and the ground information.
  • In this way, the embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and determine the first height of the target object according to the third face pose and the ground information, so that the first height of the target object is calculated in the world coordinate system and the accuracy of height detection is improved.
  • Adjusting the first face pose according to a preset interpupillary distance reference value to obtain the second face pose includes: determining a face size transformation coefficient according to the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; and adjusting the first face pose according to the face size transformation coefficient to obtain the second face pose of the target object.
  • In this way, the face size transformation coefficient is determined from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient to obtain the second face pose of the target object, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
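  • A minimal sketch of this face size transformation, assuming the pose position is represented as a translation of the fitted model in the camera frame; the 63 mm reference value and the function signature are illustrative assumptions.

    import numpy as np

    REFERENCE_IPD_M = 0.063  # illustrative adult interpupillary distance reference (~63 mm)

    def rescale_face_pose(translation_cam, model_ipd_m, reference_ipd_m=REFERENCE_IPD_M):
        """Scale the first face pose to real-world size using the IPD reference.

        translation_cam: position of the fitted 3D face model in the camera frame,
        expressed at the (arbitrary) scale of the average face model.
        model_ipd_m: interpupillary distance measured on the fitted model at that scale.
        Returns the rescaled position (second face pose); the rotation is unchanged.
        """
        scale = reference_ipd_m / model_ipd_m   # face size transformation coefficient
        return np.asarray(translation_cam, dtype=float) * scale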
  • Determining the first height of the target object according to the third face pose and the ground information includes: determining the position of the top of the head of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
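  • As a sketch of this last step, assuming the ground is represented as a plane equation in world coordinates and the top of the head as a 3D point, the first height can be computed as a point-to-plane distance:

    import numpy as np

    def height_above_ground(head_top_world, ground_plane):
        """Distance from the head-top point to the ground plane, both in world coordinates.

        head_top_world: (x, y, z) of the top of the head, e.g. derived from the third
        face pose plus a model offset from a facial landmark to the top of the head.
        ground_plane: (a, b, c, d) with a*x + b*y + c*z + d = 0 describing the ground.
        """
        a, b, c, d = ground_plane
        x, y, z = head_top_world
        return abs(a * x + b * y + c * z + d) / np.linalg.norm([a, b, c])

    # Example: ground is the plane y = 0, head top 1.72 m above it.
    print(height_above_ground((0.3, 1.72, 2.0), (0.0, 1.0, 0.0, 0.0)))  # 1.72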
  • An embodiment of the present application provides a height detection apparatus applied to an electronic device, including an image acquisition component configured to capture multiple video frames, and a processing component configured to: perform semantic plane detection on the multiple video frames and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine a face area; determine the first face pose of the target object in the multiple video frames according to the face area and a preset three-dimensional face model; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • The processing component is further configured to perform at least one of the following: when the ground information is not detected within a preset period of time, prompt the user to shoot the ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompt the user to adjust the device pose; when the first face pose does not meet the second preset condition, prompt the user to adjust the device pose and/or change the face pose of the target object; or when the face area does not meet the third preset condition, prompt the user to adjust the device pose.
  • In this way, when the ground information is not detected within a preset period of time, when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, or when the first face pose does not meet the second preset condition, the user is prompted, for example, to shoot the ground, to adjust the device pose, or to change the face pose of the target object, so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
  • Determining the first height of the target object includes: determining the second height of the target object according to the ground information, the first face pose, and the device pose; and performing post-processing on the second height to obtain the first height, where the post-processing includes Kalman filtering.
  • In this way, the second height of the target object can be determined according to the ground information, the first face pose, and the device pose, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object, thereby improving the accuracy of height detection.
  • The processing component is further configured to display the first height on the display interface of the electronic device.
  • In this way, the first height of the target object can be displayed on the display interface of the electronic device through animation, text, augmented reality (AR), and the like, thereby improving the user experience.
  • Here, the ground information is located in the world coordinate system and the first face pose is located in the camera coordinate system. Determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: adjusting the first face pose according to a preset interpupillary distance reference value to obtain a second face pose; performing coordinate transformation on the second face pose according to the device pose to obtain a third face pose of the target object, where the third face pose is located in the world coordinate system; and determining the first height of the target object according to the third face pose and the ground information.
  • In this way, the embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and determine the first height of the target object according to the third face pose and the ground information, so that the first height of the target object is calculated in the world coordinate system and the accuracy of height detection is improved.
  • Adjusting the first face pose according to a preset interpupillary distance reference value to obtain the second face pose includes: determining a face size transformation coefficient according to the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; and adjusting the first face pose according to the face size transformation coefficient to obtain the second face pose of the target object.
  • In this way, the face size transformation coefficient is determined from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient to obtain the second face pose of the target object, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
  • Determining the first height of the target object according to the third face pose and the ground information includes: determining the position of the top of the head of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
  • An embodiment of the present application provides a height measurement apparatus, including: an image acquisition component configured to acquire multiple video frames; a processor; and a memory for storing processor-executable instructions, where the processor is configured to, when executing the instructions, implement the height detection method of the above first aspect or of one or more of the multiple possible implementations of the first aspect.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • Embodiments of the present application further provide a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the height detection method of the above first aspect or of one or more of the multiple possible implementations of the first aspect.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • Embodiments of the present application further provide a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where, when the computer-readable code runs in an electronic device, the processor in the electronic device executes the height detection method of the first aspect or of one or more of the multiple possible implementations of the first aspect.
  • In this way, the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face reconstruction, and the height of the target object is then determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Fig. 2 shows a block diagram of a software structure of an electronic device according to an embodiment of the present application.
  • Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application.
  • Fig. 4 shows a schematic diagram of a detection process of ground information according to an embodiment of the present application.
  • Fig. 5 shows a schematic diagram of a process of determining a first face pose of a target object according to an embodiment of the present application.
  • Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application.
  • Fig. 7 shows a schematic diagram of a processing procedure of height detection according to an embodiment of the present application.
  • Fig. 8 shows a block diagram of a height detection device according to an embodiment of the present application.
  • In one related technical solution, a binocular camera is used to capture a scene image, the image coordinates of the head point of the human target in the scene image are obtained, the depth information of the head point generated by the binocular camera is obtained according to those image coordinates, the coordinates of the head point in the camera coordinate system are then calculated from the image coordinates and the depth information, and the height of the human target is measured according to the coordinates of the head point in the camera coordinate system and the installation height, pitch angle, and tilt angle of the binocular camera.
  • This technical solution not only requires a binocular camera (that is, it depends on specific hardware), but also requires a fixed camera pose and a known camera installation height, which restricts the usage scenarios.
  • In addition, this technical solution must capture an image of the complete human body to measure height, which is a further limitation.
  • In another related technical solution, a dense semantic map can be generated based on simultaneous localization and mapping (SLAM) technology, plane semantic detection is then realized through the dense semantic map, the height of an object is automatically identified from the internal relationships between semantics, the focus target is projected based on ground extraction and segmentation to calculate the object's length and width, and the bounding box size (length, width, and height) of the object is finally obtained. Since the human body is one such generalized object, the height of a human body can be measured with this technical solution.
  • In yet another related technical solution, a face classifier and a face-to-height model are first trained separately; the image of the human target to be measured is then input into the face classifier for face detection to obtain the face image of the human target, and the face image is input into the face-to-height model to obtain the height of the human target.
  • The core of this technical solution is the face-to-height model. A face-to-height model obtained through machine learning not only has poor interpretability but also relies heavily on training data, and because the relationship between face and height may differ across different populations, such a model is difficult to generalize and the accuracy of its measurement results is not high.
  • In a further related technical solution, height measurement is performed by manually operating an augmented reality (AR) ruler.
  • Specifically, plane detection and SLAM technology can be used to obtain the spatial equation of the ground; the surveyor then needs to place a virtual anchor point at the foot of the human target (that is, the measurement object), pull a virtual AR ruler from bottom to top and stop at the top of the head, and then obtain the length of the AR ruler, that is, the height of the human target, in the three-dimensional (3D) space coordinate system established by SLAM.
  • The present application provides a height detection method, which can be applied to electronic devices.
  • The height detection method of the embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine the ground information in the multiple video frames; perform face detection on the multiple video frames to determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In this way, detecting the height of the target object does not rely on professional equipment (such as binocular cameras or depth cameras); the face pose of the target object can be determined through face recognition and three-dimensional face reconstruction, and the height of the target object can then be determined from the face pose, the device pose, and the ground information. There is no need to manually locate the target object or to capture a complete image of the human body, so the operation is convenient and the accuracy is high.
  • the electronic devices described in the embodiments of the present application may be touch-screen or non-touch-screen.
  • Touch-screen electronic devices can be controlled by clicking and sliding on the display screen with fingers, stylus, etc.
  • Non-touch-screen electronic devices can be connected to input devices such as a mouse, a keyboard, or a touch panel, and controlled through these input devices.
  • Fig. 1 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.
  • The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • the processor can generate an operation control signal according to the instruction opcode and the timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 may be a cache memory.
  • The memory may store instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs the instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
  • processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
  • the processor 110 may be connected to modules such as a touch sensor, an audio module, a wireless communication module, a display, and a camera through at least one of the above interfaces.
  • the interface connection relationship between the modules shown in the embodiment of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the electronic device 100 may implement a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos and the like.
  • the display screen 194 includes a display panel.
  • The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like.
  • the electronic device 100 may include one or more display screens 194 .
  • the electronic device 100 can use the camera 193, ISP, video codec, GPU, display screen 194, application processor AP, neural network processor NPU, etc. to realize functions such as photographing and video recording, that is, related functions such as image and video collection.
  • the camera 193 can be used to collect color image data of the subject. In some embodiments, the camera 193 can also be used to collect depth data of the subject. That is to say, the camera in the electronic device 100 may be a common camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera or a depth camera. The present application does not limit the specific type of the camera 193 .
  • the ISP can be used to process the color image data collected by the camera 193 .
  • When an image is captured, light is transmitted through the lens to the photosensitive element of the camera, which converts the optical signal into an electrical signal and transmits it to the ISP for processing, where it is converted into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the electronic device 100 may include one or more cameras 193 .
  • the electronic device 100 may include one front camera and at least one rear camera.
  • the front camera can usually be used to collect the color image data of the photographer facing the display screen 194, and the rear camera can be used to collect the color image data of the object (such as people, scenery, etc.) facing the photographer.
  • the CPU, GPU, or NPU in the processor 110 may process multiple video frames collected by the camera 193 .
  • The processor 110 can perform semantic plane detection on multiple video frames collected by the image acquisition component (i.e., the camera 193) of the electronic device 100 and determine the ground information in the multiple video frames; perform face detection on the multiple video frames and determine the face area; determine the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model; and then determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • the first height of the target object may also be displayed on the display screen 194 of the electronic device 100 .
  • the gyro sensor 180B in the electronic device 100 can be used to determine the motion posture of the electronic device 100 .
  • The angular velocity of the electronic device 100 around three axes (i.e., the x, y, and z axes) can be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and controls the reverse movement of the lens to offset the shake of the electronic device 100 to achieve anti-shake.
  • the gyroscope sensor 180B can also be used in scenarios such as navigation and somatosensory games.
  • the acceleration sensor 180E in the electronic device 100 can detect the acceleration of the electronic device 100 in various directions (generally three axes x, y and z). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • components such as the gyro sensor 180B and the acceleration sensor 180E of the electronic device 100 may constitute an inertial measurement unit (IMU) for measuring the device pose of the electronic device 100 .
  • the touch sensor 180K in the electronic device 100 is also called a "touch device”.
  • the touch sensor 180K can be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the position of the display screen 194 .
  • the keys 190 in the electronic device 100 may include a power key, a volume key and the like.
  • the key 190 can be a mechanical key or a touch key.
  • the electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100 .
  • For example, a camera APP may provide buttons for starting and ending photo or video capture for the user to operate.
  • the motor 191 in the electronic device 100 can generate a vibration alert.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 191 may also correspond to different vibration feedback effects for touch operations acting on different areas of the display screen 194 .
  • Touch operations in different application scenarios (for example, time reminders, receiving messages, alarm clocks, games, photographing, and video recording) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the embodiment of the present application takes the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100 .
  • FIG. 2 shows a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into five layers, which are application program layer, application program framework layer, Android runtime (Android runtime, ART) and native C/C++ library, hardware abstraction layer (Hardware Abstract Layer, HAL) and the kernel layer.
  • the application layer can consist of a series of application packages.
  • the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, resource managers, notification managers, activity managers, input managers, and so on.
  • the window manager provides window management service (Window Manager Service, WMS).
  • WMS can be used for window management, window animation management, surface management and as a transfer station for input systems.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • This data can include videos, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • The notification manager may also prompt the user in other ways, for example, by displaying text information in the status bar, issuing a prompt sound, vibrating the electronic device, or flashing the indicator light.
  • The activity manager can provide an activity management service (Activity Manager Service, AMS). AMS can be used to start, switch, and schedule system components (such as activities, services, content providers, and broadcast receivers) and to manage and schedule application processes.
  • the input manager can provide input management service (Input Manager Service, IMS), and IMS can be used to manage the input of the system, such as touch screen input, key input, sensor input, etc.
  • IMS fetches events from input device nodes, and distributes events to appropriate windows through interaction with WMS.
  • The Android runtime includes the core library and the Android runtime (ART).
  • The Android runtime is responsible for converting source code into machine code.
  • The Android runtime mainly uses ahead-of-time (AOT) compilation technology and just-in-time (JIT) compilation technology.
  • the core library is mainly used to provide basic Java class library functions, such as basic data structure, mathematics, IO, tools, database, network and other libraries.
  • the core library provides APIs for users to develop Android applications.
  • A native C/C++ library can include multiple functional modules, for example: a surface manager, a media framework (Media Framework), libc, OpenGL ES, SQLite, Webkit, etc.
  • the surface manager is used to manage the display subsystem, and provides the fusion of 2D and 3D layers for multiple applications.
  • the media framework supports playback and recording of various commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • OpenGL ES provides the drawing and manipulation of 2D graphics and 3D graphics in applications.
  • SQLite provides a lightweight relational database for applications of the electronic device 100 .
  • the hardware abstraction layer runs in user space, encapsulates the kernel layer driver, and provides a call interface to the upper layer.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • For example, when height detection is to be performed, the user can touch the height APP icon on the display screen of the electronic device; when the touch sensor 180K receives the touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • The kernel layer processes the touch operation into a raw input event (including touch coordinates, a time stamp of the touch operation, and other information). Raw input events are stored at the kernel layer.
  • The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Take as an example that the touch operation is a tap operation and the control corresponding to the tap is the control of the height APP icon.
  • The height APP calls the interface of the application framework layer to start the height APP, and then starts the camera driver by calling the kernel layer.
  • the camera 193 collects multiple video frames, that is, collects video streams through the camera 193 .
  • multiple video frames may include the target object whose height is to be detected.
  • the electronic device 100 may perform related processing such as ground detection and face detection on the plurality of video frames through the processor 110, so as to determine the height of the target object.
  • Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application.
  • the height detection method includes: step S310 , performing semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determining ground information in the multiple video frames.
  • the image acquisition component may be a camera of an electronic device, and the camera may be an ordinary camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera, a depth camera, etc.
  • The multiple video frames collected by the image acquisition component are color (red, green, blue, RGB) video frames. Since the multiple video frames form a video stream, it can also be said that the image acquisition component captures an RGB video stream.
  • the multiple video frames captured by the image acquisition component may include depth data in addition to RGB image data. It should be noted that, the present application does not limit the specific type of the image acquisition component.
  • RGB video streams can be collected by the image acquisition part of the electronic device, and plane detection, semantic segmentation and other processing can be performed on the multiple collected video frames to determine the ground information in the multiple video frames.
  • the ground information may be expressed by a plane equation in space, or by other means, which is not limited in this application.
  • plane detection may first be performed on the multiple video frames collected by the image acquisition component, and position information of multiple planes in the multiple video frames may be determined.
  • position information of multiple planes in multiple video frames may be determined by using a SLAM technology.
  • three-dimensional information can be extracted from multiple video frames collected by the image acquisition component to obtain sparse point cloud data, and at the same time determine the device pose when the electronic device collects each video frame; and then according to the device when the electronic device collects each video frame Pose, through the plane fitting algorithm, performs plane fitting on the sparse point cloud data, and obtains the position information of multiple planes in multiple video frames.
  • the position information of each plane can be expressed by the plane equation in space.
  • Sparse point cloud data is obtained by extracting the three-dimensional information in the multiple video frames, and plane fitting is performed on the sparse point cloud data according to the device pose when the electronic device collects each video frame to obtain the position information of multiple planes. This can not only improve processing efficiency but also improve the accuracy of the position information of each plane.
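  • The patent does not name a particular plane fitting algorithm; as one hedged example, a RANSAC-style fit over the sparse point cloud (already expressed in the world frame via the device poses) could look like the sketch below, with illustrative thresholds.

    import numpy as np

    def fit_plane_ransac(points, iters=200, inlier_thresh=0.02, rng=None):
        """Fit a plane a*x + b*y + c*z + d = 0 to sparse 3D points with RANSAC.

        A generic sketch of the plane-fitting step; the iteration count and inlier
        threshold are illustrative, not values specified by the patent.
        """
        rng = rng or np.random.default_rng(0)
        pts = np.asarray(points, dtype=float)
        best_plane, best_inliers = None, 0
        for _ in range(iters):
            p0, p1, p2 = pts[rng.choice(len(pts), 3, replace=False)]
            normal = np.cross(p1 - p0, p2 - p0)
            if np.linalg.norm(normal) < 1e-9:
                continue  # degenerate (collinear) sample
            normal /= np.linalg.norm(normal)
            d = -normal.dot(p0)
            dist = np.abs(pts @ normal + d)       # point-to-plane distances
            inliers = int((dist < inlier_thresh).sum())
            if inliers > best_inliers:
                best_inliers, best_plane = inliers, (*normal, d)
        return best_plane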
  • Semantic segmentation can also be performed on the multiple video frames collected by the image acquisition component to obtain the semantic segmentation result of each video frame. Specifically, for any video frame among the multiple video frames, semantic recognition can be performed on the video frame to identify the categories of objects in the video frame, such as the ground, tables, and walls; each pixel in the video frame is then marked according to the identified object categories to obtain the semantic segmentation result of the video frame.
  • Then, according to the position information of the multiple planes and the semantic segmentation results, semantic recognition can be performed on the multiple planes in the multiple video frames to obtain multiple pieces of semantic plane information, such as plane information for desktops, walls, and the ground, and the ground information is then selected from the multiple pieces of semantic plane information.
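  • As an illustrative sketch of this selection step (the label id, mask format, and scoring rule are assumptions, not the patent's implementation), the ground plane could be chosen as the fitted plane best supported by pixels that the semantic segmentation labels as ground:

    import numpy as np

    def pick_ground_plane(planes, plane_masks, semantic_mask, ground_label=1):
        """Choose, among fitted planes, the one best supported by 'ground' pixels.

        planes: list of (a, b, c, d) plane equations.
        plane_masks: per-plane boolean masks marking pixels belonging to each plane.
        semantic_mask: per-pixel class labels from semantic segmentation.
        ground_label: label id used for the ground class (an assumption here).
        """
        ground_pixels = (semantic_mask == ground_label)
        scores = [np.logical_and(m, ground_pixels).sum() for m in plane_masks]
        best = int(np.argmax(scores))
        return planes[best] if scores[best] > 0 else None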
  • Fig. 4 shows a schematic diagram of a detection process of ground information according to an embodiment of the present application. As shown in Fig. 4, it is assumed that the height detection method of the embodiment of the present application is realized by the height APP on the electronic device (such as a mobile phone). Image acquisition is performed to obtain multiple video frames 410 (i.e., an RGB video stream); three-dimensional information extraction is then performed on the multiple video frames 410 to obtain sparse point cloud data 420, and the device pose 430 when the electronic device collects the multiple video frames 410 is determined; and according to the device pose 430, plane fitting is performed on the sparse point cloud data 420 through a plane fitting algorithm to obtain the position information 440 of multiple planes in the multiple video frames 410.
  • Semantic segmentation 450 can also be performed on the multiple video frames 410 to obtain a semantic segmentation result 460; then, according to the position information 440 of the multiple planes and the semantic segmentation result 460, semantic recognition is performed on the multiple planes in the multiple video frames 410 to obtain multiple pieces of semantic plane information 470, and the ground information 480 is selected from the multiple pieces of semantic plane information 470, where the ground information 480 can be represented by a plane equation in space.
  • the detection process of the ground information is exemplarily described above only by taking multiple video frames (ie, RGB video streams) collected by the image collection component as input.
  • information such as the depth data collected by the image collection component and the device pose of the electronic device collected by the inertial measurement unit IMU can also be used as input at the same time, so as to improve the accuracy of ground detection.
  • In this way, the electronic device can automatically perceive the captured scene, obtain multiple pieces of semantic plane information, and automatically recognize the ground information, which not only avoids manual operations by the user, such as manually selecting the ground, but also improves the accuracy of ground detection.
  • If the ground information is not detected within a preset period of time, the user may be prompted to shoot the ground.
  • For example, prompt information such as "please shoot the ground" or "ground not detected" can be announced to the user through a voice broadcast, or shown to the user through text, animation, and the like on the display interface of the height APP, so that the user can adjust the shooting content in time, thereby improving the efficiency of ground detection.
  • Step S320: performing face detection on the multiple video frames to determine the face area. While the ground information in the multiple video frames is being determined, face detection can be performed on the multiple video frames by means of feature extraction, key point detection, and the like. When a complete human face is detected, a face area can be determined from the multiple video frames, and the object corresponding to the face area is determined as the target object. There may be one or more face areas, and one or more target objects, which is not limited in this application.
  • After the face area is determined, it can be judged whether the face area meets the third preset condition, where the third preset condition may be that the face area is located within a preset area of the video frame where it is located; for example, the preset area can be set as the central area of that video frame. It should be noted that those skilled in the art can set the preset area of the video frame where the face area is located according to the actual situation, and this application does not limit this.
  • If the face area does not meet the third preset condition, the user may be prompted to adjust the device pose of the electronic device through a voice broadcast, a text display, an animation display, or the like, so that the face area meets the third preset condition, that is, so that the face area is located within the preset area of the video frame where it is located, thereby improving the accuracy of height detection.
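  • A minimal sketch of such a third-preset-condition check, assuming the face area is a pixel bounding box and the preset area is the central region of the frame; the 20% margin is an illustrative assumption:

    def face_in_preset_area(face_box, frame_size, margin=0.2):
        """Check whether the face area lies within a central preset area of the frame.

        face_box: (x_min, y_min, x_max, y_max) in pixels; frame_size: (width, height).
        margin: fraction of the frame treated as border; 0.2 keeps the central 60%.
        """
        w, h = frame_size
        x0, y0, x1, y1 = face_box
        return (x0 >= margin * w and y0 >= margin * h
                and x1 <= (1 - margin) * w and y1 <= (1 - margin) * h)

    # Example: a centred face passes; a face near the frame edge does not.
    print(face_in_preset_area((500, 300, 700, 560), (1280, 720)),
          face_in_preset_area((20, 10, 200, 260), (1280, 720)))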
  • Step S330: determining the first face pose of the target object in the multiple video frames according to the face area and the preset three-dimensional face model.
  • In a possible implementation, the three-dimensional face model of the target object can be established through a pre-trained neural network according to the face area and the preset three-dimensional face model (i.e., an average-face 3D model). That is, the face area and the preset 3D face model can be input into a pre-trained convolutional neural network (CNN) for registration to obtain the 3D face model of the target object.
  • Then, according to the parameters of the preset three-dimensional face model (for example, constraints on the structure of the human face such as the interpupillary distance and the distance from the tip of the nose to the top of the head), the position information and rotation information of the three-dimensional face model of the target object relative to the image acquisition component of the electronic device are determined, and this position information and rotation information are determined as the first face pose of the target object.
  • the rotation information may be represented by pitch angle, roll angle and yaw angle.
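  • The patent obtains this pose from the CNN-registered 3D face model and the structural constraints of the preset model; purely as an illustration of recovering position information and rotation information relative to the camera, the sketch below uses a standard PnP solve over corresponding 3D model landmarks and detected 2D landmarks (OpenCV), which is a stand-in technique, not the patent's method.

    import numpy as np
    import cv2  # OpenCV, used only as a stand-in illustration here

    def face_pose_from_landmarks(model_points_3d, image_points_2d, camera_matrix):
        """Estimate position and rotation of a 3D face model relative to the camera.

        model_points_3d: landmark coordinates on the (average) 3D face model, in metres.
        image_points_2d: the same landmarks detected in the face area, in pixels.
        camera_matrix: 3x3 intrinsics of the image acquisition component.
        Returns (rotation matrix, translation vector) in the camera coordinate system.
        """
        ok, rvec, tvec = cv2.solvePnP(
            np.asarray(model_points_3d, dtype=np.float64),
            np.asarray(image_points_2d, dtype=np.float64),
            camera_matrix, distCoeffs=None)
        if not ok:
            return None
        R, _ = cv2.Rodrigues(rvec)       # rotation encodes pitch, roll, and yaw of the face
        return R, tvec.reshape(3)        # tvec is the position relative to the camera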
  • Fig. 5 shows a schematic diagram of a process of determining a first face pose of a target object according to an embodiment of the present application.
  • As shown in Fig. 5, face detection 520 can be performed on the multiple video frames 510 collected by the image acquisition component of the electronic device to determine the face area 530; the face area 530 and the preset three-dimensional face model 540 are input into the pre-trained convolutional neural network CNN 550 for registration to obtain the three-dimensional face model 560 of the target object; and according to the parameters of the preset three-dimensional face model 540, the position information and rotation information of the three-dimensional face model of the target object relative to the image acquisition component of the electronic device are determined, and the position information and rotation information are determined as the first face pose 570 of the target object.
  • In this way, the 3D face model of the target object can be determined according to the face area and the preset 3D face model, and the first face pose of the target object can then be determined according to the parameters of the preset 3D face model. Using 3D face reconstruction technology to determine the first face pose of the target object can not only improve processing efficiency but also improve the accuracy of the first face pose, thereby improving the accuracy of height detection.
  • In a possible implementation, the neural network used to generate the 3D face model of the target object (such as the convolutional neural network CNN 550) can be pre-trained according to multiple sample face areas and the preset three-dimensional face model.
  • During training, a sample face area and the preset three-dimensional face model can be input into the neural network for registration to obtain a three-dimensional model of the sample face; the three-dimensional model of the sample face is then reverse rendered, that is, projected into a two-dimensional space to obtain a reverse-rendered image; the network loss of the neural network is determined according to the difference between each reverse-rendered image and the corresponding sample face area; and the network parameters of the neural network are adjusted according to the network loss.
  • Rendering reverse render
  • When a preset training end condition is met, the training ends and the trained neural network is obtained. The training end condition may be, for example, that the number of training rounds of the neural network reaches a preset threshold, that the network loss of the neural network converges within a certain range, or that the neural network passes verification on a verification set.
  • Those skilled in the art can set the training end condition of the neural network according to the actual situation, which is not limited in this application.
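A minimal training-loop sketch of the reverse-rendering supervision described above is given below. The model, the differentiable `reverse_render` function, and the L1 photometric loss are all assumptions for illustration, not components defined by this application.

```python
# Hedged sketch of one training step with reverse-rendering supervision.
import torch.nn.functional as F

def train_step(model, optimizer, sample_face_batch, mean_face_model, reverse_render):
    """sample_face_batch: (N, 3, H, W) cropped sample face regions."""
    optimizer.zero_grad()
    # Register the mean 3D face model to each sample face region.
    face_3d_params = model(sample_face_batch, mean_face_model)
    # Reverse rendering: project the predicted 3D face back into 2D image space.
    rendered = reverse_render(face_3d_params)          # (N, 3, H, W)
    # Network loss: difference between each reverse-rendered image and its sample face
    # region (an L1 photometric difference is used here as one possible choice).
    loss = F.l1_loss(rendered, sample_face_batch)
    loss.backward()
    optimizer.step()                                    # adjust the network parameters
    return loss.item()
```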
  • In a possible implementation, after the first face pose of the target object is determined, it can be judged whether the first face pose satisfies the second preset condition. The second preset condition is that the pitch angle in the first face pose is within the preset second angle interval, the roll angle in the first face pose is within the preset third angle interval, and the yaw angle in the first face pose is within the preset fourth angle interval.
  • the second angle interval, the third angle interval and the fourth angle interval may be the same or different. It should be noted that those skilled in the art can set the specific values of the second angle interval, the third angle interval and the fourth angle interval according to the actual situation, which is not limited in the present application.
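For illustration only, the second preset condition could be checked along the following lines; the angle intervals used here are arbitrary placeholder values, not values specified by this application.

```python
# Simple sketch of the second preset condition check described above.
PITCH_INTERVAL = (-15.0, 15.0)   # second angle interval, degrees (illustrative)
ROLL_INTERVAL = (-15.0, 15.0)    # third angle interval, degrees (illustrative)
YAW_INTERVAL = (-20.0, 20.0)     # fourth angle interval, degrees (illustrative)

def face_pose_acceptable(pitch_deg: float, roll_deg: float, yaw_deg: float) -> bool:
    return (PITCH_INTERVAL[0] <= pitch_deg <= PITCH_INTERVAL[1]
            and ROLL_INTERVAL[0] <= roll_deg <= ROLL_INTERVAL[1]
            and YAW_INTERVAL[0] <= yaw_deg <= YAW_INTERVAL[1])
```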
  • In a possible implementation, when the first face pose does not satisfy the second preset condition, the user may be prompted, through voice broadcast, text display, animation display, etc., to adjust the device pose of the electronic device and/or to change the face pose of the target object, so that the first face pose of the target object satisfies the second preset condition. In this way, the face of the target object faces the image acquisition component of the electronic device, that is, the face in the video frames captured by the image acquisition component is the frontal face of the target object, which improves the accuracy of height detection.
  • Step S340 Determine a first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • the device pose of the electronic device is the pose when the electronic device captures the video frame where the face area is located.
  • In a possible implementation, the pitch angle of the electronic device indicated by the device pose of the electronic device satisfies a first preset condition. The first preset condition is that the pitch angle of the electronic device indicated in the device pose is within a preset first angle interval.
  • When the first preset condition is not satisfied, the user may be prompted to adjust the device pose of the electronic device through voice broadcast, text display, animation display, etc., to avoid excessive upward or downward shooting during video frame acquisition, thereby improving the accuracy of height detection.
  • To determine the first height of the target object, the coordinate systems in which the ground information and the first face pose of the target object are located can be determined first. The ground information is located in the world coordinate system.
  • In a possible implementation, the Y axis of the world coordinate system can be set to the vertical direction of the real world. Since physical quantities such as distance and object size in the world coordinate system are the same as in the real world, a connection between the virtual world coordinate system and the real world can be established, so that the size of an object calculated in the world coordinate system is the actual size of the object in the real world.
  • the first face pose of the target object is the face pose of the target object relative to the image acquisition component of the electronic device, which is located in the camera coordinate system.
  • In the camera coordinate system, the image acquisition component of the electronic device is located at the origin; that is, from the perspective of the 3D face model of the target object, the position of the image acquisition component of the electronic device is fixed.
  • Therefore, when determining the first height of the target object, face size adjustment and coordinate system transformation are required.
  • In a possible implementation, when determining the first height of the target object, the first face pose of the target object can be adjusted according to the preset interpupillary distance reference value to obtain the second face pose of the target object, wherein the face size indicated by the first face pose is the same as the face size of the target object in the face area, the face size indicated by the second face pose is the actual face size of the target object, and the second face pose is located in the camera coordinate system. That is to say, the first face pose of the target object can be adjusted in the camera coordinate system so that the face size indicated by the adjusted second face pose is the actual face size of the target object.
  • In a possible implementation, the interpupillary distance value in the first face pose of the target object can be determined, and a face size transformation coefficient can be determined according to the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; the face size transformation coefficient is then used to adjust the first face pose to obtain the second face pose of the target object.
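A minimal sketch of this face size adjustment, assuming a pinhole camera model and a commonly cited adult average interpupillary distance of about 63 mm as the reference value (an assumption, not a value defined by this application):

```python
# Sketch of the face size adjustment described above.
IPD_REFERENCE_M = 0.063  # preset interpupillary distance reference value, metres (assumed)

def scale_face_pose(translation_cam, ipd_in_first_pose_m):
    """Scale the first face pose so the indicated face size matches the actual size.

    translation_cam: 3-vector, position of the face model in the camera coordinate system.
    ipd_in_first_pose_m: interpupillary distance measured on the registered 3D face model.
    """
    scale = IPD_REFERENCE_M / ipd_in_first_pose_m   # face size transformation coefficient
    # Under a pinhole model, a face that is `scale` times larger must be `scale` times
    # farther away to produce the same image, so the translation scales accordingly;
    # the rotation is unaffected by uniform scaling.
    return [scale * t for t in translation_cam], scale
```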
  • In a possible implementation, coordinate transformation can be performed on the second face pose according to the device pose of the electronic device to obtain the third face pose of the target object, wherein the third face pose is located in the world coordinate system.
  • In a possible implementation, the coordinate transformation of the second face pose P_C can be performed by the following formula (1) to obtain the third face pose P_w of the target object:

    P_w = T · P_C    (1)

  • Here, T represents the rigid body transformation matrix determined according to the device pose (R, t) of the electronic device, where R represents the rotation matrix in the device pose of the electronic device and t represents the translation vector in the device pose of the electronic device; in homogeneous form, T = [R, t; 0, 1].
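A small sketch of the coordinate transformation in formula (1), assuming the device pose (R, t) provided by the SLAM system is the camera-to-world pose:

```python
# Transform a point expressed in the camera coordinate system into the world
# coordinate system using the rigid body transformation T built from (R, t).
import numpy as np

def camera_to_world(point_cam, R_device, t_device):
    """point_cam: 3-vector in camera coordinates; returns the point in world coordinates."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R_device)    # rotation part of the device pose
    T[:3, 3] = np.asarray(t_device)     # translation part of the device pose
    p = np.append(np.asarray(point_cam, dtype=float), 1.0)   # homogeneous coordinates
    return (T @ p)[:3]
```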
  • the first height of the target object can be determined according to the third face pose and ground information.
  • In a possible implementation, the position of the top of the target object's head may be determined according to the third face pose, and then the first height of the target object may be determined according to the head-top position and the ground information.
  • For example, the position of the tip of the target object's nose can be determined according to the third face pose; then, according to the nose-tip position and the preset ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance, the head-top position of the target object is determined, and the first height of the target object is determined according to the head-top position of the target object and the ground information.
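A hedged sketch of this head-top extrapolation and ground-distance computation is shown below; the facial ratio, the plane representation n·x + d = 0, and the use of the world vertical as the extrapolation direction are simplifying assumptions for illustration.

```python
# Illustrative height computation: extrapolate the head top from the nose tip with a
# preset facial ratio, then take the distance from the head top to the ground plane.
import numpy as np

NOSE_TO_HEAD_RATIO = 1.6   # hypothetical preset ratio of nose-to-head-top over nose-to-chin

def head_top_from_nose(nose_tip_w, chin_w):
    """Both inputs are 3D points in the world coordinate system."""
    nose_to_chin = np.linalg.norm(np.asarray(chin_w) - np.asarray(nose_tip_w))
    # Extrapolate along the world vertical for simplicity; a head-axis direction
    # derived from the third face pose could be used instead.
    up = np.array([0.0, 1.0, 0.0])
    return np.asarray(nose_tip_w) + up * nose_to_chin * NOSE_TO_HEAD_RATIO

def height_above_ground(head_top_w, plane_normal, plane_d):
    """Ground plane: plane_normal . x + plane_d = 0, with plane_normal a unit vector."""
    n = np.asarray(plane_normal, dtype=float)
    return abs(float(np.dot(n, head_top_w) + plane_d))
```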
  • In a possible implementation, when there are multiple face regions corresponding to the target object, for each face region, in a manner similar to the above, a second height of the target object can be determined according to the ground information in the multiple video frames, the first face pose of the target object, and the device pose of the electronic device when it captured the video frame in which that face region is located; the multiple second heights are then post-processed, for example by Kalman filtering and averaging, to obtain the first height of the target object. In this way, the accuracy of height detection can be improved.
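For illustration, the post-processing of several second-height estimates could look like the following one-dimensional constant-state Kalman filter; the noise parameters are placeholder values.

```python
# Minimal sketch: fuse several second-height estimates into one first height with a
# 1-D Kalman filter that models the height as a constant value.
def fuse_heights(height_measurements, process_var=1e-6, measurement_var=4e-4):
    estimate = height_measurements[0]
    variance = 1.0                          # large initial uncertainty
    for z in height_measurements[1:]:
        variance += process_var             # predict (height assumed constant)
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)   # update with the new measurement
        variance *= (1.0 - gain)
    return estimate
```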
  • the first height of the target object may also be displayed on a display interface of the electronic device.
  • the first height of the target object may be displayed on the display interface of the electronic device through animation, text, augmented reality (augmented reality, AR) and other means.
  • the first height of the target object can be displayed on the display interface of the height APP.
  • the display interface of the height APP may include a real-time image interface of video frames collected by the image acquisition part of the electronic device.
  • Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application.
  • As shown in Fig. 6, the user detects the height of the target object 630 through the height APP on the electronic device 600, and the display interface 610 of the height APP displays the video frames collected in real time by the image acquisition component (not shown) of the electronic device 600. After the height APP detects the height of the target object 630, the height can be displayed, through the augmented reality icon 620, at a preset position above the head of the target object 630 in the display interface 610 of the height APP, and the displayed information can be "Height: 175CM".
  • FIG. 6 only uses one target object as an example to illustrate the manner of displaying the height. It should be noted that the heights of multiple target objects may also be displayed in the above manner. Those skilled in the art can also set the display manner and display position of the height of the target object according to the actual situation, which is not limited in the present application.
  • In this way, semantic plane detection can be performed on multiple video frames collected by the image acquisition component of the electronic device to determine the ground information in the multiple video frames; face detection is performed on the multiple video frames at the same time to determine the face area, and the first face pose of the target object in the multiple video frames is determined according to the face area and the preset three-dimensional face model; then the first height of the target object is determined according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras); the face pose of the target object can be determined through face recognition and face 3D reconstruction technology, and the height of the target object is then determined from the face pose, the device pose, and the ground information, without manually positioning the target object and without capturing a complete human body image of the target object. The operation is convenient and the accuracy is high.
  • Fig. 7 shows a schematic diagram of a processing procedure of height detection according to an embodiment of the present application.
  • the user detects the height of the target object through the height APP running on the electronic device.
  • First, step S701 is executed: the height APP collects multiple video frames (i.e., a video stream) through the image acquisition component of the electronic device.
  • the image acquisition component may continuously acquire video streams.
  • In step S702, it can be judged whether SLAM initialization is successful. If SLAM initialization is not successful, the user is prompted to move the electronic device, and step S701 is re-executed; if SLAM initialization is successful, step S703 is executed to perform semantic plane detection on the multiple video frames, and in step S704 it is judged whether ground information is detected within a preset time period.
  • If the ground information is not detected within the preset time period, the user is prompted to take pictures of the ground, and step S701 continues to be executed; if the ground information is detected, face detection is performed on the multiple video frames to determine the face area, and in step S706 it is judged whether the face area satisfies the third preset condition. The third preset condition is that the face area is located in a preset area of the video frame in which it is located.
  • If the face area satisfies the third preset condition, step S707 is executed to determine, according to the face area and the preset three-dimensional face model, the first face pose of the target object in the multiple video frames, and in step S708 it is judged whether the first face pose satisfies the second preset condition. The second preset condition is that the pitch angle in the first face pose is within the preset second angle interval, the roll angle in the first face pose is within the preset third angle interval, and the yaw angle in the first face pose is within the preset fourth angle interval.
  • If the first face pose satisfies the second preset condition, step S709 is executed to determine whether the pitch angle of the electronic device indicated in the device pose of the electronic device satisfies the first preset condition. The first preset condition is that the pitch angle of the electronic device indicated in the device pose is within the preset first angle interval.
  • If the first preset condition is satisfied, step S710 is executed to determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device; then step S711 is executed to display the first height of the target object on the display interface of the height APP through augmented reality (AR).
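The flow of Fig. 7 can be summarised by the following condensed sketch; every helper it calls (slam_ready, detect_ground, detect_face, and so on) is a placeholder standing in for the corresponding step described above rather than an API defined by this application.

```python
# Condensed, illustrative rendering of the Fig. 7 processing loop.
def measure_height_loop(app):
    while True:
        frames = app.capture_frames()                       # S701
        if not app.slam_ready():                            # S702
            app.prompt("Move the device to initialize SLAM")
            continue
        ground = app.detect_ground(frames)                  # S703 / S704
        if ground is None:
            app.prompt("Point the camera at the ground")
            continue
        face = app.detect_face(frames)
        if face is None or not app.face_area_ok(face):      # S706
            app.prompt("Adjust the device so the face is centred")
            continue
        pose = app.estimate_face_pose(face)                 # S707
        if not app.face_pose_ok(pose):                      # S708
            app.prompt("Face the camera directly")
            continue
        if not app.device_pitch_ok():                       # S709
            app.prompt("Hold the device more level")
            continue
        height = app.compute_height(ground, pose)           # S710
        app.display_height_ar(height)                       # S711
        return height
```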
  • In this way, the height detection method of the embodiment of the present application can automatically identify the ground information and automatically detect the height of the target object through SLAM technology and semantic segmentation, without manual operation (such as manual clicking or marking of the target object), and can detect the heights of multiple target objects simultaneously, thereby simplifying the height detection process and improving height detection efficiency.
  • the embodiment of the present application acquires three-dimensional information through SLAM technology, which can also avoid contact between the electronic device and the human body of the target object, which is safe and reliable.
  • the height detection method of the embodiment of the present application performs height detection based on multiple video frames collected by a common camera (such as a monocular camera), without the need for professional equipment such as a depth camera, which reduces equipment dependence.
  • For example, the user can realize height detection with a handheld device (such as a mobile phone, smart watch, etc.).
  • In addition, the embodiment of the present application obtains the face pose of the target object through face recognition and face three-dimensional reconstruction technology, which is fast and accurate; this not only improves the accuracy of height detection but is also suitable for scenarios such as target object movement and shooting angle changes.
  • Fig. 8 shows a block diagram of a height detection device according to an embodiment of the present application.
  • As shown in Fig. 8, the height detection device is applied to an electronic device and includes: an image acquisition component 810, configured to capture multiple video frames; and a processing component 820, configured to: perform semantic plane detection on the multiple video frames to determine the ground information in the multiple video frames; perform face detection on the multiple video frames to determine the face area; determine, according to the face image of the face area and the preset three-dimensional face model, the first face pose of the target object in the multiple video frames; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
  • In a possible implementation, the processing component is further configured to perform at least one of the following: if the ground information is not detected within a preset period of time, prompt the user to take pictures of the ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompt the user to adjust the device pose; when the first face pose does not meet the second preset condition, prompt the user to adjust the device pose and/or change the face pose of the target object; or when the face area does not meet a third preset condition, prompt the user to adjust the device pose.
  • In a possible implementation, the determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining a second height of the target object according to the ground information, the first face pose, and the device pose; and performing post-processing on the second height to obtain the first height, the post-processing including Kalman filtering.
  • the processing component is further configured to: display the first height on a display interface of the electronic device.
  • An embodiment of the present application provides a height detection device, including: an image acquisition component for acquiring multiple video frames; a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to implement the above method when executing the instructions.
  • An embodiment of the present application provides a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is realized.
  • An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium bearing computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital video disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
  • Computer readable program instructions or codes described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, local area network, wide area network, and/or wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using an Internet service provider to connect via the Internet).
  • Electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), can execute computer-readable program instructions, thereby realizing various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that when the instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus for realizing the functions/actions specified in one or more blocks of the flowchart and/or block diagram is produced.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause a computer, programmable data processing apparatus, and/or other device to work in a specific way, so that the computer-readable medium storing the instructions includes an article of manufacture comprising instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that includes one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with hardware (such as circuits or an application-specific integrated circuit (ASIC)), or with a combination of hardware and software (such as firmware).

Abstract

The present application relates to a height measurement method and apparatus, and a storage medium. The method comprises: performing semantic plane detection on a plurality of video frames, which are collected by an image collection component of an electronic device, and determining ground information in the plurality of video frames; performing facial detection on the plurality of video frames, so as to determine a facial area; determining a first facial pose of a target object in the plurality of video frames according to the facial area and a preset facial three-dimensional model; and determining a first height of the target object according to the ground information, the first facial pose and a device pose of the electronic device. The height measurement in the embodiments of the present application does not rely on a professional device, and the height measurement can be automatically performed on a target object without the need of manual positioning. The operation is convenient, and the accuracy is high.

Description

身高检测方法、装置及存储介质 Height detection method, device and storage medium
技术领域 Technical Field
本申请涉及图像处理技术领域,尤其涉及一种身高检测方法、装置及存储介质。The present application relates to the technical field of image processing, in particular to a height detection method, device and storage medium.
背景技术Background technique
传统的身高测量方法,通常需要测量人员手动操作专业仪器,例如身高测量仪等,不仅测量效率低,而且仪器携带不方便,不适合个人使用。The traditional height measurement method usually requires the measurer to manually operate a professional instrument, such as a height measuring instrument, which not only has low measurement efficiency, but also is inconvenient to carry and is not suitable for personal use.
随着图像处理技术的发展,可通过双目相机、深度相机等专业设备拍摄待测量的人体目标的深度图像,并通过对深度图像的处理,测量出人体目标的身高。但是,该方式对双目相机、深度相机等专业设备依赖较大,而且还需要拍摄完整的人体图像(即人体目标的全身图像),具有一定的局限性。With the development of image processing technology, professional equipment such as binocular cameras and depth cameras can be used to capture the depth image of the human target to be measured, and the height of the human target can be measured by processing the depth image. However, this method relies heavily on professional equipment such as binocular cameras and depth cameras, and also needs to capture a complete human body image (ie, a full-body image of a human target), which has certain limitations.
发明内容Contents of the invention
有鉴于此,提出了一种身高检测方法、装置及存储介质。In view of this, a height detection method, device and storage medium are proposed.
第一方面,本申请的实施例提供了一种身高检测方法,所述方法包括:对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定所述多个视频帧中的地面信息;对所述多个视频帧进行人脸检测,确定人脸区域;根据所述人脸区域及预设的人脸三维模型,确定所述多个视频帧中目标对象的第一人脸位姿;根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高。In the first aspect, the embodiment of the present application provides a height detection method, the method includes: performing semantic plane detection on multiple video frames collected by the image acquisition part of the electronic device, and determining the ground height in the multiple video frames Information; face detection is performed on the multiple video frames, and the face area is determined; according to the face area and the preset three-dimensional model of the face, determine the first face position of the target object in the multiple video frames pose; determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
根据第一方面,在所述身高检测方法的第一种可能的实现方式中,所述方法还包括如下至少一项:在预设时段内未检测到所述地面信息的情况下,提示用户拍摄地面;在所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件的情况下,提示用户调整所述设备位姿;在所述第一人脸位姿不满足第二预设条件的情况下,提示用户调整所述设备位姿和/或改变所述目标对象的人脸位姿;或在所述人脸区域不满足第三预设条件的情况下,提示用户调整所述设备位姿。According to the first aspect, in the first possible implementation of the height detection method, the method further includes at least one of the following: if the ground information is not detected within a preset period of time, prompting the user to take a picture ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompt the user to adjust the device pose; if the first face pose does not meet the second In the case of preset conditions, prompt the user to adjust the device pose and/or change the face pose of the target object; or in the case of the face area does not meet the third preset condition, prompt the user to adjust The device pose.
本申请的实施例,能够在预设时段内未检测到所述地面信息、所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件、所述第一人脸位姿不满足第二预设条 件、所述人脸区域不满足第三预设条件中的至少一种情况下,对用户进行提示,例如,提示用户拍摄地面、调整设备位姿、改变目标对象的人脸位姿等,使得用户进行相应调整,从而提高身高检测的准确性。In the embodiment of the present application, it is possible to detect that the ground information, the pitch angle of the electronic device indicated by the device pose, does not meet the first preset condition, and the first human face pose within a preset period of time. When at least one of the second preset condition and the third preset condition is not met in the face area, the user is prompted, for example, the user is prompted to take pictures of the ground, adjust the pose of the device, and change the person of the target object. Face pose, etc., so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
根据第一方面或第一方面的第一种可能的实现方式,在所述身高检测方法的第二可能的实现方式中,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据所述地面信息、所述第一人脸位姿及所述设备位姿,确定所述目标对象的第二身高;对所述第二身高进行后处理,得到所述第一身高,所述后处理包括卡尔曼滤波。According to the first aspect or the first possible implementation of the first aspect, in the second possible implementation of the height detection method, according to the ground information, the first face pose and the The device pose of the electronic device, and determining the first height of the target object includes: determining the second height of the target object according to the ground information, the first human face pose, and the device pose ; Performing post-processing on the second height to obtain the first height, the post-processing includes Kalman filtering.
本申请的实施例,能够根据地面信息、第一人脸位姿及设备位姿,确定目标对象的第二身高,并对第二身高进行卡尔曼滤波等后处理,得到目标对象的第一身高,从而能够提高身高检测的准确性。In the embodiment of the present application, the second height of the target object can be determined according to the ground information, the first face pose, and the pose of the device, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object , so as to improve the accuracy of height detection.
根据第一方面或第一方面的第一种可能的实现方式或第一方面的第二种可能的实现方式,在所述身高检测方法的第三可能的实现方式中,所述方法还包括:在所述电子设备的显示界面中显示所述第一身高。According to the first aspect or the first possible implementation of the first aspect or the second possible implementation of the first aspect, in a third possible implementation of the height detection method, the method further includes: Displaying the first height on a display interface of the electronic device.
本申请的实施例,确定出目标对象的第一身高后,能够在电子设备的显示界面中通过动画、文本、增强现实(augmented reality,AR)等方式显示目标对象的第一身高,从而能够提高用户体验。In the embodiment of the present application, after the first height of the target object is determined, the first height of the target object can be displayed in the display interface of the electronic device through animation, text, augmented reality (augmented reality, AR), etc., thereby improving the height of the target object. user experience.
根据第一方面,在所述身高检测方法的第四种可能的实现方式中,所述地面信息位于世界坐标系下,所述第一人脸位姿位于相机坐标系下,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿;根据所述设备位姿,对所述第二人脸位姿进行坐标变换,得到所述目标对象的第三人脸位姿,所述第三人脸位姿位于世界坐标系下;根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高。According to the first aspect, in the fourth possible implementation of the height detection method, the ground information is located in the world coordinate system, the first face pose is located in the camera coordinate system, and the The ground information, the first human face pose and the device pose of the electronic device, and determining the first height of the target object include: adjusting the first human face pose according to a preset interpupillary distance reference value , to obtain the second face pose; according to the device pose, perform coordinate transformation on the second face pose to obtain the third face pose of the target object, the third face pose The pose is located in the world coordinate system; according to the third face pose and the ground information, determine the first height of the target object.
本申请的实施例,能够对位于相机坐标系下的第一人脸位姿进行调整及坐标变换,得到位于世界坐标系下的第三人脸位姿,并根据第三人脸位姿及地面信息,确定目标对象的第一身高,从而能够在世界坐标系下计算目标对象的第一身高,提高身高检测的准确性。The embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and according to the third face pose and the ground Information to determine the first height of the target object, so that the first height of the target object can be calculated in the world coordinate system, and the accuracy of height detection can be improved.
根据第一方面的第四种可能的实现方式,在所述身高检测方法的第五种可能的实现方式中,所述根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿,包括:根据预设的瞳距参考值及所述第一人脸位姿中的瞳距值,确定人脸尺寸变换系数;根据所述人脸尺寸变换系数,对所述第一人脸位姿进行调整,得到目标对象的第二人脸位姿。According to the fourth possible implementation manner of the first aspect, in the fifth possible implementation manner of the height detection method, the first human face pose is adjusted according to a preset interpupillary distance reference value to Obtaining the second face pose includes: determining a face size transformation coefficient according to a preset interpupillary distance reference value and the interpupillary distance value in the first face pose; The first face pose is adjusted to obtain the second face pose of the target object.
本申请的实施例,通过瞳距参考值及第一人脸位姿中的瞳距距值,确定人脸尺寸变换系数,并根据人脸尺寸变换系数,对第一人脸位姿进行调整,得到目标对象的第二人脸位姿,从而能够得到相机坐标系目标对象的人脸实际尺寸及位姿。In the embodiment of the present application, the face size transformation coefficient is determined through the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient, The second face pose of the target object is obtained, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
根据第一方面的第四种可能的实现方式,在所述身高检测方法的第六种可能的实现方式中,所述根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高,包括:根据所述第三人脸位姿,确定所述目标对象的头顶位置;根据所述头顶位 置及所述地面信息,确定所述目标对象的第一身高。According to the fourth possible implementation of the first aspect, in the sixth possible implementation of the height detection method, the target is determined according to the third face pose and the ground information The first height of the object includes: determining the position of the top of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
本申请的实施例,通过确定目标对象的头顶位置,并根据头顶位置及地面信息确定目标对象的第一身高,简单快速且能够提高身高检测的准确性。In the embodiment of the present application, by determining the position of the top of the target object, and determining the first height of the target object according to the position of the top of the head and ground information, it is simple and fast and can improve the accuracy of height detection.
第二方面,本申请的实施例提供了一种身高检测装置,所述身高检测装置应用于电子设备,包括图像采集部件,用于采集多个视频帧;处理部件,被配置为:对所述多个视频帧进行语义平面检测,确定所述多个视频帧中的地面信息;对所述多个视频帧进行人脸检测,确定人脸区域;根据所述人脸区域的人脸图像及预设的人脸三维模型,确定所述多个视频帧中目标对象的第一人脸位姿;根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高。In a second aspect, an embodiment of the present application provides a height detection device, the height detection device is applied to electronic equipment, and includes an image acquisition component for capturing multiple video frames; a processing component configured to: A plurality of video frames is carried out semantic plane detection, and the ground information in the plurality of video frames is determined; Face detection is carried out to the plurality of video frames, and a human face area is determined; The three-dimensional model of the human face is set, and the first human face pose of the target object in the plurality of video frames is determined; according to the ground information, the first human face pose and the device pose of the electronic device, determine The first height of the target object.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
根据第二方面,在所述身高检测装置的第一种可能的实现方式中,所述处理部件还被配置为如下至少一项:在预设时段内未检测到所述地面信息的情况下,提示用户拍摄地面;在所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件的情况下,提示用户调整所述设备位姿;在所述第一人脸位姿不满足第二预设条件的情况下,提示用户调整所述设备位姿和/或改变所述目标对象的人脸位姿;或在所述人脸区域不满足第三预设条件的情况下,提示用户调整所述设备位姿。According to the second aspect, in the first possible implementation manner of the height detection device, the processing component is further configured to at least one of the following: when the ground information is not detected within a preset period of time, Prompting the user to take pictures of the ground; when the pitch angle of the electronic device indicated by the device pose does not meet the first preset condition, prompting the user to adjust the device pose; When the second preset condition is met, prompt the user to adjust the pose of the device and/or change the face pose of the target object; or if the face area does not meet the third preset condition, The user is prompted to adjust the device pose.
本申请的实施例,能够在预设时段内未检测到所述地面信息、所述设备位姿指示的所述电子设备的俯仰角不满足第一预设条件、所述第一人脸位姿不满足第二预设条件或所述人脸区域不满足第三预设条件中的至少一种情况下,对用户进行提示,例如,提示用户拍摄地面、调整设备位姿、改变目标对象的人脸位姿等,使得用户进行相应调整,从而提高身高检测的准确性。In the embodiment of the present application, it is possible to detect that the ground information, the pitch angle of the electronic device indicated by the device pose, does not meet the first preset condition, and the first human face pose within a preset period of time. When at least one of the second preset condition is not met or the face area does not meet the third preset condition, prompt the user, for example, prompt the user to take pictures of the ground, adjust the pose of the device, change the person of the target object Face pose, etc., so that the user can make corresponding adjustments, thereby improving the accuracy of height detection.
根据第二方面或第二方面的第一种可能的实现方式,在所述身高检测装置的第二种可能的实现方式中,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据所述地面信息、所述第一人脸位姿及所述设备位姿,确定所述目标对象的第二身高;对所述第二身高进行后处理,得到所述第一身高,所述后处理包括卡尔曼滤波。According to the second aspect or the first possible implementation of the second aspect, in the second possible implementation of the height detection device, according to the ground information, the first face pose and The device pose of the electronic device, determining the first height of the target object includes: determining the second height of the target object according to the ground information, the first face pose, and the device pose. height; performing post-processing on the second height to obtain the first height, and the post-processing includes Kalman filtering.
本申请的实施例,能够根据地面信息、第一人脸位姿及设备位姿,确定目标对象的第二身高,并对第二身高进行卡尔曼滤波等后处理,得到目标对象的第一身高,从而能够提高身高检测的准确性。In the embodiment of the present application, the second height of the target object can be determined according to the ground information, the first face pose, and the pose of the device, and post-processing such as Kalman filtering can be performed on the second height to obtain the first height of the target object , so as to improve the accuracy of height detection.
根据第二方面或第二方面的第一种可能的实现方式或第二方面的第二种可能的实现方式,在所述身高检测装置的第三种可能的实现方式中,所述处理部件还被配置为:在所述电子设备的显示界面显示中所述第一身高。According to the second aspect or the first possible implementation of the second aspect or the second possible implementation of the second aspect, in a third possible implementation of the height detection device, the processing component further It is configured to: display the first height on the display interface of the electronic device.
本申请的实施例,确定出目标对象的第一身高后,能够在电子设备的显示界面中通过动画、文本、增强现实(augmented reality,AR)等方式显示目标对象的第一身高,从而能够提高用户体验。In the embodiment of the present application, after the first height of the target object is determined, the first height of the target object can be displayed in the display interface of the electronic device through animation, text, augmented reality (augmented reality, AR), etc., thereby improving the height of the target object. user experience.
根据第二方面,在所述身高检测装置的第四种可能的实现方式中,所述地面信息位于世界坐标系下,所述第一人脸位姿位于相机坐标系下,所述根据所述地面信息、所述第一人脸位姿及所述电子设备的设备位姿,确定所述目标对象的第一身高,包括:根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿;根据所述设备位姿,对所述第二人脸位姿进行坐标变换,得到所述目标对象的第三人脸位姿,所述第三人脸位姿位于世界坐标系下;根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高。According to the second aspect, in the fourth possible implementation of the height detection device, the ground information is located in the world coordinate system, the first face pose is located in the camera coordinate system, and the The ground information, the first human face pose and the device pose of the electronic device, and determining the first height of the target object include: adjusting the first human face pose according to a preset interpupillary distance reference value , to obtain the second face pose; according to the device pose, perform coordinate transformation on the second face pose to obtain the third face pose of the target object, the third face pose The pose is located in the world coordinate system; according to the third face pose and the ground information, determine the first height of the target object.
本申请的实施例,能够对位于相机坐标系下的第一人脸位姿进行调整及坐标变换,得到位于世界坐标系下的第三人脸位姿,并根据第三人脸位姿及地面信息,确定目标对象的第一身高,从而能够在世界坐标系下计算目标对象的第一身高,提高身高检测的准确性。The embodiment of the present application can adjust and transform the first face pose in the camera coordinate system to obtain the third face pose in the world coordinate system, and according to the third face pose and the ground Information to determine the first height of the target object, so that the first height of the target object can be calculated in the world coordinate system, and the accuracy of height detection can be improved.
根据第二方面的第四种可能的实现方式,在所述身高检测装置的第五种可能的实现方式中,所述根据预设的瞳距参考值调整所述第一人脸位姿,以得到第二人脸位姿,包括:根据预设的瞳距参考值及所述第一人脸位姿中的瞳距值,确定人脸尺寸变换系数;根据所述人脸尺寸变换系数,对所述第一人脸位姿进行调整,得到目标对象的第二人脸位姿。According to the fourth possible implementation manner of the second aspect, in the fifth possible implementation manner of the height detection device, the first human face pose is adjusted according to a preset interpupillary distance reference value to Obtaining the second face pose includes: determining a face size transformation coefficient according to a preset interpupillary distance reference value and the interpupillary distance value in the first face pose; The first face pose is adjusted to obtain the second face pose of the target object.
本申请的实施例,通过瞳距参考值及第一人脸位姿中的瞳距距值,确定人脸尺寸变换系数,并根据人脸尺寸变换系数,对第一人脸位姿进行调整,得到目标对象的第二人脸位姿,从而能够得到相机坐标系目标对象的人脸实际尺寸及位姿。In the embodiment of the present application, the face size transformation coefficient is determined through the interpupillary distance reference value and the interpupillary distance value in the first face pose, and the first face pose is adjusted according to the face size transformation coefficient, The second face pose of the target object is obtained, so that the actual size and pose of the face of the target object in the camera coordinate system can be obtained.
根据第二方面的第四种可能的实现方式,在所述身高检测装置的第六种可能的实现方式中,所述根据所述第三人脸位姿及所述地面信息,确定所述目标对象的第一身高,包括:根据所述第三人脸位姿,确定所述目标对象的头顶位置;根据所述头顶位置及所述地面信息,确定所述目标对象的第一身高。According to the fourth possible implementation of the second aspect, in the sixth possible implementation of the height detection device, the target is determined according to the third face pose and the ground information The first height of the object includes: determining the position of the top of the target object according to the third face pose; and determining the first height of the target object according to the position of the top of the head and the ground information.
本申请的实施例,通过确定目标对象的头顶位置,并根据头顶位置及地面信息确定目标对象的第一身高,简单快速且能够提高身高检测的准确性。In the embodiment of the present application, by determining the position of the top of the target object, and determining the first height of the target object according to the position of the top of the head and ground information, it is simple and fast and can improve the accuracy of height detection.
第三方面,本申请的实施例提供了一种身高测量装置,包括:图像采集部件,用于采集多个视频帧;处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的身高检测方法。In a third aspect, an embodiment of the present application provides a height measurement device, including: an image acquisition component, configured to acquire a plurality of video frames; a processor; a memory for storing processor-executable instructions; wherein, the processing The device is configured to implement the above-mentioned first aspect or one or more of the height detection methods in multiple possible implementation manners of the first aspect when executing the instructions.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍 摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
第四方面,本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的身高检测方法。In the fourth aspect, the embodiments of the present application provide a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above-mentioned first aspect or the first aspect can be realized One or several of the various possible implementations of the height detection method.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
第五方面,本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的身高检测方法。In the fifth aspect, the embodiments of the present application provide a computer program product, including computer readable code, or a non-volatile computer readable storage medium bearing computer readable code, when the computer readable code is stored in an electronic When running in the device, the processor in the electronic device executes the height detection method of the first aspect or one or more of the multiple possible implementations of the first aspect.
本申请的实施例,能够对电子设备的图像采集部件采集的多个视频帧进行语义平面检测,确定多个视频帧中的地面信息;同时对多个视频帧进行人脸检测,确定人脸区域,并根据人脸区域及预设的人脸三维模型,确定多个视频帧中目标对象的第一人脸位姿;然后根据地面信息、第一人脸位姿及电子设备的设备位姿,确定目标对象的第一身高,从而使得身高检测不仅不依赖于专业设备(例如双目相机、深度相机等),而且能够通过人脸识别及人脸三维技术确定目标对象的人脸位姿,进而通过人脸位姿、设备位姿及地面信息确定目标对象的身高,无需手动对目标对象进行定位,也无需拍摄目标对象完整的人体图像,操作方便且准确性高。The embodiment of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determine the ground information in multiple video frames; perform face detection on multiple video frames at the same time, and determine the face area , and according to the face area and the preset three-dimensional model of the face, determine the first face pose of the target object in multiple video frames; then according to the ground information, the first face pose and the device pose of the electronic device, Determine the first height of the target object, so that height detection not only does not depend on professional equipment (such as binocular cameras, depth cameras, etc.), but also can determine the face pose of the target object through face recognition and face 3D technology, and then The height of the target object is determined through the face pose, device pose, and ground information. There is no need to manually locate the target object, and there is no need to take a complete human body image of the target object. The operation is convenient and the accuracy is high.
本申请的这些和其他方面在以下(多个)实施例的描述中会更加简明易懂。These and other aspects of the present application will be made more apparent in the following description of the embodiment(s).
附图说明Description of drawings
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本申请的示例性实施例、特征和方面,并且用于解释本申请的原理。The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the specification, serve to explain the principles of the application.
图1示出根据本申请一实施例的电子设备的结构示意图。Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
图2示出根据本申请一实施例的电子设备的软件结构框图。Fig. 2 shows a block diagram of a software structure of an electronic device according to an embodiment of the present application.
图3示出根据本申请一实施例的身高检测方法的流程图。Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application.
图4示出根据本申请一实施例的地面信息的检测过程的示意图。Fig. 4 shows a schematic diagram of a detection process of ground information according to an embodiment of the present application.
图5示出根据本申请一实施例的目标对象的第一人脸位姿的确定过程的示意图。Fig. 5 shows a schematic diagram of a process of determining a first face pose of a target object according to an embodiment of the present application.
图6示出根据本申请一实施例的目标对象的身高显示的示意图。Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application.
图7示出根据本申请一实施例的身高检测的处理过程的示意图。Fig. 7 shows a schematic diagram of a processing procedure of height detection according to an embodiment of the present application.
图8示出根据本申请一实施例的身高检测装置的框图。Fig. 8 shows a block diagram of a height detection device according to an embodiment of the present application.
具体实施方式Detailed ways
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。Various exemplary embodiments, features, and aspects of the present application will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures indicate functionally identical or similar elements. While various aspects of the embodiments are shown in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or better than other embodiments.
另外,为了更好的说明本申请,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本申请同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本申请的主旨。In addition, in order to better illustrate the present application, numerous specific details are given in the following specific implementation manners. It will be understood by those skilled in the art that the present application may be practiced without certain of the specific details. In some instances, methods, means, components and circuits well known to those skilled in the art have not been described in detail in order to highlight the gist of the present application.
在相关技术中,对人体的身高进行测量时,通常需要使用双目相机、深度相机等专业设备,对设备依赖较大,而且需要拍摄完整的人体图像,具有一定的局限性。例如,在一些技术方案中,利用双目相机拍摄场景图像,获取场景图像中人体目标的人头尖点的图像坐标,并根据人头尖点的图像坐标获取双目相机生成的人头尖点对应的深度信息,进而利用人头尖点的图像坐标和深度信息,计算人头尖点在摄像机坐标系下的坐标,并根据人头尖点在摄像机坐标系下的坐标和双目相机的安装高度、俯仰角、倾斜角,测量人体目标的身高。In related technologies, when measuring the height of a human body, it is usually necessary to use professional equipment such as binocular cameras and depth cameras, which is highly dependent on the equipment and needs to capture a complete human body image, which has certain limitations. For example, in some technical solutions, use a binocular camera to capture a scene image, obtain the image coordinates of the human head point of the human target in the scene image, and obtain the corresponding depth of the human head point generated by the binocular camera according to the image coordinates of the human head point information, and then use the image coordinates and depth information of the cusp of the human head to calculate the coordinates of the cusp of the human head in the camera coordinate system, and according to the coordinates of the cusp of the human head in the camera coordinate system and the installation height, pitch angle, and tilt of the binocular camera angle, measure the height of the human target.
该技术方案不仅需要配备双目相机(即对设备存在依赖),同时还需要固定相机位姿及已知相机安装高度,对使用场景有一定限制。此外,该技术方案还必须将完整的人体拍摄下来才能实现身高测量,具有一定的局限性。This technical solution not only needs to be equipped with a binocular camera (that is, it is dependent on the device), but also needs to fix the camera pose and known camera installation height, which has certain restrictions on the use scene. In addition, this technical solution must take pictures of a complete human body to achieve height measurement, which has certain limitations.
As another example, in some technical solutions, a dense semantic map can be generated based on semantic simultaneous localization and mapping (SLAM) technology; plane semantic detection is then performed on the dense semantic map, the object height is automatically identified according to the intrinsic relationships between semantics, and the object length and width are calculated by projection after ground extraction and focal-target segmentation, finally yielding the bounding box size (length, width, and height) of the object. Since the human body is one kind of generalized object, human height can be measured with this technical solution.
However, generating a dense semantic map based on semantic SLAM technology relies on a depth camera (that is, it depends on specific equipment). Moreover, this technical solution works well for objects whose surfaces are parallel to the ground, whereas the human body has a complex shape with no obvious plane at the top of the head, so the measurement accuracy is not high. In addition, this technical solution needs to capture the whole appearance of the target object, including its top; otherwise the complete outline of the object cannot be reconstructed. For human height measurement, this requires the photographer to shoot the human target from a higher angle, which is inconvenient and imposes certain limitations.
With the development of artificial intelligence (AI), some technical solutions use a face-height model obtained through machine learning to measure human height. For example, a face classifier and a face-height model may first be trained separately; the image of the human target to be measured is then input into the face classifier for face detection to obtain the face image of the human target, and the face image is input into the face-height model for processing to obtain the height of the human target.
However, the core of this technical solution is the face-height model. A face-height model obtained through machine learning not only has poor interpretability and relies heavily on training data, but also, because the relationship between face and height may differ among different populations, is difficult to generalize, so the accuracy of the measurement results is not high.
In some other technical solutions, height measurement is performed by manually operating an augmented reality (AR) ruler. For example, plane detection and SLAM technology can be used to obtain the spatial equation of the ground; the surveyor then needs to position a virtual anchor point at the feet of the human target (that is, the measurement object), pull a virtual AR ruler from bottom to top, and stop when it reaches the top of the head. The length of the AR ruler, that is, the height of the human target, is then obtained through the three-dimensional (3D) spatial coordinate system established by SLAM.
However, this technical solution requires manual participation and has low measurement efficiency. Moreover, the virtual anchor point is clicked on a two-dimensional (2D) image and projected onto the 3D plane by ray projection; object occlusion, manual operation errors, and other factors can cause the virtual anchor point to appear to be positioned at the feet of the human target while actually deviating significantly, that is, the virtual anchor point is positioned inaccurately, resulting in inaccurate height measurement results.
To solve the above technical problems, the present application provides a height detection method, which can be applied to an electronic device. The height detection method of the embodiments of the present application can perform semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device and determine ground information in the multiple video frames; at the same time, face detection is performed on the multiple video frames to determine a face region; the first face pose of the target object in the multiple video frames is determined according to the face region and a preset three-dimensional face model; and the first height of the target object is determined according to the ground information, the first face pose, and the device pose of the electronic device.
Detecting the height of the target object in this way not only avoids dependence on professional equipment (such as binocular cameras or depth cameras), but also determines the face pose of the target object through face recognition and 3D face technology, and then determines the height of the target object from the face pose, the device pose, and the ground information. There is no need to manually locate the target object, nor to capture a complete image of the target object's body; the operation is convenient and the accuracy is high.
The electronic device described in the embodiments of the present application may or may not have a touch screen. A touch-screen electronic device can be controlled by clicking and sliding on the display screen with a finger, a stylus, or the like; a non-touch-screen electronic device can be connected to input devices such as a mouse, a keyboard, or a touch panel and controlled through those input devices.
Fig. 1 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 may include at least one of a mobile phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, a wearable device, a vehicle-mounted device, a smart home device, or a smart city device. The embodiment of the present application does not specifically limit the type of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, may combine some components, may split some components, or may have a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent components or may be integrated into one or more processors. The processor can generate operation control signals according to instruction opcodes and timing signals, completing the control of instruction fetching and execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 may be a cache. This memory may store instructions or data that the processor 110 has just used or uses frequently. If the processor 110 needs to use those instructions or data again, they can be called directly from this memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, among others. The processor 110 may be connected to modules such as the touch sensor, the audio module, the wireless communication module, the display, and the camera through at least one of the above interfaces.
It can be understood that the interface connection relationships between the modules illustrated in the embodiment of the present application are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiment, or a combination of multiple interface connection manners.
The electronic device 100 may implement a display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or more display screens 194.
The electronic device 100 can implement photographing and video recording functions, that is, image and video collection and related functions, through the camera 193, the ISP, the video codec, the GPU, the display screen 194, the application processor AP, the neural-network processing unit NPU, and the like.
The camera 193 can be used to collect color image data of a subject. In some embodiments, the camera 193 can also be used to collect depth data of the subject. That is to say, the camera in the electronic device 100 may be an ordinary camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera or a depth camera. The present application does not limit the specific type of the camera 193.
The ISP can be used to process the color image data collected by the camera 193. For example, when a photograph is taken, the shutter is opened, light is transmitted through the lens onto the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithmic optimization of the image noise, brightness, and skin tone, and can optimize parameters such as the exposure and color temperature of the shooting scene.
In some embodiments, the electronic device 100 may include one or more cameras 193. Specifically, the electronic device 100 may include one front camera and at least one rear camera. The front camera can typically be used to collect color image data of the photographer facing the display screen 194, and the rear camera can be used to collect color image data of the subject (such as a person or scenery) that the photographer is facing.
In some embodiments, the CPU, GPU, or NPU in the processor 110 may process multiple video frames collected by the camera 193. Specifically, the processor 110 may perform detection on the multiple video frames collected by the image acquisition component of the electronic device 100 (that is, the camera 193) to determine the ground information in the multiple video frames; at the same time, face detection is performed on the multiple video frames to determine a face region, and the first face pose of the target object in the multiple video frames is determined according to the face region and a preset three-dimensional face model; the first height of the target object is then determined according to the ground information, the first face pose, and the device pose of the electronic device. In some embodiments, the first height of the target object may also be displayed on the display screen 194 of the electronic device 100.
The gyroscope sensor 180B in the electronic device 100 can be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (namely the x, y, and z axes) can be determined by the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyroscope sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to that angle, and controls the lens to move in the reverse direction to offset the shake of the electronic device 100, thereby achieving anti-shake. The gyroscope sensor 180B can also be used in scenarios such as navigation and motion-sensing games.
The acceleration sensor 180E in the electronic device 100 can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along the three axes x, y, and z). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
In some embodiments, components of the electronic device 100 such as the gyroscope sensor 180B and the acceleration sensor 180E may constitute an inertial measurement unit (IMU), which is used to measure the device pose of the electronic device 100.
The touch sensor 180K in the electronic device 100 is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 together form a touch screen, also called a "touchscreen". The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a position different from that of the display screen 194.
The buttons 190 in the electronic device 100 may include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons or touch buttons. The electronic device 100 can receive button input and generate key signal input related to user settings and function control of the electronic device 100. For example, when taking photos or recording video through the camera application (APP) of the electronic device 100, the camera APP may provide buttons such as start photographing/recording and end recording for the user to operate.
The motor 191 in the electronic device 100 can generate a vibration alert. The motor 191 can be used for incoming-call vibration alerts and for touch vibration feedback. For example, touch operations applied to different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, games, photographing, and video recording) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiment of the present application takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
Fig. 2 shows a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into five layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (ART) and native C/C++ libraries, the hardware abstraction layer (HAL), and the kernel layer.
The application layer may include a series of application packages.
As shown in Fig. 2, the application packages may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in Fig. 2, the application framework layer may include a window manager, a content provider, a view system, a resource manager, a notification manager, an activity manager, an input manager, and the like.
The window manager provides the Window Manager Service (WMS). The WMS can be used for window management, window animation management, and surface management, and serves as a relay station for the input system.
Content providers are used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may consist of one or more views. For example, a display interface that includes an SMS notification icon may include a view for displaying text and a view for displaying pictures.
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, and so on. The notification manager can also present notifications in the system's top status bar in the form of a chart or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window, for example prompting text information in the status bar, emitting a prompt sound, vibrating the electronic device, or flashing an indicator light.
The activity manager can provide the Activity Manager Service (AMS). The AMS can be used for starting, switching, and scheduling system components (such as activities, services, content providers, and broadcast receivers), as well as for managing and scheduling application processes.
The input manager can provide the Input Manager Service (IMS). The IMS can be used to manage the input of the system, such as touch screen input, key input, and sensor input. The IMS fetches events from input device nodes and, through interaction with the WMS, dispatches the events to the appropriate windows.
The Android runtime includes the core libraries and the Android runtime. The Android runtime is responsible for converting source code into machine code, mainly using ahead-of-time (AOT) compilation and just-in-time (JIT) compilation.
The core libraries are mainly used to provide basic Java class library functions, such as basic data structures, mathematics, IO, tools, database, and network libraries. The core libraries provide APIs for users to develop Android applications.
The native C/C++ libraries may include multiple functional modules, for example the surface manager, the Media Framework, libc, OpenGL ES, SQLite, and Webkit. The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications. The media framework supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. OpenGL ES provides drawing and manipulation of 2D and 3D graphics in applications. SQLite provides a lightweight relational database for the applications of the electronic device 100.
The hardware abstraction layer runs in user space, encapsulates the kernel-layer drivers, and provides a call interface to the upper layers.
The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
The workflow of the software and hardware of the electronic device 100 is exemplarily described below in conjunction with the height detection scenario of the embodiments of the present application.
Assume that height detection is implemented through an application on the electronic device, the Height APP. To perform height detection, the user can touch the Height APP icon on the display screen of the electronic device. When the touch sensor 180K receives the touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation being a tap and the control corresponding to the tap being the Height APP icon as an example, the Height APP calls the interface of the application framework layer to start the Height APP, then starts the camera driver by calling the kernel layer, and collects multiple video frames through the camera 193, that is, collects a video stream through the camera 193. The multiple video frames may include the target object whose height is to be detected.
After collecting the multiple video frames, the electronic device 100 can perform related processing such as ground detection and face detection on the multiple video frames through the processor 110, thereby determining the height of the target object.
Fig. 3 shows a flowchart of a height detection method according to an embodiment of the present application. As shown in Fig. 3, the height detection method includes: step S310, performing semantic plane detection on multiple video frames collected by the image acquisition component of the electronic device, and determining the ground information in the multiple video frames.
The image acquisition component may be a camera of the electronic device. This camera may be an ordinary camera that does not collect depth data, such as a monocular camera, or a professional camera capable of collecting depth data, such as a binocular camera or a depth camera.
When the image acquisition component is an ordinary camera, the multiple video frames collected by the image acquisition component are color (red green blue, RGB) video frames. Since multiple video frames form a video stream, it can also be considered that the image acquisition component collects an RGB video stream. When the image acquisition component is a professional camera, the multiple video frames collected by the image acquisition component may include depth data in addition to RGB image data. It should be noted that the present application does not limit the specific type of the image acquisition component.
Multiple video frames (that is, an RGB video stream) can be collected by the image acquisition component of the electronic device, and processing such as plane detection and semantic segmentation can be performed on the collected video frames to determine the ground information in them. The ground information may be represented by a plane equation in space or in other ways, which is not limited in this application.
In a possible implementation, when determining the ground information in the multiple video frames, plane detection may first be performed on the multiple video frames collected by the image acquisition component to determine the position information of multiple planes in the multiple video frames. Optionally, the position information of the multiple planes in the multiple video frames may be determined by SLAM technology.
For example, three-dimensional information can be extracted from the multiple video frames collected by the image acquisition component to obtain sparse point cloud data, while the device pose of the electronic device at the time each video frame was collected is determined; then, according to the device pose at the time each video frame was collected, plane fitting is performed on the sparse point cloud data through a plane fitting algorithm to obtain the position information of multiple planes in the multiple video frames. The position information of each plane can be represented by a plane equation in space.
By extracting the three-dimensional information in the multiple video frames to obtain sparse point cloud data, and performing plane fitting on the sparse point cloud data according to the device pose at the time each video frame was collected, the position information of multiple planes in the multiple video frames is obtained; this not only improves processing efficiency but also improves the accuracy of the position information of each plane.
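The embodiment does not prescribe a particular plane fitting algorithm. Purely as a minimal sketch, assuming the sparse point cloud has already been expressed in a common coordinate system as an N×3 NumPy array, a RANSAC-style dominant-plane fit (the function names below are illustrative and not from any specific library) might look like the following:

```python
import numpy as np

def point_plane_distance(points, plane):
    """Distance of each point to the plane a*x + b*y + c*z + d = 0 (unit normal assumed)."""
    normal, d = np.asarray(plane[:3]), plane[3]
    return np.abs(points @ normal + d)

def fit_plane_ransac(points, iters=200, inlier_thresh=0.02, seed=0):
    """RANSAC fit of a dominant plane (a, b, c, d) to an N x 3 sparse point cloud."""
    rng = np.random.default_rng(seed)
    best_plane, best_inliers = None, 0
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        norm = np.linalg.norm(normal)
        if norm < 1e-8:                      # nearly collinear sample, skip it
            continue
        normal /= norm
        plane = np.append(normal, -normal @ p1)
        inliers = int(np.sum(point_plane_distance(points, plane) < inlier_thresh))
        if inliers > best_inliers:
            best_plane, best_inliers = plane, inliers
    return best_plane
```

In practice such a fit would be run per candidate plane region and refined by least squares over the inlier points; the sketch only illustrates the general idea of fitting plane equations to sparse points.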
While determining the position information of the multiple planes in the multiple video frames, semantic segmentation can also be performed on the multiple video frames collected by the image acquisition component to obtain the semantic segmentation result of each video frame. Specifically, for any one of the multiple video frames, semantic recognition can be performed on the video frame to identify the categories of the objects in the video frame, such as ground, table, and wall, and each pixel in the video frame is labeled according to the identified object categories to obtain the semantic segmentation result of the video frame.
Then, according to the position information of the multiple planes in the multiple video frames and the semantic segmentation result of each video frame, semantic recognition can be performed on the multiple planes to obtain multiple pieces of semantic plane information, such as tabletop, wall, and ground plane information, after which the ground information is selected from the multiple pieces of semantic plane information.
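One simple way to combine the two results, sketched here under the assumption that every 3D point carries the semantic label of the pixel it was triangulated from, is to label each fitted plane by a majority vote over its inlier points and then pick the plane voted as ground (the helper names and the "ground" label string are illustrative only):

```python
import numpy as np
from collections import Counter

def label_plane(plane, points, point_labels, inlier_thresh=0.02):
    """Give a fitted plane (a, b, c, d) the majority semantic label of its inlier points."""
    dist = np.abs(points @ np.asarray(plane[:3]) + plane[3])
    labels = [point_labels[i] for i in np.flatnonzero(dist < inlier_thresh)]
    return Counter(labels).most_common(1)[0][0] if labels else None

def select_ground_plane(planes, points, point_labels, ground_label="ground"):
    """Return the first detected plane whose inlier points are mostly ground pixels."""
    for plane in planes:
        if label_plane(plane, points, point_labels) == ground_label:
            return plane
    return None
```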
Fig. 4 shows a schematic diagram of a ground information detection process according to an embodiment of the present application. As shown in Fig. 4, assume that the height detection method of the embodiment of the present application is implemented through the Height APP on an electronic device (for example, a mobile phone). After the user opens the Height APP, the Height APP can perform image collection through the image acquisition component (for example, the camera) of the electronic device to obtain multiple video frames 410 (that is, an RGB video stream); three-dimensional information extraction is then performed on the multiple video frames 410 to obtain sparse point cloud data 420, while the device pose 430 of the electronic device at the time the multiple video frames 410 were collected is determined; and according to the device pose 430, plane fitting is performed on the sparse point cloud data 420 through a plane fitting algorithm to obtain the position information 440 of multiple planes in the multiple video frames 410.
While determining the position information 440 of the multiple planes in the multiple video frames 410, semantic segmentation 450 can also be performed on the multiple video frames 410 to obtain a semantic segmentation result 460; then, according to the position information 440 of the multiple planes and the semantic segmentation result 460, semantic recognition is performed on the multiple planes in the multiple video frames 410 to obtain multiple pieces of semantic plane information 470, and the ground information 480 is selected from the multiple pieces of semantic plane information 470, where the ground information 480 can be represented by a plane equation in space.
The above exemplarily describes the ground information detection process taking only the multiple video frames collected by the image acquisition component (that is, an RGB video stream) as input. In some embodiments, information such as the depth data collected by the image acquisition component and the device pose of the electronic device collected by the inertial measurement unit (IMU) can also be used as additional inputs to improve the accuracy of ground detection.
Through the plane detection and semantic segmentation of the multiple video frames collected by the image acquisition component, the electronic device can automatically perceive the captured scene, obtain multiple pieces of semantic plane information, and then automatically identify the ground information. This not only avoids manual user operations, such as the user manually tapping to select the ground, but also improves the accuracy of ground detection.
In a possible implementation, if no ground information is detected in the multiple video frames within a preset period (for example, 5 s or 10 s), the user may be prompted to capture the ground. For example, prompts such as "Please aim the camera at the ground" or "Ground not detected" may be announced to the user by voice broadcast, or displayed to the user as text or animation on the display interface of the Height APP, so that the user can adjust the shooting content in time, thereby improving the efficiency of ground detection.
It should be noted that those skilled in the art can set the content and manner of the prompt information used when no ground information is detected in the multiple video frames according to the actual situation, and this application does not limit either of them.
Step S320, performing face detection on the multiple video frames to determine a face region. While determining the ground information in the multiple video frames, face detection can be performed on the multiple video frames by means of feature extraction, key point detection, and the like. When a complete face is detected, the face region can be determined from the multiple video frames, and the object corresponding to the face region is determined as the target object. There may be one or more face regions, and one or more target objects; neither is limited in this application.
In a possible implementation, after the face region is determined from the multiple video frames, it can be judged whether the face region satisfies a third preset condition, where the third preset condition is that the face region is located within a preset area of the video frame in which it appears. The preset area may be the central area of that video frame; for example, the central area may be set as the region of the video frame that is centered on its center point and covers half of its area. It should be noted that those skilled in the art can set the preset area of the video frame where the face region is located according to the actual situation, and this application does not limit it.
When the face region does not satisfy the third preset condition, the user may be prompted, by voice broadcast, text display, animation display, or the like, to adjust the device pose of the electronic device so that the face region satisfies the third preset condition, that is, so that the face region is located within the preset area of its video frame, thereby improving the accuracy of height detection.
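As one possible reading of the third preset condition, assuming the face region is given as a pixel bounding box (x, y, w, h) and the central area is a rectangle centered on the frame center and covering half of the frame area, the check could be sketched as follows (a simplified illustration, not the only way to define the preset area):

```python
def face_in_center_region(face_box, frame_w, frame_h, area_ratio=0.5):
    """Check whether a face bounding box lies inside a centered region of the frame.

    face_box: (x, y, w, h) in pixels. The centered region covers `area_ratio` of the
    frame area, with each side scaled by sqrt(area_ratio)."""
    scale = area_ratio ** 0.5
    region_w, region_h = frame_w * scale, frame_h * scale
    left = (frame_w - region_w) / 2
    top = (frame_h - region_h) / 2
    x, y, w, h = face_box
    return (x >= left and y >= top and
            x + w <= left + region_w and y + h <= top + region_h)
```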
Step S330, determining the first face pose of the target object in the multiple video frames according to the face region and a preset three-dimensional face model. When determining the first face pose of the target object in the multiple video frames, a three-dimensional face model of the target object can first be built through a pre-trained neural network according to the face region and the preset three-dimensional face model (that is, an average-face three-dimensional model). For example, the face region and the preset three-dimensional face model can be input into a pre-trained convolutional neural network (CNN) for registration to obtain the three-dimensional face model of the target object.
Then, according to the parameters of the preset three-dimensional face model, for example structural constraints of the face such as the interpupillary distance and the distance from the nose tip to the top of the head, the position information and rotation information of the target object's three-dimensional face model relative to the image acquisition component of the electronic device can be determined, and this position information and rotation information is determined as the first face pose of the target object. The rotation information can be represented by a pitch angle, a roll angle, and a yaw angle.
Fig. 5 shows a schematic diagram of the process of determining the first face pose of the target object according to an embodiment of the present application. As shown in Fig. 5, face detection 520 can be performed on multiple video frames 510 collected by the image acquisition component of the electronic device to determine a face region 530; the face region 530 and the preset three-dimensional face model 540 are input into a pre-trained convolutional neural network CNN 550 for registration to obtain the three-dimensional face model 560 of the target object; according to the parameters of the preset three-dimensional face model 540, the position information and rotation information of the target object's three-dimensional face model relative to the image acquisition component of the electronic device are determined, and this position information and rotation information is determined as the first face pose 570 of the target object.
In this way, the three-dimensional face model of the target object can be determined according to the face region and the preset three-dimensional face model, and the first face pose of the target object is then determined according to the parameters of the preset three-dimensional face model. Using 3D face reconstruction technology to determine the first face pose of the target object not only improves processing efficiency, but also improves the accuracy of the first face pose, thereby improving the accuracy of height detection.
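The embodiment derives the position and rotation information from the registered three-dimensional face model and the structural parameters of the preset model. Purely as a general illustration of recovering a face pose relative to a camera, and not asserted to be the computation used in this embodiment, a standard perspective-n-point solve over corresponding 3D model landmarks and their 2D detections could be sketched with OpenCV (the landmark arrays and camera intrinsics are assumed inputs):

```python
import numpy as np
import cv2

def estimate_face_pose(model_points_3d, image_points_2d, camera_matrix):
    """Estimate rotation (pitch/yaw/roll, degrees) and translation of a face model
    relative to the camera from matched 3D model landmarks and 2D detections."""
    dist_coeffs = np.zeros((4, 1))           # assume an undistorted image
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    rot_mat, _ = cv2.Rodrigues(rvec)         # rotation vector -> 3x3 rotation matrix
    # Euler angles (in degrees) recovered from the 3x4 pose matrix
    euler = cv2.decomposeProjectionMatrix(np.hstack((rot_mat, tvec)))[-1]
    pitch, yaw, roll = euler.flatten()
    return (pitch, yaw, roll), tvec
```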
In a possible implementation, the neural network used to generate the three-dimensional face model of the target object (for example the convolutional neural network CNN 550) can be trained in advance according to multiple sample face regions and the preset three-dimensional face model. For example, for any sample face region, the sample face region and the preset three-dimensional face model can be input into the neural network for registration to obtain a sample three-dimensional face model; the sample three-dimensional face model is then reverse rendered, that is, projected into two-dimensional space, to obtain a reverse-rendered image; the network loss of the neural network is determined according to the differences between the reverse-rendered images and the corresponding sample face regions; and the network parameters of the neural network are adjusted according to the network loss.
When the neural network satisfies a preset training end condition, the training ends and the trained neural network is obtained. The training end condition may be, for example, that the number of training epochs of the neural network reaches a preset epoch threshold, that the network loss of the neural network converges within a certain interval, or that the neural network passes verification on a validation set. Those skilled in the art can set the training end condition of the neural network according to the actual situation, which is not limited in this application.
In a possible implementation, after the first face pose of the target object is determined, it can be judged whether the first face pose satisfies a second preset condition. The second preset condition is that the pitch angle in the first face pose is within a preset second angle interval, the roll angle in the first face pose is within a preset third angle interval, and the yaw angle in the first face pose is within a preset fourth angle interval. The second, third, and fourth angle intervals may be the same or different. It should be noted that those skilled in the art can set the specific values of the second, third, and fourth angle intervals according to the actual situation, which is not limited in this application.
When the first face pose of the target object does not satisfy the second preset condition, the user may be prompted, by voice broadcast, text display, animation display, or the like, to adjust the device pose of the electronic device and/or to change the face pose of the target object so that the first face pose satisfies the second preset condition, that is, so that the face of the target object faces the image acquisition component of the electronic device. In other words, the face in the video frames collected by the image acquisition component can be made a frontal face of the target object, thereby improving the accuracy of height detection.
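A trivial sketch of this angle check, with the three intervals expressed in degrees and the ±15° defaults chosen purely for illustration (the patent does not specify particular values), could be:

```python
def pose_within_limits(pitch, roll, yaw,
                       pitch_range=(-15.0, 15.0),
                       roll_range=(-15.0, 15.0),
                       yaw_range=(-15.0, 15.0)):
    """Second preset condition: each rotation angle (degrees) falls in its preset interval."""
    return (pitch_range[0] <= pitch <= pitch_range[1] and
            roll_range[0] <= roll <= roll_range[1] and
            yaw_range[0] <= yaw <= yaw_range[1])
```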
Step S340, determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device. The device pose of the electronic device is the pose of the electronic device at the time the video frame containing the face region was collected.
In a possible implementation, before determining the first height of the target object, it can be judged whether the pitch angle of the electronic device indicated by its device pose satisfies a first preset condition. The first preset condition is that the pitch angle of the electronic device indicated by the device pose is within a preset first angle interval.
When the pitch angle of the electronic device indicated by its device pose does not satisfy the first preset condition, the user may be prompted, by voice broadcast, text display, animation display, or the like, to adjust the device pose of the electronic device, so as to avoid excessive upward or downward tilt when collecting the video frames, thereby improving the accuracy of height detection.
When determining the first height of the target object, the coordinate systems of the ground information and of the first face pose of the target object can first be determined. When SLAM technology is used, the ground information is expressed in the world coordinate system. According to the data collected by the inertial measurement unit IMU, the Y axis of the world coordinate system can be aligned with the vertical direction of the real world. Since physical quantities such as distances and object sizes in this world coordinate system are the same as in the real world, the virtual world coordinate system is thereby linked to the real world, so an object size computed in the world coordinate system is the actual size of the object in the real world.
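How the IMU data is used to align the Y axis with the real-world vertical is not detailed here. A simplified sketch, assuming the device is held roughly still so that the averaged accelerometer reading points opposite to gravity, might be:

```python
import numpy as np

def gravity_aligned_up(accel_samples):
    """Estimate the world 'up' direction in the device frame from accelerometer samples
    taken while the device is (approximately) static.

    When static, the accelerometer measures the specific force opposing gravity, so the
    averaged, normalized reading points upward. Real SLAM pipelines fuse gyroscope and
    accelerometer data continuously; this is only an illustration of the idea."""
    mean = np.mean(np.asarray(accel_samples, dtype=float), axis=0)
    return mean / np.linalg.norm(mean)
```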
而目标对象的第一人脸位姿为目标对象相对于电子设备的图像采集部件的人脸位姿,其位于相机坐标系下。在相机坐标系下,电子设备的图像采集部件位于坐标原点,从目标对象的人脸三维模型的角度看,电子设备的图像采集部件的位置是固定的,而且由于相机坐标系下的对象与现实世界的对象没有尺寸对比,相机坐标系下的目标对象的人脸尺寸与现实世界中目标对象的人脸实际尺寸存在一定的缩放比例。因此,需要进行人脸尺寸调整及坐标系变换。The first face pose of the target object is the face pose of the target object relative to the image acquisition component of the electronic device, which is located in the camera coordinate system. In the camera coordinate system, the image acquisition part of the electronic device is located at the origin of the coordinates. From the perspective of the 3D face model of the target object, the position of the image acquisition part of the electronic device is fixed. There is no size comparison of objects in the world, and there is a certain scaling ratio between the face size of the target object in the camera coordinate system and the actual face size of the target object in the real world. Therefore, face size adjustment and coordinate system transformation are required.
In a possible implementation, when determining the first height of the target object, the first face pose of the target object is first adjusted according to a preset interpupillary distance reference value to obtain a second face pose of the target object, where the face size indicated by the first face pose is the same as the face size of the target object in the face area, the face size indicated by the second face pose is the actual face size of the target object, and the second face pose is expressed in the camera coordinate system. In other words, the first face pose of the target object can be adjusted in the camera coordinate system so that the face size indicated by the adjusted second face pose equals the actual face size of the target object.
For example, the interpupillary distance value in the first face pose of the target object can be determined; a face size scale factor is then computed from the preset interpupillary distance reference value and the interpupillary distance value in the first face pose; finally, the first face pose is adjusted by the face size scale factor to obtain the second face pose of the target object.
Determining the face size scale factor from the interpupillary distance reference value and the interpupillary distance value in the first face pose, and adjusting the first face pose accordingly to obtain the second face pose, yields the actual face size and pose of the target object in the camera coordinate system.
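By way of illustration, a minimal Python sketch of this scaling step might look as follows; the 63 mm reference value, the eye-landmark inputs, and the choice to scale only the translation of the pose are assumptions made for the example, not details given by the embodiment.

```python
import numpy as np

IPD_REFERENCE_M = 0.063  # assumed interpupillary distance reference value, in meters

def scale_face_pose(first_pose_translation, left_eye, right_eye):
    """Scale the first face pose so the fitted model's interpupillary distance
    matches the reference value, yielding the second face pose
    (still in camera coordinates).

    first_pose_translation: 3-vector translation of the first face pose.
    left_eye, right_eye:    3-vector eye landmarks of the fitted face model.
    """
    ipd_model = np.linalg.norm(np.asarray(right_eye) - np.asarray(left_eye))
    scale = IPD_REFERENCE_M / ipd_model                      # face size scale factor
    return np.asarray(first_pose_translation) * scale, scale
```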
After the second face pose of the target object is obtained, a coordinate transformation can be applied to the second face pose according to the device pose of the electronic device to obtain a third face pose of the target object, where the third face pose of the target object is expressed in the world coordinate system.
In a possible implementation, the second face pose P_C can be coordinate-transformed by the following formula (1) to obtain the third face pose P_w of the target object:
P_w = T⁻¹ · P_C    (1)
In formula (1), T represents the rigid body transformation matrix determined from the device pose (R, t) of the electronic device:
T = [ R  t ]
    [ 0  1 ]
where R represents the rotation matrix in the device pose of the electronic device and t represents the translation vector in the device pose of the electronic device.
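A hedged illustration of formula (1), assuming the poses are represented as 4x4 homogeneous matrices (a representation chosen for the sketch, not specified above):

```python
import numpy as np

def to_world(P_c, R, t):
    """Apply formula (1): P_w = T^-1 * P_C.

    P_c: 4x4 homogeneous matrix of the second face pose (camera coordinates).
    R:   3x3 rotation matrix from the device pose.
    t:   3-vector translation from the device pose.
    """
    T = np.eye(4)
    T[:3, :3] = R          # rigid body transformation built from (R, t)
    T[:3, 3] = t
    return np.linalg.inv(T) @ P_c
```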
Then, the first height of the target object can be determined according to the third face pose and the ground information. In a possible implementation, the head-top position of the target object is determined from the third face pose, and the first height of the target object is then determined from the head-top position and the ground information. For example, suppose the head-top position of the target object determined from the third face pose is (x1, y1, z1) and the ground information is represented by the plane equation F = f(x, y, z) in space. The first distance L1 from (x1, y1, z1) to the ground F = f(x, y, z) can be computed along the Y-axis direction, and this first distance L1 is taken as the first height L of the target object, that is, L = L1.
Determining the head-top position of the target object and then determining the first height from the head-top position and the ground information is simple and fast, and improves the accuracy of height detection.
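As an illustrative sketch, assuming the ground information is stored as plane coefficients a, b, c, d of a*x + b*y + c*z + d = 0 (the text above only writes F = f(x, y, z)), the Y-axis distance from the head-top position to the ground could be computed as follows:

```python
def height_above_ground(head_top, a, b, c, d):
    """Y-axis distance from head_top = (x, y, z) to the ground plane
    a*x + b*y + c*z + d = 0; positive when the point is above the ground."""
    x, y, z = head_top
    if b == 0:
        raise ValueError("ground plane is parallel to the Y axis")
    y_ground = -(a * x + c * z + d) / b   # Y coordinate of the ground at (x, z)
    return y - y_ground
```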
In a possible implementation, the nose-tip position of the target object can be determined from the third face pose; the head-top position of the target object is then determined from the nose-tip position and a preset ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance; and the first height of the target object is determined from the head-top position and the ground information.
For example, suppose the nose-tip position of the target object determined from the third face pose is (x2, y2, z2) and the ground information is represented by the plane equation F = f(x, y, z) in space. The head-top position (x3, y3, z3) of the target object can be determined from the nose-tip position (x2, y2, z2) and the preset ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance. The second distance L2 from (x3, y3, z3) to the ground F = f(x, y, z) is then computed along the Y-axis direction, and this second distance L2 is taken as the first height L of the target object, that is, L = L2.
Determining the head-top position of the target object from the nose-tip position and the ratio between the nose-tip-to-chin distance and the nose-tip-to-head-top distance, and then determining the first height from the head-top position and the ground information, can improve the accuracy of height detection.
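A possible sketch of this step, assuming the third face pose also provides a chin landmark and that the head top lies on the straight line from the chin through the nose tip; the ratio value stands in for the preset anthropometric constant and is not given by the text:

```python
import numpy as np

def head_top_from_nose(nose_tip, chin, ratio):
    """Estimate the head-top position from the nose-tip position.

    nose_tip, chin: 3-vectors in world coordinates taken from the third face pose.
    ratio: preset (nose-tip-to-head-top) / (nose-tip-to-chin) proportion.
    """
    nose_tip, chin = np.asarray(nose_tip, float), np.asarray(chin, float)
    up = nose_tip - chin                      # approximate "up" direction of the head
    up /= np.linalg.norm(up)
    d_nose_chin = np.linalg.norm(nose_tip - chin)
    return nose_tip + up * d_nose_chin * ratio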
In a possible implementation, when there are multiple face areas corresponding to the target object, a second height of the target object can be determined for each face area in a manner similar to the above, from the ground information in the multiple video frames, the first face pose of the target object, and the device pose of the electronic device when the video frame containing that face area was captured. Post-processing such as Kalman filtering or averaging is then applied to the multiple second heights to obtain the first height of the target object. In this way, the accuracy of height detection can be improved.
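For illustration, a minimal sketch of the fusion step; the one-dimensional Kalman filter over a constant height state and its noise variances are assumptions for the example, and plain averaging of the second heights is the simpler alternative mentioned above:

```python
def fuse_heights(second_heights, process_var=1e-4, meas_var=4e-4):
    """Fuse per-face-area height estimates into the first height.

    Implements a one-dimensional Kalman filter over a constant height state;
    plain averaging is the simpler alternative: sum(h) / len(h).
    """
    estimate, variance = second_heights[0], 1.0
    for z in second_heights[1:]:
        variance += process_var                   # predict
        gain = variance / (variance + meas_var)   # Kalman gain
        estimate += gain * (z - estimate)         # update with the new measurement
        variance *= (1.0 - gain)
    return estimate
```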
In a possible implementation, after the first height of the target object is determined, the first height may also be displayed on the display interface of the electronic device, for example by means of animation, text, or augmented reality (AR). When height detection is implemented by a height app, the first height of the target object can be displayed on the display interface of the height app, which may include a real-time image interface showing the video frames collected by the image acquisition component of the electronic device.
Fig. 6 shows a schematic diagram of displaying the height of a target object according to an embodiment of the present application. As shown in Fig. 6, the user detects the height of the target object 630 through the height app on the electronic device 600, and the display interface 610 of the height app shows the video frames collected in real time by the image acquisition component (not shown) of the electronic device 600. When the height app detects the height of the target object 630, the height can be displayed at a preset position above the head of the target object 630 in the display interface 610 by means of an augmented reality icon 620; the displayed information may be "Height: 175 cm".
Fig. 6 above uses a single target object merely as an example of how the height is displayed. It should be noted that the heights of multiple target objects can also be displayed in the above manner. Those skilled in the art may also set the display manner and display position of the height of the target object according to the actual situation, which is not limited in the present application.
According to the height detection method of the embodiment of the present application, semantic plane detection can be performed on multiple video frames collected by the image acquisition component of the electronic device to determine the ground information in the multiple video frames; face detection is performed on the multiple video frames to determine the face area, and the first face pose of the target object in the multiple video frames is determined according to the face area and a preset three-dimensional face model; the first height of the target object is then determined according to the ground information, the first face pose, and the device pose of the electronic device. Height detection therefore does not depend on professional equipment (such as binocular cameras or depth cameras): the face pose of the target object is determined through face recognition and three-dimensional face technology, and the height is determined from the face pose, the device pose, and the ground information, without manually locating the target object and without capturing a complete human body image, making the method convenient to operate and highly accurate.
Fig. 7 shows a schematic diagram of a height detection processing procedure according to an embodiment of the present application. As shown in Fig. 7, assume that the user detects the height of the target object through a height app running on the electronic device. When the user opens the height app, step S701 is executed: the height app collects multiple video frames (i.e., a video stream) through the image acquisition component of the electronic device. Optionally, the image acquisition component may continue to collect the video stream throughout the height detection process.
Where semantic plane detection is implemented with SLAM, step S702 determines whether SLAM initialization has succeeded. If not, the user is prompted to move the electronic device and step S701 is executed again. If SLAM initialization has succeeded, step S703 performs semantic plane detection on the multiple video frames, and step S704 determines whether ground information has been detected within a preset period.
If no ground information is detected within the preset period, the user is prompted to point the camera at the ground and step S701 continues. If ground information is detected within the preset period, step S705 performs face detection on the multiple video frames to determine the face area, and step S706 determines whether the face area satisfies the third preset condition, namely that the face area lies within a preset region of the video frame containing it.
If the face area does not satisfy the third preset condition, the user is prompted to adjust the device pose of the electronic device and step S701 is executed again. If the face area satisfies the third preset condition, step S707 determines the first face pose of the target object in the multiple video frames from the face area and the preset three-dimensional face model, and step S708 determines whether the first face pose satisfies the second preset condition, namely that the pitch angle of the first face pose lies within a preset second angle interval, its roll angle lies within a preset third angle interval, and its yaw angle lies within a preset fourth angle interval.
If the first face pose does not satisfy the second preset condition, the user is prompted to adjust the device pose of the electronic device and/or change the face pose of the target object, and step S701 is executed again. If the first face pose satisfies the second preset condition, step S709 determines whether the pitch angle of the electronic device indicated in the device pose satisfies the first preset condition, namely that this pitch angle lies within a preset first angle interval.
If the pitch angle of the electronic device indicated in the device pose does not satisfy the first preset condition, the user is prompted to adjust the device pose and step S701 is executed again. If it satisfies the first preset condition, step S710 determines the first height of the target object from the ground information, the first face pose, and the device pose of the electronic device; step S711 then displays the first height of the target object on the display interface of the height app, for example by means of augmented reality (AR).
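The S701-S711 flow can be summarized, purely as a sketch, in the following pseudo-style Python; every helper name on the hypothetical app object stands in for a module described above and is not a real API:

```python
def height_detection_pass(app):
    """One pass of steps S701-S711; all helpers on `app` are hypothetical."""
    while True:
        frames = app.capture_video_frames()                  # S701
        if not app.slam_initialized():                       # S702
            app.prompt("Move the device"); continue
        ground = app.detect_ground_plane(frames)             # S703/S704
        if ground is None:
            app.prompt("Point the camera at the ground"); continue
        face_area = app.detect_face(frames)                  # S705
        if not app.face_in_preset_region(face_area):         # S706
            app.prompt("Adjust the device pose"); continue
        face_pose = app.fit_face_pose(face_area)             # S707
        if not app.face_angles_within_limits(face_pose):     # S708
            app.prompt("Adjust the device or face pose"); continue
        if not app.device_pitch_within_limits():             # S709
            app.prompt("Adjust the device pose"); continue
        height = app.estimate_height(ground, face_pose)      # S710
        app.display_height_ar(height)                        # S711
        return height
```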
The height detection method of the embodiment of the present application, through SLAM and semantic segmentation, can automatically identify ground information and automatically detect the height of the target object without manual operation (such as manual point selection or marking of the target object), and can detect the heights of multiple target objects simultaneously, thereby simplifying the height detection process and improving its efficiency. In addition, by acquiring three-dimensional information through SLAM, the embodiments of the present application avoid physical contact between the electronic device and the body of the target object, which is safe and reliable.
The height detection method of the embodiment of the present application performs height detection based on multiple video frames collected by an ordinary camera (for example, a monocular camera), without requiring professional equipment such as a depth camera, which reduces dependence on dedicated hardware; in some embodiments, the user can perform height detection with a handheld device (for example, a mobile phone or smart watch). At the same time, the embodiments of the present application obtain the face pose of the target object through face recognition and three-dimensional face reconstruction, which is fast and accurate; this not only improves the accuracy of height detection but also suits scenarios in which the target object moves or the shooting angle changes.
Fig. 8 shows a block diagram of a height detection apparatus according to an embodiment of the present application. As shown in Fig. 8, the height detection apparatus is applied to an electronic device and includes: an image acquisition component 810, configured to collect a plurality of video frames; and a processing component 820, configured to: perform semantic plane detection on the plurality of video frames and determine the ground information in them; perform face detection on the plurality of video frames and determine the face area; determine the first face pose of the target object in the plurality of video frames according to the face image of the face area and a preset three-dimensional face model; and determine the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device.
In a possible implementation, the processing component is further configured to perform at least one of the following: if the ground information is not detected within a preset period, prompt the user to photograph the ground; if the pitch angle of the electronic device indicated by the device pose does not satisfy the first preset condition, prompt the user to adjust the device pose; if the first face pose does not satisfy the second preset condition, prompt the user to adjust the device pose and/or change the face pose of the target object; or, if the face area does not satisfy the third preset condition, prompt the user to adjust the device pose.
In a possible implementation, determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device includes: determining a second height of the target object according to the ground information, the first face pose, and the device pose; and post-processing the second height to obtain the first height, the post-processing including Kalman filtering.
In a possible implementation, the processing component is further configured to display the first height on a display interface of the electronic device.
An embodiment of the present application provides a height detection apparatus, including: an image acquisition component for collecting a plurality of video frames; a processor; and a memory for storing processor-executable instructions, where the processor is configured to implement the above method when executing the instructions.
An embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented.
An embodiment of the present application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
A computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
The computer-readable program instructions or code described herein may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuits, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions, thereby implementing various aspects of the present application.
Aspects of the present application are described herein with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to operate in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, causing a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures show the architecture, functionality, and operation of possible implementations of apparatuses, systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by hardware that performs the corresponding function or action (such as a circuit or an ASIC (application-specific integrated circuit)), or by a combination of hardware and software, such as firmware.
Although the present invention has been described herein in conjunction with various embodiments, those skilled in the art can, in practicing the claimed invention, understand and effect other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The embodiments of the present application have been described above; the foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies available in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

  1. A height detection method, characterized by comprising:
    performing semantic plane detection on a plurality of video frames collected by an image acquisition component of an electronic device, and determining ground information in the plurality of video frames;
    performing face detection on the plurality of video frames, and determining a face area;
    determining a first face pose of a target object in the plurality of video frames according to the face area and a preset three-dimensional face model;
    determining a first height of the target object according to the ground information, the first face pose, and a device pose of the electronic device.
  2. The method according to claim 1, characterized in that the method further comprises at least one of the following:
    prompting a user to photograph the ground if the ground information is not detected within a preset period;
    prompting the user to adjust the device pose if a pitch angle of the electronic device indicated by the device pose does not satisfy a first preset condition;
    prompting the user to adjust the device pose and/or change a face pose of the target object if the first face pose does not satisfy a second preset condition; or
    prompting the user to adjust the device pose if the face area does not satisfy a third preset condition.
  3. The method according to claim 1 or 2, characterized in that determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device comprises:
    determining a second height of the target object according to the ground information, the first face pose, and the device pose;
    post-processing the second height to obtain the first height, the post-processing comprising Kalman filtering.
  4. The method according to any one of claims 1-3, characterized in that the method further comprises:
    displaying the first height on a display interface of the electronic device.
  5. A height detection apparatus, characterized in that the apparatus is applied to an electronic device and comprises:
    an image acquisition component, configured to collect a plurality of video frames;
    a processing component, configured to:
    perform semantic plane detection on the plurality of video frames and determine ground information in the plurality of video frames;
    perform face detection on the plurality of video frames and determine a face area;
    determine a first face pose of a target object in the plurality of video frames according to a face image of the face area and a preset three-dimensional face model;
    determine a first height of the target object according to the ground information, the first face pose, and a device pose of the electronic device.
  6. The apparatus according to claim 5, characterized in that the processing component is further configured to perform at least one of the following:
    prompting a user to photograph the ground if the ground information is not detected within a preset period;
    prompting the user to adjust the device pose if a pitch angle of the electronic device indicated by the device pose does not satisfy a first preset condition;
    prompting the user to adjust the device pose and/or change a face pose of the target object if the first face pose does not satisfy a second preset condition; or
    prompting the user to adjust the device pose if the face area does not satisfy a third preset condition.
  7. The apparatus according to claim 5 or 6, characterized in that determining the first height of the target object according to the ground information, the first face pose, and the device pose of the electronic device comprises:
    determining a second height of the target object according to the ground information, the first face pose, and the device pose;
    post-processing the second height to obtain the first height, the post-processing comprising Kalman filtering.
  8. The apparatus according to any one of claims 5-7, characterized in that the processing component is further configured to:
    display the first height on a display interface of the electronic device.
  9. A height detection apparatus, characterized by comprising:
    an image acquisition component, configured to collect a plurality of video frames;
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to implement the method according to any one of claims 1-4 when executing the instructions.
  10. A non-volatile computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-4.
  11. A computer program product, comprising computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1-4.
PCT/CN2021/109248 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium WO2023004682A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180006425.1A CN115885316A (en) 2021-07-29 2021-07-29 Height detection method, device and storage medium
PCT/CN2021/109248 WO2023004682A1 (en) 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/109248 WO2023004682A1 (en) 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2023004682A1 true WO2023004682A1 (en) 2023-02-02

Family

ID=85086031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109248 WO2023004682A1 (en) 2021-07-29 2021-07-29 Height measurement method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN115885316A (en)
WO (1) WO2023004682A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104236462A (en) * 2013-06-14 2014-12-24 北京千里时空科技有限公司 Method for extracting height and distance of object in video image
CN105286871A (en) * 2015-11-27 2016-02-03 西安交通大学 Video processing-based body height measurement method
CN106361345A (en) * 2016-11-29 2017-02-01 公安部第三研究所 System and method for measuring height of human body in video image based on camera calibration
CN110136190A (en) * 2019-03-26 2019-08-16 华为技术有限公司 A kind of distance measuring method and electronic equipment
CN111012353A (en) * 2019-12-06 2020-04-17 西南交通大学 Height detection method based on face key point recognition

Also Published As

Publication number Publication date
CN115885316A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
US20210233310A1 (en) Automated three dimensional model generation
CN113810587B (en) Image processing method and device
JP7058760B2 (en) Image processing methods and their devices, terminals and computer programs
US11615592B2 (en) Side-by-side character animation from realtime 3D body motion capture
CN112334869A (en) Electronic device and control method thereof
US11810316B2 (en) 3D reconstruction using wide-angle imaging devices
US11468673B2 (en) Augmented reality system using structured light
US11688136B2 (en) 3D object model reconstruction from 2D images
US10748000B2 (en) Method, electronic device, and recording medium for notifying of surrounding situation information
US11887322B2 (en) Depth estimation using biometric data
US20230267687A1 (en) 3d object model reconstruction from 2d images
US20230224574A1 (en) Photographing method and apparatus
CN115699096A (en) Tracking augmented reality device
WO2022261856A1 (en) Image processing method and apparatus, and storage medium
WO2023004682A1 (en) Height measurement method and apparatus, and storage medium
WO2021244040A1 (en) Facial expression editing method and electronic device
US20240096031A1 (en) Graphical assistance with tasks using an ar wearable device
US20240073402A1 (en) Multi-perspective augmented reality experience
US11922096B1 (en) Voice controlled UIs for AR wearable devices
CN115880348B (en) Face depth determining method, electronic equipment and storage medium
US20240126502A1 (en) Voice controlled uis for ar wearable devices
KR20240005953A (en) Reduce startup time for augmented reality experiences

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21951290

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE