WO2022110877A1 - Depth detection method and apparatus, electronic device, storage medium and program - Google Patents

Depth detection method and apparatus, electronic device, storage medium and program

Info

Publication number
WO2022110877A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
dimensional key
key point
image
frame image
Prior art date
Application number
PCT/CN2021/109803
Other languages
English (en)
Chinese (zh)
Inventor
李雷
李健华
王权
钱晨
Original Assignee
深圳市商汤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202011335257.4A (patent CN112465890B)
Application filed by 深圳市商汤科技有限公司
Publication of WO2022110877A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/50: Depth or shape recovery
    • G06T7/55: Depth or shape recovery from multiple images

Definitions

  • The present disclosure is based on, and claims priority to, the Chinese patent application No. 202011344694.2, filed on November 24, 2020 and entitled "Depth Detection Method, Device, Electronic Device and Computer-Readable Storage Medium", and the Chinese patent application No. 202011335257.4, filed on November 24, 2020 and entitled "Depth Detection Method, Apparatus, Electronic Device and Computer-Readable Storage Medium"; the entire contents of the above Chinese patent applications are incorporated herein by reference.
  • the present disclosure relates to the field of computer vision technology, in particular, but not limited to, a depth detection method, apparatus, electronic device, storage medium and computer program.
  • Image depth detection technology has important uses in Augmented Reality (AR) interaction, virtual photography and other applications; how to realize depth detection of a human body in an image without special hardware devices such as 3D depth cameras is an urgent technical problem to be solved.
  • Embodiments of the present disclosure provide a depth detection method, apparatus, electronic device, storage medium, and computer program.
  • An embodiment of the present disclosure provides a depth detection method, the method is applied to an electronic device, and the method includes:
  • determining, according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, the depth detection result of the human body in the current frame image, wherein the human body includes a single human body or at least two human bodies.
  • Embodiments of the present disclosure also provide a depth detection device, the device comprising:
  • the acquisition module is configured to: acquire at least one frame of image collected by the image acquisition device, where the at least one frame of image includes the current frame image;
  • the processing module is configured to: perform human body image segmentation on the current frame image to obtain a mask image of the human body; perform human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image;
  • the detection module is configured to: determine the depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, wherein the human body includes a single human body or at least two human bodies.
  • Embodiments of the present disclosure also provide an electronic device, the electronic device comprising:
  • the processor is configured to implement any one of the above depth detection methods when executing the executable instructions stored in the memory.
  • Embodiments of the present disclosure further provide a computer-readable storage medium storing executable instructions for implementing any one of the above-mentioned depth detection methods when executed by a processor.
  • Embodiments of the present disclosure further provide a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor of the electronic device executes it to implement any one of the above depth detection methods.
  • The embodiments of the present disclosure can combine the human body mask image with the two-dimensional key point information and three-dimensional key point information of the human body to determine the depth detection result of the human body, so the depth information of the human body in the image does not need to be obtained through a special hardware device such as a three-dimensional depth camera. Therefore, the embodiments of the present disclosure can realize depth detection of a human body in an image without relying on special hardware devices such as a three-dimensional depth camera, and can be applied to scenarios such as AR interaction and virtual photography.
  • FIG. 1 is a schematic diagram of connection between a terminal and a server according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a depth detection method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a two-dimensional key point of a human skeleton provided by an embodiment of the present disclosure
  • FIG. 4A is a schematic diagram of a two-dimensional key point of a target human body provided by an embodiment of the present disclosure
  • FIG. 4B is a schematic diagram of a three-dimensional key point and a human body mask image of a target human body according to an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of implementing a depth detection method according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of a point cloud provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a depth detection apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • The terms "comprising", "including" or any other variations thereof are intended to cover non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly stated elements, but also other elements not expressly listed or inherent to the implementation of the method or apparatus.
  • An element defined by the phrase "comprises a..." does not preclude the presence of additional related elements in the method or apparatus that includes the element (e.g., steps in a method or units in an apparatus; for example, a unit may be part of a circuit, part of a processor, part of a program or software, etc.).
  • For example, the depth detection method provided by the embodiments of the present disclosure includes a series of steps, but is not limited to the described steps.
  • Similarly, the depth detection apparatus provided by the embodiments of the present disclosure includes a series of modules, but is not limited to including the explicitly described modules, and may also include modules that need to be set for acquiring relevant information or performing processing based on the information.
  • A 3D depth camera can be used to realize depth detection of the human body in an image.
  • The 3D depth camera here can be a camera that has a binocular camera and uses binocular vision technology to obtain depth information; however, using such special hardware increases the application cost and limits the application scenarios to a certain extent.
  • In some cases, the requirements for the accuracy of depth estimation and the amount of information provided are relatively low. When human depth is estimated from images captured by a monocular camera, only the relative depth between the pixels of the human body can be estimated, while the depth between the pixels of the human body and the camera cannot, which limits the scope of application to a certain extent. In some cases, only a single depth can be estimated for all pixels of the human body, so little depth information is obtained. In some cases, depth information can be estimated with an image matching algorithm over consecutive frames, but this scheme increases the consumption of time and computing resources and is not suitable for low-power, real-time application scenarios.
  • the embodiments of the present disclosure provide a depth detection method, apparatus, electronic device, storage medium, and computer program.
  • The depth detection method provided by the embodiments of the present disclosure can realize depth detection of the human body in an image without relying on high-cost and complex hardware such as a three-dimensional depth camera.
  • The depth detection method provided by the embodiments of the present disclosure can be applied to an electronic device, and an exemplary application of the electronic device provided by the embodiments of the present disclosure is described below.
  • The electronic devices provided by the embodiments of the present disclosure may be various terminals with image acquisition devices, such as AR glasses, laptop computers, tablet computers, desktop computers, and mobile devices (e.g., mobile phones, portable music players, personal digital assistants, dedicated messaging devices, portable game devices).
  • The image acquisition device may be a device such as a monocular camera; for example, the terminal may be a mobile phone with a camera.
  • the terminal may perform depth detection on the image collected by the image collection device according to the depth detection method of the embodiment of the present disclosure, and obtain a depth detection result of the human body in the image.
  • the electronic device provided by the embodiments of the present disclosure may also be a server that forms a communication connection with the above-mentioned terminal.
  • FIG. 1 is a schematic diagram of connection between a terminal and a server according to an embodiment of the present disclosure. As shown in FIG. 1 , a terminal 100 is connected to a server 102 through a network 101, and the network 101 may be a wide area network or a local area network, or a combination of the two.
  • The server 102 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present disclosure.
  • The terminal 100 is used to collect an image at the current moving position through the image acquisition device and can send the collected image to the server 102; after receiving the image, the server 102 can perform depth detection on the received image using the depth detection method of the embodiments of the present disclosure to obtain the depth detection result of the human body in the image.
  • FIG. 2 is a schematic flowchart of a depth detection method provided by an embodiment of the present disclosure, and the method is applied to an electronic device. As shown in FIG. 2, the process may include steps 201 to 203:
  • Step 201 Acquire at least one frame of image collected by the image collection device, where the at least one frame of image includes the current frame image.
  • the image capturing device may capture images, and may also send at least one frame of image including the current frame image to the processor of the electronic device.
  • The at least one frame of image includes the current frame image (the frame of image collected at the current moment); in some embodiments, the at least one frame of image includes not only the current frame image but also historical frame images, where the historical frame images are one or more frames of historical images captured by the image acquisition device.
  • When the at least one frame of image consists of multiple frames, the frames may be consecutive frames continuously collected by the image acquisition device, or may be non-consecutive frames; this is not limited here.
  • Step 202: Perform human body image segmentation on the current frame image to obtain a mask image of the human body; perform human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image.
  • In some embodiments, the above-mentioned human body includes at least two human bodies; correspondingly, step 202 may be implemented by: performing human body image segmentation on the current frame image to obtain mask images of the at least two human bodies; and performing human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the at least two human bodies in the current frame image.
  • In some embodiments, the above-mentioned human body includes a single target human body; correspondingly, step 202 may be implemented by: performing human body image segmentation on the current frame image to obtain the mask image of the target human body; and performing human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image.
  • a human body image can be segmented on the current frame image according to a pre-trained image segmentation model to obtain a human body mask image of the human body.
  • the image segmentation model may be a model related to the attributes of the human image.
  • The attributes of the human image may include area, gray value of pixels, or other attributes; in some embodiments, when the attribute of the human image is area, human body image segmentation is performed on the current frame image according to the pre-trained image segmentation model to obtain the mask images of the human bodies whose area is larger than a set area.
  • the image segmentation model may be implemented by a neural network, for example, the image segmentation model may be implemented by a fully convolutional neural network or other neural networks.
  • the image segmentation model can be predetermined according to actual requirements, and the actual requirements include but are not limited to time-consuming requirements, precision requirements, etc.; that is, different image segmentation models can be set according to different actual requirements.
  • the image segmentation model is an image segmentation model of at least two human bodies.
  • the image segmentation model is an image segmentation model of a single human body.
  • In this way, the human body mask image of the target human body can be obtained directly, which is easy to implement.
  • The human body mask image of the target human body can be segmented from the current frame image by using an image segmentation model of a single human body.
  • When the attribute of the human image is area, a single human body image is segmented from the current frame image according to the pre-trained image segmentation model of a single human body, yielding the human body mask image of the target human body, where the target human body is the human body with the largest area.
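The area criterion above can be illustrated with a small post-processing sketch. In the disclosure the segmentation model itself produces the target mask; the code below instead assumes, purely for illustration, that several candidate binary masks are already available and picks the largest one. The names `mask_area`, `select_target_mask` and `min_area` are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = human pixel, 0 = background


def mask_area(mask: Mask) -> int:
    """Count the foreground pixels of a binary mask."""
    return sum(sum(row) for row in mask)


def select_target_mask(masks: List[Mask], min_area: int = 0) -> int:
    """Return the index of the largest mask whose area exceeds min_area, or -1.

    min_area plays the role of the "set area" (an assumed parameter name).
    """
    best_index, best_area = -1, min_area
    for index, mask in enumerate(masks):
        area = mask_area(mask)
        if area > best_area:
            best_index, best_area = index, area
    return best_index
```

With two candidate masks of areas 1 and 3, the function returns the index of the second mask; if no mask exceeds `min_area`, it returns -1.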
  • The two-dimensional key points are used to represent the key position points of the human body in the image plane.
  • The two-dimensional key point information may include coordinate information of the two-dimensional key points, which includes an abscissa and an ordinate.
  • The three-dimensional key point information may include coordinate information of the three-dimensional key points, which represents the coordinates of the three-dimensional key points in the camera coordinate system.
  • The camera coordinate system is a three-dimensional rectangular coordinate system established with the optical axis of the image acquisition device as the Z axis; the X axis and the Y axis of the camera coordinate system are two mutually perpendicular coordinate axes of the image plane.
  • The three-dimensional key point corresponding to a two-dimensional key point, and its coordinate information, may be determined according to the two-dimensional key point information. For example, a key point conversion model for converting two-dimensional key points to three-dimensional key points may be trained in advance; after the trained key point conversion model is obtained, the coordinate information of the two-dimensional key points can be input to it to obtain the corresponding three-dimensional key points and their coordinate information.
  • the network structure of the key point conversion model is not limited.
  • For example, the key point conversion model may be a sequential convolutional network or a non-sequential fully connected network; the network structure of the key point conversion model may be predetermined according to the requirements of the practical application.
  • In some embodiments, detection and tracking of human body key points may be performed on the at least one frame of image to obtain the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image. Understandably, tracking human body key points across multiple frames helps to obtain accurate two-dimensional key point information of the at least one human body in the current frame image, which in turn helps to obtain accurate three-dimensional key point information.
  • In some embodiments, detection and tracking of human body key points may be performed on consecutive frame images to obtain the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image. Understandably, tracking human body key points across consecutive frame images further improves the accuracy of the two-dimensional key point information of the at least one human body in the current frame image, and thus of the three-dimensional key point information.
  • Step 203 Determine the depth detection result of the human body in the current frame image according to the two-dimensional key point information and the three-dimensional key point information of the human body in the current frame image and the mask image of the human body.
  • In some embodiments, the above-mentioned human body includes at least two human bodies; correspondingly, step 203 may be implemented by: determining the depth detection results of the at least two human bodies in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the at least two human bodies in the current frame image and the mask images of the at least two human bodies.
  • In some embodiments, the above-mentioned human body includes a single target human body; correspondingly, step 203 may be implemented by: determining the depth detection result of the target human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image and the human body mask image of the target human body.
  • The above steps 201 to 203 may be implemented by a processor of an electronic device, and the processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
  • Other electronic components may also implement the function of the above processor, which is not limited by the embodiments of the present disclosure.
  • The embodiments of the present disclosure can combine the human body mask image with the two-dimensional key point information and three-dimensional key point information of the human body to determine the depth detection results of multiple human bodies, so the depth information of the human body in the image does not need to be obtained through special hardware devices such as a three-dimensional depth camera. Therefore, the embodiments of the present disclosure can realize depth detection of the human body in the image without relying on special hardware devices such as a 3D depth camera, and can be applied to scenarios such as AR interaction and virtual photography.
  • The embodiments of the present disclosure can obtain the depth information between each pixel of the human body and the camera, instead of estimating a single depth for all pixels of the human body, so the obtained depth information is relatively rich and can be applied to multiple scenarios. For example, the application scope of the embodiments of the present disclosure includes but is not limited to: 3D reconstruction and presentation of dynamic human bodies in 3D human body reconstruction; occlusion display of human bodies and virtual scenes in augmented reality applications; and interaction. Further, the embodiments of the present disclosure do not directly estimate the depth information of the human body pixels with an image matching algorithm over consecutive frames, but use the two-dimensional key point information and three-dimensional key point information of the human body to determine the depth information of the human body pixels. Compared with depth estimation based on an image matching algorithm over consecutive frames, this reduces the consumption of time and computing resources and balances the estimation accuracy of the depth information against the time needed to determine it.
  • At least one frame of image collected by the above image collection device is an RGB image; it can be seen that the embodiments of the present disclosure can implement depth detection of multiple human bodies based on easily obtained RGB images, which is easy to implement.
  • In some embodiments, the two-dimensional key point information of the at least two human bodies in the current frame image can be matched with the mask image of each of the at least two human bodies to obtain the two-dimensional key point information belonging to each human body; then, the depth detection result of each human body is determined according to the three-dimensional key point information corresponding to the two-dimensional key point information of each human body.
  • It can be seen that, by matching the two-dimensional key point information of at least two human bodies in the current frame image with the mask image of each human body, the two-dimensional key point information of each human body can be obtained directly, and the depth detection result of each human body can then be determined.
  • The above-mentioned two-dimensional key point information represents two-dimensional key points of a human skeleton, and the three-dimensional key point information represents three-dimensional key points of a human skeleton.
  • The two-dimensional key points of the human skeleton are used to represent the key positions of the human body in the image plane.
  • The key positions of the human body include but are not limited to facial features, neck, shoulders, elbows, hands, hips, knees, and feet; the key positions of the human body can be preset according to the actual situation. Exemplarily, referring to FIG. 3, the two-dimensional key points of the human skeleton can represent 14 key positions of the human body or 17 key positions of the human body.
  • the solid dots collectively represent the 17 key positions of the human body.
  • the embodiment of the present disclosure can obtain the two-dimensional key points of each human skeleton, and determine the depth detection result of each human body based on the two-dimensional key points of each human skeleton.
  • the correlation between the two-dimensional key points of the skeletons of different human bodies is small. Therefore, the embodiments of the present disclosure can realize depth detection of at least two human bodies in an image.
  • In some embodiments, among the two-dimensional key point information of the at least two human bodies, the two-dimensional key point information of a human body whose position overlap with the mask image of each human body reaches a set value may be used as the two-dimensional key point information of that human body.
  • The set value may be preset according to the actual application scenario; for example, the set value may be any value between 80% and 90%.
  • The degree of overlap between the two-dimensional key point information of each human body and the mask image of each human body can be determined according to the coordinate information of the two-dimensional key points of the human body and the position information of the mask image of the human body.
  • In some embodiments, from the two-dimensional key point information of the at least two human bodies, the two-dimensional key point information of the human body with the highest position overlap with the mask image of each human body can be selected.
  • It can be seen that the two-dimensional key point information of each human body can be determined directly from the degree of position overlap between the two-dimensional key point information and the mask image of each human body, which helps to accurately obtain the two-dimensional key point information of each human body.
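The overlap-based matching described above can be sketched as follows. The disclosure does not define how the degree of overlap is computed; this sketch assumes it is the fraction of a person's two-dimensional key points that fall on foreground pixels of the mask. The function names and the default `set_value` of 0.8 (taken from the 80% to 90% range mentioned above) are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[int]]                     # binary mask, indexed as mask[row][col]
Keypoints = Sequence[Tuple[float, float]]  # (x, y) pixel coordinates


def overlap_ratio(keypoints: Keypoints, mask: Mask) -> float:
    """Fraction of key points that land on a foreground pixel of the mask."""
    if not keypoints:
        return 0.0
    height, width = len(mask), len(mask[0])
    inside = 0
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width and mask[row][col]:
            inside += 1
    return inside / len(keypoints)


def match_keypoints_to_mask(
    people: Sequence[Keypoints], mask: Mask, set_value: float = 0.8
) -> Optional[int]:
    """Index of the person whose key points overlap the mask the most.

    Returns None when no person reaches set_value (assumed overlap threshold).
    """
    best_index, best_ratio = None, 0.0
    for index, keypoints in enumerate(people):
        ratio = overlap_ratio(keypoints, mask)
        if ratio >= set_value and ratio > best_ratio:
            best_index, best_ratio = index, ratio
    return best_index
```

Calling `match_keypoints_to_mask` once per mask assigns each mask the key point set that overlaps it best; key point sets that never reach the threshold stay unassigned.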
  • In some embodiments, optimization processing may be performed on the two-dimensional key point information of at least two human bodies in the current frame image to obtain the optimized two-dimensional key point information of the at least two human bodies; then, among the optimized two-dimensional key point information of the at least two human bodies, the two-dimensional key point information of one human body whose position overlap with the mask image of each human body reaches the set value is used as the two-dimensional key point information of that human body.
  • When the at least one frame of image further includes historical frame images, the two-dimensional key point information of at least two human bodies in the current frame image and the two-dimensional key point information of at least two human bodies in the historical frame images may be processed together to obtain the optimized two-dimensional key point information of the at least two human bodies.
  • In some embodiments, time series filtering may be performed on the two-dimensional key point information of at least two human bodies in the current frame image and the two-dimensional key point information of at least two human bodies in the historical frame images to obtain the filtered two-dimensional key point information of the at least two human bodies; methods of time series filtering include but are not limited to time series low-pass filtering and time series extended Kalman filtering.
  • In other embodiments, skeleton limb length optimization may be performed on the two-dimensional key point information of at least two human bodies in the current frame image and the two-dimensional key point information of at least two human bodies in the historical frame images to obtain the optimized two-dimensional key point information of the at least two human bodies.
  • It can be seen that optimizing the two-dimensional key point information of at least two human bodies in the current frame image in combination with that in the historical frame images improves the temporal stability of the two-dimensional key point information, and therefore the temporal stability of human depth detection.
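As an illustration of time series low-pass filtering, the sketch below applies exponential smoothing to the key points of one tracked person. The disclosure does not specify the filter, so the choice of an exponential moving average, the function name `low_pass_filter` and the smoothing factor `alpha` are assumptions; the sketch also assumes key points appear in the same order in every frame.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of one key point


def low_pass_filter(frames: List[List[Point]], alpha: float = 0.5) -> List[Point]:
    """Exponentially smooth key points over time.

    frames is ordered from the oldest historical frame to the current frame.
    alpha in (0, 1] weights the newer frame; smaller values smooth more.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    smoothed = list(frames[0])
    for frame in frames[1:]:
        smoothed = [
            (alpha * x + (1.0 - alpha) * sx, alpha * y + (1.0 - alpha) * sy)
            for (x, y), (sx, sy) in zip(frame, smoothed)
        ]
    return smoothed
```

A smaller `alpha` suppresses frame-to-frame jitter more strongly but makes the key points lag behind fast motion, which is the usual trade-off for this kind of filter.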
  • Based on the two-dimensional key point information of each human body, the three-dimensional key point information corresponding to the two-dimensional key points of each human body can be determined.
  • The coordinate information in the three-dimensional key point information of each human body can be used as the depth information of the two-dimensional key points of that human body.
  • The depth information of a two-dimensional key point represents the depth information of the pixel that overlaps with the position of that two-dimensional key point.
  • If a pixel in the mask image of each human body does not overlap with the position of any two-dimensional key point, that pixel is regarded as a first pixel.
  • The depth information of the two-dimensional key points is interpolated to obtain the depth information of the first pixels in the mask image of each human body.
  • Interpolation is an important method for discrete function approximation. Using interpolation, the approximate value of the function at other points can be estimated by the value of the function at a limited number of points.
  • the depth information of the complete pixels in the mask image of each human body may be obtained based on an interpolation processing method under a preset spatial continuity constraint.
  • a smoothing filtering process may also be performed on the depth information of each pixel in the mask image of each human body.
  • A depth map of each human body can also be generated based on the depth information of each pixel, and the depth map can be displayed in the display interface of the electronic device.
  • the embodiments of the present disclosure can determine the depth information for any pixel point of the mask image of each human body, and can comprehensively realize the depth detection of each human body in the image.
  • the depth information of the two-dimensional key points of each human body determines a discrete function that characterizes the relationship between pixel position and pixel depth information; according to the depth information of the two-dimensional key points of each human body, the value of the discrete function at the position of the first pixel point is supplemented, and that value is determined as the depth information of the first pixel point.
  • the above-mentioned contents merely describe the principle of the interpolation processing, and do not limit the specific implementation of the interpolation processing.
  • the specific implementation of the interpolation processing includes, but is not limited to, nearest-neighbor interpolation completion, interpolation completion based on breadth-first search, etc.
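  • the nearest-neighbor completion named above can be sketched as follows; this is a minimal illustration, assuming the mask is a 2D list of 0/1 values and each key point is given as a hypothetical `(row, col, depth)` triple, and it is not the specific implementation of the disclosure.

```python
def nearest_neighbor_depth(mask, keypoints):
    """Fill every mask pixel with the depth of its nearest 2D key point.

    mask:      2D list, truthy where the pixel belongs to the human body.
    keypoints: list of (row, col, depth) triples; depth is the Z value of the
               3D key point corresponding to the 2D key point (assumed layout).
    Returns a 2D list of depths, with None outside the mask.
    """
    depth = [[None] * len(mask[0]) for _ in mask]
    for r, row in enumerate(mask):
        for c, inside in enumerate(row):
            if not inside:
                continue
            # Squared Euclidean distance is enough for choosing the minimum.
            nearest = min(
                keypoints,
                key=lambda k: (k[0] - r) ** 2 + (k[1] - c) ** 2,
            )
            depth[r][c] = nearest[2]
    return depth
```

  • pixels that coincide with a key point receive that key point's own depth, and every other mask pixel (a first pixel point) receives the depth of the closest key point; a smoothing filter could then be applied to soften the resulting piecewise-constant map.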
  • the two-dimensional key point information of at least one human body in the current frame image can be matched with the human body mask image of the target human body to obtain the two-dimensional key point information of the target human body in the current frame image; then, the depth detection result of the target human body in the current frame image is determined according to the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image.
  • by matching the two-dimensional key point information of at least one human body in the current frame image with the human body mask image of the target human body, the two-dimensional key point information of the target human body can be obtained directly, and the depth detection result of the target human body can then be determined; that is, depth detection of the target human body in the image can be realized without relying on special hardware devices such as a 3D depth camera.
  • the two-dimensional key point information of the target human body may be determined from the two-dimensional key point information of the at least one human body; the two-dimensional key point information of the target human body is the two-dimensional key point information of a human body whose degree of positional overlap with the human body mask image of the target human body reaches a set value.
  • the degree of positional overlap between the two-dimensional key point information of each human body and the human body mask image of the target human body can be determined according to the coordinate information of the two-dimensional key points of each human body in the at least one human body and the position information of the human body mask image of the target human body.
  • from the two-dimensional key point information of multiple human bodies, the two-dimensional key point information of the human body with the highest positional overlap with the human body mask image of the target human body may be selected and used as the two-dimensional key point information of the target human body.
  • the two-dimensional key point information of the target human body can be directly determined according to the positional overlap between the two-dimensional key point information and the human body mask image of the target human body, which helps accurately obtain the two-dimensional key point information of the target human body.
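  • the matching by positional overlap described above can be sketched as follows; the disclosure does not define how the degree of overlap is measured, so this sketch assumes it is the fraction of a body's two-dimensional key points that fall on mask pixels, and the names `overlap_degree`, `match_target` and `threshold` are hypothetical.

```python
def overlap_degree(keypoints, mask):
    """Fraction of (x, y) key points that land on a truthy mask pixel."""
    if not keypoints:
        return 0.0
    height, width = len(mask), len(mask[0])
    hits = 0
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width and mask[row][col]:
            hits += 1
    return hits / len(keypoints)


def match_target(bodies, mask, threshold=0.5):
    """Return the index of the body whose key points best overlap the mask.

    bodies:    list of key point lists, one per detected human body.
    threshold: the "set value" the overlap must reach (assumed default).
    Returns None when no body reaches the threshold.
    """
    scores = [overlap_degree(kps, mask) for kps in bodies]
    candidates = [i for i, s in enumerate(scores) if s >= threshold]
    if not candidates:
        return None
    # Among qualifying bodies, keep the one with the highest overlap.
    return max(candidates, key=lambda i: scores[i])
```

  • returning `None` corresponds to the case in which no two-dimensional key point of the target human body is found in the frame.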
  • the target human body in the current frame image can be determined.
  • the coordinate information of the three-dimensional key points of the target human body may be used as the depth information of the two-dimensional key points of the target human body.
  • if any pixel in the human body mask image of the target human body, or in the set of pixel points, does not overlap with the position of a two-dimensional key point, that pixel can be regarded as a first pixel.
  • the depth information of a pixel point adjacent to the first pixel point is used as the depth information of the first pixel point; that is, for the first pixel point, a pixel point adjacent to it can be selected from the pixel points that overlap with the positions of the two-dimensional key points, and the depth information of the first pixel point is determined based on the Z-axis coordinate value of the three-dimensional key point corresponding to the selected pixel point.
  • the embodiments of the present disclosure can determine the depth information for the human body mask image of the target human body or any pixel point in the pixel point set, and can comprehensively realize the depth detection of the target human body in the image.
  • the connected region of the two-dimensional key points can be searched based on the two-dimensional key points of the target human body in the current frame image, and the pixels in the human body mask image of the target human body that are not included in the connected region are deleted to obtain a set of pixels.
  • the two-dimensional key points of the target human body in the current frame image are used as seed points, and a breadth-first search is performed to find the connected region of the two-dimensional key points.
  • the pixels in the human body mask image of the target human body that are not included in the connected region are pixels that cannot be reached by searching from the two-dimensional key points, and the two-dimensional key points represent key positions of the human body. Therefore, such pixels can be regarded as erroneous pixels; deleting them from the human body mask image of the target human body helps improve the accuracy of the depth detection of the target human body.
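  • the seed-based breadth-first search described above can be sketched as follows; this is a minimal illustration that assumes 4-connectivity and a mask stored as a 2D list, neither of which is specified by the disclosure, and the function name is hypothetical.

```python
from collections import deque


def filter_mask_by_connectivity(mask, seeds):
    """Keep only mask pixels reachable from the seed key points.

    mask:  2D list, truthy where the pixel belongs to the human body mask.
    seeds: list of (row, col) positions of the 2D key points.
    Returns a 2D list of 0/1 with unreachable mask pixels removed.
    """
    height, width = len(mask), len(mask[0])
    keep = [[0] * width for _ in range(height)]
    queue = deque()
    for r, c in seeds:
        if 0 <= r < height and 0 <= c < width and mask[r][c] and not keep[r][c]:
            keep[r][c] = 1
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        # Visit the four direct neighbours (assumed 4-connectivity).
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and mask[nr][nc] and not keep[nr][nc]:
                keep[nr][nc] = 1
                queue.append((nr, nc))
    return keep
```

  • mask fragments that are disconnected from every key point, such as segmentation noise far from the body, are dropped from the resulting pixel set.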
  • an implementation manner of determining the depth detection result of the human body in the current frame image can be:
  • the two-dimensional key point information of at least one human body in the current frame image after the optimization process can be obtained first, and then, according to the two-dimensional key point information after the optimization process, the corresponding two-dimensional key point can be further determined.
  • in response to a two-dimensional key point of the target human body being present in any frame image of the at least one frame of image, and the three-dimensional key point corresponding to that two-dimensional key point being in a preset area, that frame image is determined to be a valid image.
  • matching the two-dimensional key point information of at least one human body in the current frame image with the human body mask image of the target human body may fail to yield the two-dimensional key point information of the target human body in the current frame image; that is, there may be no two-dimensional key point of the target human body in some frame image of the at least one frame of image.
  • since the three-dimensional key points are obtained from the two-dimensional key points, when there is no two-dimensional key point of the target human body in a frame image, it can be determined that there is no three-dimensional key point of the target human body in that frame image.
  • whether the three-dimensional key points corresponding to the two-dimensional key points of the target human body in a frame image are in the preset area may be determined according to the coordinate information in the three-dimensional key point information.
  • images other than valid images may be regarded as invalid images, and the processing of invalid images may also be omitted, so that the accuracy of human depth detection can be improved.
  • the preset area may be preset according to the actual application scenario; in some embodiments, the distance between the image acquisition device and the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image may be determined according to the coordinate information of that three-dimensional key point. If the distance is greater than the set distance, it is determined that the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image is not in the preset area; if the distance is less than or equal to the set distance, it can be determined that the three-dimensional key point is in the preset area.
  • the Z-axis coordinate value in the coordinate information of a three-dimensional key point represents the distance between the three-dimensional key point and the image acquisition device; therefore, whether that distance is greater than the set distance can be determined according to the coordinate information of the three-dimensional key point.
  • the set distance may be data preset according to actual application requirements.
  • the three-dimensional key point is a key point that meets the requirements. It is beneficial to obtain the depth detection result of the target human body accurately in the follow-up.
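  • the validity check described above can be sketched as follows; the disclosure states the rule for a single three-dimensional key point, so requiring every key point of the target human body to satisfy it is an assumption of this sketch, as are the names `is_valid_frame` and `set_distance`.

```python
def is_valid_frame(keypoints_3d, set_distance):
    """Decide whether a frame counts as a valid image.

    keypoints_3d: list of (x, y, z) camera-space key points of the target
                  human body, or None/empty when no key point was detected.
    set_distance: maximum allowed distance from the image acquisition device.
    """
    if not keypoints_3d:
        # No 2D key points means no 3D key points, so the frame is invalid.
        return False
    # The Z coordinate is taken as the distance to the image acquisition device.
    return all(z <= set_distance for _, _, z in keypoints_3d)
```

  • a frame is treated as invalid either when the target human body has no key points or when any key point lies farther away than `set_distance`.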
  • the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing can be obtained according to the coordinate information of the two-dimensional key points of the target human body in the valid historical frame images of at least one frame of images.
  • in response to the two-dimensional key point of the target human body not being detected in the current frame image, or the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image not being in the preset area, one frame image can be selected from the valid historical frame images of the at least one frame of image, and the coordinate information of the two-dimensional key point of the target human body in the selected frame image is used as the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing.
  • the two-dimensional key points of the target human body in the current frame image after optimization processing can thus be obtained according to the two-dimensional key points of the target human body in the valid historical frame images, which helps improve the stability of subsequent human body depth detection results.
  • one implementation of selecting a frame image from the valid historical frame images of the at least one frame of image is to select, from those valid historical frame images, the frame image with the smallest time interval from the current frame image. For example, the at least one frame of image is recorded as the first to fifth frame images in chronological order, where the fifth frame image is the current frame image, the first to third frame images are valid historical frame images, and the fourth frame image is an invalid historical frame image. In this case, when the fifth frame image does not contain the two-dimensional key point of the target human body, the third frame image, which has the smallest time interval from the current frame image, can be selected from the first to third frame images.
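  • the fallback to the most recent valid historical frame can be sketched as follows; the representation of the history as a list of `(is_valid, keypoints)` pairs and the function name are assumptions made for illustration.

```python
def latest_valid_keypoints(history):
    """Return the 2D key points of the valid historical frame closest in time.

    history: list of (is_valid, keypoints_2d) pairs in chronological order,
             excluding the current frame.
    Returns None when no valid historical frame exists.
    """
    # Walk backwards so the first valid frame found is the most recent one.
    for is_valid, keypoints in reversed(history):
        if is_valid:
            return keypoints
    return None
```

  • with the five-frame example above, the history holds frames 1 to 4, frame 4 is skipped as invalid, and the key points of frame 3 are returned for use in the current frame.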
  • obtaining the two-dimensional key point information of at least one human body in the current frame image after optimization processing helps accurately obtain the two-dimensional key point information of the target human body in the current frame image.
  • the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing can be obtained according to the coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame images of the at least one frame of image.
  • the coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame images of the at least one frame of image may be averaged to obtain the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing.
  • for example, the at least one frame of image is recorded as the 6th to 8th frame images in chronological order, where the 8th frame image is the current frame image, and the 6th to 8th frame images are all valid images.
  • the coordinate information of the two-dimensional key points of the target human body in the 6th to 8th frame images can be averaged, and the result can be used as the updated coordinate information of the two-dimensional key points of the target human body in the 8th frame image.
  • updating the coordinate information of the two-dimensional key points of the target human body in the current frame image according to the coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame images of the at least one frame of image helps smooth the coordinate information of the two-dimensional key points of the current frame image.
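  • the averaging step described above can be sketched as follows; this assumes an unweighted mean and that every frame lists the same joints in the same order, and the function name is hypothetical.

```python
def average_keypoints(frames):
    """Average 2D key point coordinates joint by joint across frames.

    frames: list of key point lists [(x, y), ...], one per valid frame,
            with the current frame included.
    """
    count = len(frames)
    joints = len(frames[0])
    return [
        (
            sum(frame[j][0] for frame in frames) / count,
            sum(frame[j][1] for frame in frames) / count,
        )
        for j in range(joints)
    ]
```

  • for the 6th to 8th frame example, the three key point lists would be passed in together and the result would replace the key point coordinates of the 8th frame.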
  • FIG. 4A is a schematic diagram of two-dimensional key points of a target human body provided by an embodiment of the present disclosure. As shown in FIG. 4A , circles in the human body represent two-dimensional key points of the target human body in the current frame image.
  • the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image may be determined; in some embodiments, the target body in the current frame image may be displayed at the same time.
  • FIG. 4B is a schematic diagram of the three-dimensional key points and the human body mask image of the target human body provided by an embodiment of the present disclosure. As shown in FIG. 4B, the location of point O represents the location of the image acquisition device, and the three coordinate axes of the camera coordinate system are displayed at point O; the human body mask image of the target human body is the outline of the human body shown in FIG. 4B; the three-dimensional key points corresponding to the two-dimensional key points of the target human body are the pattern of filled dots behind the human body mask image of the target human body.
  • the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image can be determined, and the depth detection result of the target human body in the current frame image can then be determined.
  • FIG. 5 is a schematic structural diagram of the implementation of a depth detection method according to an embodiment of the present disclosure.
  • the image acquisition device 501 can send the acquired multi-frame images to the processor 5021 of the electronic device 502; the multi-frame images include the current frame image and the historical frame images, and are all RGB images; the processor 5021 can perform human body image segmentation on the current frame image of the multi-frame images to obtain at least two human body mask images; it can also perform detection and tracking of human body key points based on the multi-frame images to obtain two-dimensional key point information and three-dimensional key point information of at least two human bodies in the current frame image.
  • post-processing optimization can also be performed, as well as the above-mentioned process of performing interpolation processing on the depth information of the two-dimensional key points.
  • another depth detection method provided by an embodiment of the present disclosure can also be implemented by the structure shown in FIG. 5: the processor 5021 can perform single-human-body image segmentation on the current frame image of the multi-frame images to obtain a human body mask image of the target human body; it can also perform detection and tracking of human body key points based on the multi-frame images to obtain two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image. After the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image are obtained, post-processing optimization can also be performed; the post-processing optimization includes the above-mentioned optimization of the two-dimensional key point information and the three-dimensional key point information.
  • the depth detection result of the target human body in the current frame image is determined according to the two-dimensional key point information and three-dimensional key point information of at least one human body in the current frame image and the mask image of the target human body; a depth map of the target human body is generated based on the depth detection result of the target human body in the current frame image, and the depth map can be displayed on the display interface 5022 of the electronic device 502 to realize human-computer interaction.
  • the display interface 5022 can also display a point cloud corresponding to each pixel in the depth map;
  • FIG. 6 is a schematic diagram of a point cloud provided by an embodiment of the present disclosure. In FIG. 6, the dots represent the point cloud composed of pixel points, the bold solid dots represent the key points of the skeleton, and the lines between the bold solid dots represent the skeleton of the human body.
  • the AR effect display may also be performed based on the depth detection result of the human body.
  • the positional relationship between the human body and at least one target object in the AR scene may be determined according to the depth detection result of the human body in the current frame image; based on the positional relationship, the combined presentation mode of the human body and the at least one target object is determined; based on the combined presentation mode, the AR effect in which the human body and the at least one target object are superimposed is displayed.
  • the target object may be an object that actually exists in a real scene, and the depth information of the target object may be known, or may be information determined according to the shooting data of the target object; the target object may also be a preset virtual object, in which case the depth information of the virtual object is predetermined.
  • the positional relationship between the at least two human bodies and at least one target object in the AR scene, and the positional relationship between the at least two human bodies, may be determined according to the depth detection results of the at least two human bodies and the depth information of the target object.
  • the positional relationship between each human body and the target object in the AR scene may include the following situations: 1) the human body is closer to the image acquisition device than the target object, 2) the target object is closer to the image acquisition device than the human body, 3) the human body is located on the right, left, upper or lower side of the target object, 4) a part of the human body is closer to the image acquisition device than the target object, and the other part is farther away from the image acquisition device than the target object;
  • the positional relationship between at least two human bodies may include the following situations: 1) one human body is closer to the image acquisition device than the other human body, 2) one human body is located on the right, left, upper or lower side of the other human body, 3) a part of a human body is closer to the image acquisition device than another human body, and
  • a combined presentation mode of the at least two human bodies and the at least one target object may be determined, so that the combined presentation mode reflects the above positional relationship. Displaying, based on the combined presentation mode, the AR effect in which the multiple human bodies and the at least one target object are superimposed helps improve the AR display effect.
  • the positional relationship between the target human body and at least one target object in the AR scene can be determined according to the depth detection result of the target human body and the depth information of the target object;
  • the positional relationship can include the following situations: 1) the target human body is closer to the image acquisition device than the target object, 2) the target object is closer to the image acquisition device than the target human body, 3) the target human body is located on the right side, left side, upper side or lower side of the target object, 4) a part of the target human body is closer to the image acquisition device than the target object, and the other part is farther away from the image acquisition device than the target object; it should be noted that the above only exemplifies the positional relationship between the target human body and the target object in the AR scene, and the embodiments of the present disclosure are not limited thereto.
  • a combined presentation mode of the target human body and the at least one target object can be determined, so that the combined presentation mode reflects the above positional relationship; displaying, based on the combined presentation mode, the AR effect in which the target human body and the at least one target object are superimposed helps enhance the AR display effect.
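  • one possible way to make a combined presentation reflect the positional relationship is a per-pixel depth comparison, sketched below; the disclosure does not prescribe this compositing rule, so the rule itself, the function name `composite` and the label parameters are all assumptions made for illustration.

```python
def composite(human_depth, object_depth, human_label="H", object_label="O", background="."):
    """Choose, per pixel, whether the human body or the target object is shown.

    human_depth:  2D list of human depths, None where there is no human pixel.
    object_depth: 2D list of target object depths, None where there is no object.
    The surface with the smaller depth (closer to the camera) is shown in front.
    """
    result = []
    for human_row, object_row in zip(human_depth, object_depth):
        row = []
        for h, o in zip(human_row, object_row):
            if h is None and o is None:
                row.append(background)
            elif o is None or (h is not None and h <= o):
                row.append(human_label)
            else:
                row.append(object_label)
        result.append(row)
    return result
```

  • because the decision is made pixel by pixel, the case in which one part of the human body is in front of the target object and another part is behind it is handled without special treatment.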
  • an embodiment of the present disclosure further provides a depth detection apparatus 7, and the depth detection apparatus 7 may be located in the electronic device 502 described above.
  • FIG. 7 is a schematic structural diagram of a depth detection apparatus 7 according to an embodiment of the present disclosure. As shown in FIG. 7 , the depth detection apparatus 7 may include:
  • the acquisition module 701 is configured to: acquire at least one frame of image collected by the image acquisition device, where the at least one frame of image includes the current frame image;
  • the processing module 702 is configured to: perform human body image segmentation on the current frame image to obtain a mask image of the human body; perform human body key point detection on the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image;
  • the detection module 703 is configured to: determine the depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image, and the mask image of the human body, wherein,
  • the human body includes a single human body or at least two human bodies.
  • the human body includes at least two human bodies; the detection module 703 is specifically configured as:
  • the depth detection result of each human body in the current frame image is determined according to the three-dimensional key point information corresponding to the two-dimensional key point information belonging to each human body respectively.
  • the detection module 703 is specifically configured to take, from the two-dimensional key point information of the at least two human bodies, the two-dimensional key point information of a human body whose positional overlap with the mask image of each human body reaches a predetermined value as the two-dimensional key point information of each human body.
  • the detection module 703 is specifically configured as:
  • the first pixel point represents any pixel in the mask image of each human body other than the pixels that overlap with the positions of the two-dimensional key points.
  • the detection module 703 is specifically configured as:
  • the two-dimensional key point information of one human body whose positional overlap with the mask image of each human body reaches a set value is used as the two-dimensional key point information of each human body.
  • the detection module 703 is specifically configured as:
  • the two-dimensional key point information of at least two human bodies in the current frame image and the two-dimensional key point information of at least two human bodies in the historical frame image are processed to obtain the two-dimensional key point information of the at least two human bodies after optimization processing.
  • the human body includes a single target human body
  • the detection module 703 is specifically configured as:
  • the two-dimensional key point information includes coordinate information of two-dimensional key points
  • the detection module 703 is specifically configured as:
  • in response to the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image being in the preset area, the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing is obtained according to the coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame images of the at least one frame of image.
  • the detection module 703 is specifically configured to average the coordinate information of the two-dimensional key points of the target human body in the current frame image and in the valid historical frame images of the at least one frame of image, to obtain the coordinate information of the two-dimensional key points of the target human body in the current frame image after optimization processing.
  • the detection module 703 is further configured to: in response to detecting a two-dimensional key point of the target human body in any frame image of the at least one frame of image, and the three-dimensional key point corresponding to the two-dimensional key point of the target human body in that frame image being in the preset area, determine that the frame image is a valid image.
  • the detection module 703 is specifically configured as:
  • the distance between the three-dimensional key points corresponding to the two-dimensional key points of the target human body in the current frame image and the image acquisition device is determined ;
  • when the distance is less than or equal to the set distance, it is determined that the three-dimensional key point corresponding to the two-dimensional key point of the target human body in the current frame image is in the preset area.
  • the detection module 703 is specifically configured as:
  • the two-dimensional key point information of the target human body in the current frame image is obtained;
  • the depth detection result of the target human body in the current frame image is determined according to the three-dimensional key point information corresponding to the two-dimensional key point information of the target human body in the current frame image.
  • the detection module 703 is specifically configured as:
  • the two-dimensional key point information of the target human body is determined from the two-dimensional key point information of the at least one human body; the two-dimensional key point information of the target human body is the two-dimensional key point information of a human body whose degree of positional overlap with the human body mask image of the target human body reaches a set value.
  • the detection module 703 is specifically configured as:
  • the pixel point set includes: the pixel points of the human body mask image of the target human body after filtering processing according to a preset filtering method.
  • the detection module 703 is further configured to:
  • the connected area of the two-dimensional key points is searched based on the two-dimensional key points of the target human body in the current frame image, and the two-dimensional key points in the human body mask image of the target human body are searched. Pixels not included in the connected region are deleted to obtain the set of pixels.
  • processing module 702 is further configured to:
  • an AR effect superimposed on the human body and the at least one target object is displayed.
  • the two-dimensional key point information is a two-dimensional key point representing a human skeleton
  • the three-dimensional key point information is a three-dimensional key point representing a human skeleton
  • At least one frame of image collected by the image collection device is an RGB image.
  • the acquisition module 701, the processing module 702 and the detection module 703 can all be implemented by a processor in an electronic device, and the above-mentioned processor can be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor.
  • if the above-mentioned display method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present disclosure, in essence, or the parts that contribute to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a terminal, a server, etc.) to execute all or part of the methods of the various embodiments of the present disclosure.
  • the aforementioned storage medium includes: a U disk, a mobile hard disk, a read only memory (Read Only Memory, ROM), a magnetic disk or an optical disk and other media that can store program codes.
  • an embodiment of the present disclosure further provides a computer program product, where the computer program product includes computer-executable instructions, and the computer-executable instructions are used to implement the depth detection method provided by the embodiment of the present disclosure.
  • an embodiment of the present disclosure further provides a computer storage medium, where computer-executable instructions are stored on the computer storage medium, and the computer-executable instructions are used to implement the depth detection method provided by the foregoing embodiments.
  • FIG. 8 is a schematic structural diagram of the electronic device 10 provided by an embodiment of the present disclosure. As shown in FIG. 8 , the electronic device 502 includes:
  • the processor 5021 is configured to implement any one of the above depth detection methods when executing the executable instructions stored in the memory.
  • the memory 801 is configured to store computer programs and applications for the processor 5021, and can also cache data to be processed or already processed by the processor 5021 and various modules in the electronic device (for example, image data, audio data, voice communication data, and video communication data); it can be implemented by flash memory (FLASH) or random access memory (RAM).
  • the above-mentioned processor 5021 may be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor. It can be understood that the function of the above processor may also be implemented by another electronic device, which is not limited by the embodiments of the present disclosure.
  • the above-mentioned computer-readable storage medium/memory can be a ROM, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, a compact disc read-only memory (CD-ROM), or another memory; it can also be any of various terminals including one or any combination of the above memories, such as mobile phones, computers, tablet devices, and personal digital assistants.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the unit described above as a separate component may or may not be physically separated, and the component displayed as a unit may or may not be a physical unit; it may be located in one place or distributed to multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present disclosure.
  • each functional unit in each embodiment of the present disclosure may be all integrated into one processing unit, or each unit may be separately used as a unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented either in the form of hardware or in the form of hardware plus software functional units.
  • if the above-mentioned integrated units of the present disclosure are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present disclosure, in essence, or the parts that contribute to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a device to perform all or part of the methods described in various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media that can store program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
  • Embodiments of the present disclosure disclose a depth detection method, device, electronic device, storage medium, and program.
  • the method includes: acquiring at least one frame of image collected by an image acquisition device, where the at least one frame of image includes a current frame image; performing human body image segmentation on the current frame image to obtain a mask image of the human body; performing human body key point detection on the at least one frame of image to obtain the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image; and determining the depth detection result of the human body in the current frame image according to the two-dimensional key point information and three-dimensional key point information of the human body in the current frame image and the mask image of the human body, wherein the human body includes a single human body or at least two human bodies.
  • the depth detection method provided by the embodiments of the present disclosure can realize the depth detection of a human body in an image without relying on special hardware devices such as a 3D depth camera, and can be applied to scenarios such as AR interaction
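As a rough illustration of how two-dimensional key points, three-dimensional key points, and a human body mask could be combined into a per-pixel depth result, the sketch below gives every retained mask pixel the depth of its nearest two-dimensional key point. It is written in Python and is not the method of the disclosure: the function name `assign_pixel_depths`, the nearest-key-point rule, and the assumption that the z component of each three-dimensional key point is already a depth in camera coordinates are all hypothetical choices made for the example.

```python
import math


def assign_pixel_depths(pixels, keypoints_2d, keypoints_3d):
    """Map each pixel to the depth of its nearest 2D key point.

    pixels: iterable of (row, col) positions taken from the human body mask.
    keypoints_2d: list of (row, col) positions, one per skeleton joint.
    keypoints_3d: list of (x, y, z) positions for the same joints, in the
        same order; z is assumed to be a depth in camera coordinates.
    """
    if len(keypoints_2d) != len(keypoints_3d):
        raise ValueError("2D and 3D key point lists must have equal length")
    if not keypoints_2d:
        raise ValueError("at least one key point is required")

    depths = {}
    for r, c in pixels:
        # Index of the 2D key point closest to this pixel (Euclidean distance).
        nearest = min(
            range(len(keypoints_2d)),
            key=lambda i: math.hypot(r - keypoints_2d[i][0], c - keypoints_2d[i][1]),
        )
        depths[(r, c)] = keypoints_3d[nearest][2]
    return depths
```

A nearest-key-point rule is the simplest possible choice; interpolating between several nearby key points would give smoother depth across a limb, at the cost of more computation per pixel.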

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a depth detection method and apparatus, an electronic device, a computer-readable storage medium, and a program. The method comprises: acquiring at least one frame of image collected by an image collection device, the at least one frame of image comprising a current frame image; performing human body image segmentation on the current frame image to obtain a human body mask image; detecting human body key points in the at least one frame of image to obtain two-dimensional key point information and three-dimensional key point information of the human body in the current frame image; and determining a depth detection result of the human body in the current frame image according to the two-dimensional key point information and the three-dimensional key point information of the human body in the current frame image and the human body mask image, the human body comprising a single human body or at least two human bodies.
PCT/CN2021/109803 2020-11-24 2021-07-30 Depth detection method and apparatus, electronic device, storage medium, and program WO2022110877A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
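CN202011335257.4A CN112465890B (zh) 2020-11-24 Depth detection method, apparatus, electronic device, and computer-readable storage medium
CN202011344694.2A CN112419388B (zh) 2020-11-24 Depth detection method, apparatus, electronic device, and computer-readable storage medium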
CN202011335257.4 2020-11-24
CN202011344694.2 2020-11-24

Publications (1)

Publication Number Publication Date
WO2022110877A1 (fr) 2022-06-02

Family

ID=81755266

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109803 WO2022110877A1 (fr) 2020-11-24 2021-07-30 Depth detection method and apparatus, electronic device, storage medium, and program

Country Status (2)

Country Link
TW (1) TW202221646A (fr)
WO (1) WO2022110877A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
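CN115375856A (zh) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Three-dimensional reconstruction method, device, and storage medium
CN117237397A (zh) * 2023-07-13 2023-12-15 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, device, and storage medium based on feature fusion
CN118592943A (zh) * 2024-08-07 2024-09-06 宁波星巡智能科技有限公司 Human fall detection method, apparatus, and device based on key point sequence analysis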

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
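CN108460338A (zh) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Human body pose estimation method and apparatus, electronic device, storage medium, and program
CN108876835A (zh) * 2018-03-28 2018-11-23 北京旷视科技有限公司 Depth information detection method, apparatus and system, and storage medium
US20190012807A1 (en) * 2017-07-04 2019-01-10 Baidu Online Network Technology (Beijing) Co., Ltd.. Three-dimensional posture estimating method and apparatus, device and computer storage medium
CN110047100A (zh) * 2019-04-01 2019-07-23 四川深瑞视科技有限公司 Depth information detection method, apparatus, and system
CN110458177A (zh) * 2019-07-12 2019-11-15 中国科学院深圳先进技术研究院 Method for acquiring image depth information, image processing apparatus, and storage medium
CN110826357A (zh) * 2018-08-07 2020-02-21 北京市商汤科技开发有限公司 Method, apparatus, medium, and device for three-dimensional object detection and intelligent driving control
CN111210468A (zh) * 2018-11-22 2020-05-29 中移(杭州)信息技术有限公司 Method and apparatus for acquiring image depth information
CN112419388A (zh) * 2020-11-24 2021-02-26 深圳市商汤科技有限公司 Depth detection method, apparatus, electronic device, and computer-readable storage medium
CN112465890A (zh) * 2020-11-24 2021-03-09 深圳市商汤科技有限公司 Depth detection method, apparatus, electronic device, and computer-readable storage medium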

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
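US20190012807A1 (en) * 2017-07-04 2019-01-10 Baidu Online Network Technology (Beijing) Co., Ltd.. Three-dimensional posture estimating method and apparatus, device and computer storage medium
CN108460338A (zh) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Human body pose estimation method and apparatus, electronic device, storage medium, and program
CN108876835A (zh) * 2018-03-28 2018-11-23 北京旷视科技有限公司 Depth information detection method, apparatus and system, and storage medium
CN110826357A (zh) * 2018-08-07 2020-02-21 北京市商汤科技开发有限公司 Method, apparatus, medium, and device for three-dimensional object detection and intelligent driving control
CN111210468A (zh) * 2018-11-22 2020-05-29 中移(杭州)信息技术有限公司 Method and apparatus for acquiring image depth information
CN110047100A (zh) * 2019-04-01 2019-07-23 四川深瑞视科技有限公司 Depth information detection method, apparatus, and system
CN110458177A (zh) * 2019-07-12 2019-11-15 中国科学院深圳先进技术研究院 Method for acquiring image depth information, image processing apparatus, and storage medium
CN112419388A (zh) * 2020-11-24 2021-02-26 深圳市商汤科技有限公司 Depth detection method, apparatus, electronic device, and computer-readable storage medium
CN112465890A (zh) * 2020-11-24 2021-03-09 深圳市商汤科技有限公司 Depth detection method, apparatus, electronic device, and computer-readable storage medium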

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
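CN115375856A (zh) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Three-dimensional reconstruction method, device, and storage medium
CN115375856B (zh) * 2022-10-25 2023-02-07 杭州华橙软件技术有限公司 Three-dimensional reconstruction method, device, and storage medium
CN117237397A (zh) * 2023-07-13 2023-12-15 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, device, and storage medium based on feature fusion
CN117237397B (zh) * 2023-07-13 2024-05-28 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, device, and storage medium based on feature fusion
CN118592943A (zh) * 2024-08-07 2024-09-06 宁波星巡智能科技有限公司 Human fall detection method, apparatus, and device based on key point sequence analysis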

Also Published As

Publication number Publication date
TW202221646A (zh) 2022-06-01

Similar Documents

Publication Publication Date Title
WO2022110877A1 (fr) Depth detection method and apparatus, electronic device, storage medium, and program
WO2021083242A1 (fr) Map construction method, positioning method and system, wireless communication terminal, and computer-readable medium
CN108895981B (zh) Three-dimensional measurement method, apparatus, server, and storage medium
CN108062536B (zh) Detection method and apparatus, and computer storage medium
CN110276317B (zh) Object size detection method, object size detection apparatus, and mobile terminal
WO2023071964A1 (fr) Data processing method and apparatus, electronic device, and computer-readable storage medium
US9576183B2 (en) Fast initialization for monocular visual SLAM
US20240153213A1 (en) Data acquisition and reconstruction method and system for human body three-dimensional modeling based on single mobile phone
WO2020134818A1 (fr) Image processing method and related product
WO2023024441A1 (fr) Model reconstruction method and related apparatus, electronic device, and storage medium
WO2023169281A1 (fr) Image registration method and apparatus, storage medium, and electronic device
JP2013164697A (ja) Image processing apparatus, image processing method, program, and image processing system
WO2022088819A1 (fr) Video processing method, video processing apparatus, and storage medium
WO2021098554A1 (fr) Feature extraction method and apparatus, device, and storage medium
CN112270709A (zh) Map construction method and apparatus, computer-readable storage medium, and electronic device
WO2023168957A1 (fr) Pose determination method and apparatus, electronic device, storage medium, and program
WO2024060978A1 (fr) Key point detection model training method and apparatus, and virtual character driving method and apparatus
CN113362467B (zh) Mobile-terminal three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet
CN117711066A (zh) Three-dimensional human body pose estimation method, apparatus, device, and medium
CN117788686A (zh) Three-dimensional scene reconstruction method based on 2D images, apparatus, and electronic device
WO2024032165A1 (fr) 3D model generation method and system, and electronic device
WO2023185241A1 (fr) Data processing method and apparatus, device, and medium
WO2023015938A1 (fr) Three-dimensional point detection method and apparatus, electronic device, and storage medium
CN112465890B (zh) Depth detection method, apparatus, electronic device, and computer-readable storage medium
CN112288817B (zh) Image-based three-dimensional reconstruction processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21896384

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.09.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21896384

Country of ref document: EP

Kind code of ref document: A1