CN113344999A - Depth detection method and device, electronic equipment and storage medium - Google Patents

Depth detection method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN113344999A
Authority
CN
China
Prior art keywords
target object
frame
key point
detected
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110719313.2A
Other languages
Chinese (zh)
Inventor
赵佳
谢符宝
刘文韬
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110719313.2A priority Critical patent/CN113344999A/en
Publication of CN113344999A publication Critical patent/CN113344999A/en
Priority to PCT/CN2022/085913 priority patent/WO2023273498A1/en
Priority to TW111122248A priority patent/TW202301275A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The disclosure relates to a depth detection method and apparatus, an electronic device, and a storage medium, the method including: acquiring a frame to be detected, wherein the frame to be detected comprises a target object; performing key point detection on the target object according to the frame to be detected to obtain a key point detection result; determining a characteristic length of a target region in the target object based on the key point detection result, wherein the target region comprises a head region and/or a shoulder region, and the characteristic length is used for representing size information of the target region in the target object; and determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area.

Description

Depth detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a depth detection method and apparatus, an electronic device, and a storage medium.
Background
The depth information may reflect a distance of a human body in the image relative to the image acquisition device, and based on the depth information, a human body object in the image may be spatially located.
The monocular camera is a common and widely used image acquisition device; however, it can usually provide only two-dimensional information. How to accurately determine the depth information of a human body in an image acquired by a monocular camera has therefore become a problem that urgently needs to be solved.
Disclosure of Invention
The present disclosure provides a technical solution for depth detection.
According to an aspect of the present disclosure, there is provided a depth detection method including:
acquiring a frame to be detected, wherein the frame to be detected comprises a target object; performing key point detection on the target object according to the frame to be detected to obtain a key point detection result; determining a characteristic length of a target region in the target object based on the key point detection result, wherein the target region comprises a head region and/or a shoulder region, and the characteristic length is used for representing size information of the target region in the target object; and determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area.
In a possible implementation manner, the performing, according to the frame to be detected, a keypoint detection on the target object to obtain a keypoint detection result includes: and performing key point detection on the target object in the frame to be detected according to the position information of the target object in a reference frame to obtain a key point detection result, wherein the reference frame is a video frame which is positioned before the frame to be detected in the target video to which the frame to be detected belongs.
In a possible implementation manner, the performing, according to the position information of the target object in the reference frame, a keypoint detection on the target object in the frame to be detected to obtain a keypoint detection result includes: according to the first position of the target object in the reference frame, cutting the frame to be detected to obtain a cutting result; and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
In a possible implementation manner, the performing, according to the position information of the target object in the reference frame, a keypoint detection on the target object in the frame to be detected to obtain a keypoint detection result includes: acquiring a second position of a target area of the target object in the reference frame; according to the second position, the frame to be detected is cut to obtain a cutting result; and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
In one possible implementation manner, the obtaining a second position of the target area of the target object in the reference frame includes: identifying a target area in the reference frame through a first neural network to obtain a second position output by the first neural network; and/or obtaining a second position of the target area in the reference frame according to a key point detection result corresponding to the reference frame.
In one possible implementation, the key point detection result includes a head key point, a left shoulder key point, and a right shoulder key point; the determining the characteristic length of the target region in the target object based on the key point detection result includes: acquiring a first characteristic length of the target area according to the distance between the left shoulder key point and the right shoulder key point; acquiring a second characteristic length of the target area according to the distance between the head key point and a shoulder central point, wherein the shoulder central point is a middle point between the left shoulder key point and the right shoulder key point; and determining the characteristic length of the target area according to the first characteristic length and/or the second characteristic length.
In one possible implementation, the depth information includes a depth distance, the depth distance includes a distance between the target object and an optical center of an acquisition device, the acquisition device includes a device that performs image acquisition on the target object; the determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area includes: acquiring a preset characteristic length of the target area and a preset device parameter of the acquisition device; and determining the depth distance according to the proportional relation between the preset characteristic length and the characteristic length of the target area and the preset equipment parameter.
In one possible implementation, the depth information includes an offset angle, the offset angle includes a spatial angle of the target object with respect to an optical axis of an acquisition device, and the acquisition device includes a device for image acquisition of the target object; the method further comprises the following steps: acquiring preset equipment parameters of the acquisition equipment; and determining the offset angle according to the preset equipment parameters and the key point detection result.
In one possible implementation, the method further includes: and determining the position of the target object in the three-dimensional space according to the depth information of the target object.
According to an aspect of the present disclosure, there is provided a depth detection apparatus including:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a frame to be detected, and the frame to be detected comprises a target object; the key point detection module is used for detecting key points of the target object according to the frame to be detected to obtain a key point detection result; a characteristic length determination module, configured to determine a characteristic length of a target region in the target object based on the keypoint detection result, where the target region includes a head region and/or a shoulder region, and the characteristic length is used to characterize size information of the target region in the target object; and the depth detection module is used for determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area.
In one possible implementation manner, the key point detecting module is configured to: and performing key point detection on the target object in the frame to be detected according to the position information of the target object in a reference frame to obtain a key point detection result, wherein the reference frame is a video frame which is positioned before the frame to be detected in the target video to which the frame to be detected belongs.
In one possible implementation, the key point detecting module is further configured to: according to the first position of the target object in the reference frame, cutting the frame to be detected to obtain a cutting result; and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
In one possible implementation, the key point detecting module is further configured to: acquiring a second position of a target area of the target object in the reference frame; according to the second position, the frame to be detected is cut to obtain a cutting result; and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
In one possible implementation, the key point detecting module is further configured to: identifying a target area in the reference frame through a first neural network to obtain a second position output by the first neural network; and/or obtaining a second position of the target area in the reference frame according to a key point detection result corresponding to the reference frame.
In one possible implementation, the key point detection result includes a head key point, a left shoulder key point, and a right shoulder key point; the characteristic length determination module is configured to: acquiring a first characteristic length of the target area according to the distance between the left shoulder key point and the right shoulder key point; acquiring a second characteristic length of the target area according to the distance between the head key point and a shoulder central point, wherein the shoulder central point is a middle point between the left shoulder key point and the right shoulder key point; and determining the characteristic length of the target area according to the first characteristic length and/or the second characteristic length.
In one possible implementation, the depth information includes a depth distance, the depth distance includes a distance between the target object and an optical center of an acquisition device, the acquisition device includes a device that performs image acquisition on the target object; the depth detection module is configured to: acquiring a preset characteristic length of the target area and a preset device parameter of the acquisition device; and determining the depth distance according to the proportional relation between the preset characteristic length and the characteristic length of the target area and the preset equipment parameter.
In one possible implementation, the depth information includes an offset angle, the offset angle includes a spatial angle of the target object with respect to an optical axis of an acquisition device, and the acquisition device includes a device for image acquisition of the target object; the apparatus is further configured to: acquiring preset equipment parameters of the acquisition equipment; and determining the offset angle according to the preset equipment parameters and the key point detection result.
In one possible implementation, the apparatus is further configured to: and determining the position of the target object in the three-dimensional space according to the depth information of the target object.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, key point detection is performed on the target object according to the frame to be detected containing the target object to obtain a key point detection result, the characteristic length of the target region in the target object is determined based on the key point detection result, and the depth information of the target object is determined according to the characteristic length. Because the characteristic length of the head region and/or the shoulder region is stable and little affected by the orientation or posture of the target object, accurate depth information can be obtained even from a single frame acquired by a monocular device.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow diagram of a depth detection method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a target area according to an embodiment of the present disclosure.
Fig. 3 shows a flow diagram of a depth detection method according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of a depth detection apparatus according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of an application example according to the present disclosure.
Fig. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Fig. 7 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flow diagram of a depth detection method according to an embodiment of the present disclosure. The method may be performed by a depth detection apparatus, where the depth detection apparatus may be an electronic device such as a terminal device or a server, and the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method may be implemented by a processor calling computer readable instructions stored in a memory. Alternatively, the method may be performed by a server. As shown in fig. 1, the method may include:
step S11, a frame to be detected is acquired, the frame to be detected including the target object.
The frame to be detected may be any image frame having a depth detection requirement, for example, an image frame extracted from a captured video, or an image frame obtained by capturing an image, and the like. The number of frames to be detected is not limited in the embodiment of the present disclosure, and may include one or more frames, and when the frames to be detected include multiple frames, the depth detection method provided in the embodiment of the present disclosure may be used to perform depth detection on multiple frames to be detected simultaneously, or may be used to perform depth detection on multiple frames to be detected sequentially according to a certain sequence.
The frame to be detected includes a target object to be subjected to depth detection. The type of the target object is not limited in the embodiments of the present disclosure and may include various human objects, animal objects, or some mechanical objects such as robots. The following embodiments of the disclosure are described by taking a human object as the target object; implementations for other types of target objects can be flexibly extended with reference to these embodiments and are not described one by one.
The number of target objects included in the frame to be detected is also not limited in the embodiment of the present disclosure, and one or more target objects may be included, and is flexibly determined according to the actual situation.
The method for obtaining the frame to be detected is not limited in the embodiments of the present disclosure, and in a possible implementation manner, frame extraction may be performed from a video to obtain one or more frames to be detected, where the frame extraction may include one or more of frame-by-frame extraction, frame sampling at certain intervals, or random frame sampling. In a possible implementation manner, the target object may also be subjected to image acquisition to obtain a frame to be detected; in some possible implementations, the frames to be detected may also be read from a database.
And step S12, performing key point detection on the target object according to the frame to be detected to obtain a key point detection result.
The key point detection result may include a position of the detected key point in the frame to be detected. The number and types of the detected key points may be flexibly determined according to actual situations, and in some possible implementations, the number of the detected key points may include 2 to 150 key points, and in one example, the detected key points may include 14 limb key points of the human body (e.g., head key points, shoulder key points, neck key points, elbow key points, wrist key points, crotch key points, leg key points, foot key points, and the like), or include 59 contour key points on the contour of the periphery of the human body (e.g., some key points on the periphery of the head or the periphery of the shoulder), and the like. In one possible implementation, in order to reduce the amount of computation, the detected key points may also include only three key points, namely, the head key point, the left shoulder key point, and the right shoulder key point.
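As an illustrative sketch only (the structure and names below are assumptions for illustration and are not defined in this disclosure), the reduced three-key-point detection result may be represented as follows:

```python
# Illustrative sketch: a minimal container for the reduced three-key-point
# detection result (head, left shoulder, right shoulder); all names are
# assumptions for illustration, not defined by this disclosure.
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates in the frame to be detected

@dataclass
class KeypointResult:
    head: Point            # head key point, e.g. the head vertex
    left_shoulder: Point   # left shoulder key point
    right_shoulder: Point  # right shoulder key point
```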
The key point detection mode can be flexibly determined according to the actual situation, and in a possible implementation mode, the frame to be detected can be input into any neural network with the key point detection function to realize key point detection; in some possible implementation manners, the key point identification is performed on the frame to be detected through a related key point identification algorithm to obtain a key point detection result; in some possible implementation manners, the keypoint detection may be performed on a partial image region in the frame to be detected according to the position of the target object in the frame to be detected, so as to obtain a keypoint detection result and the like. Some possible specific implementations of step S12 can be seen in the following disclosure, which is not first expanded.
Step S13, based on the key point detection result, determines the feature length of the target region in the target object.
The target area may include a head area and/or a shoulder area, and the head area of the target object may be an area where the head of the target object is located, such as an area formed between a head key point and a neck key point; the shoulder region may be a region where the shoulder and neck of the target object are located, such as a region formed between a neck key point and a shoulder key point.
Fig. 2 shows a schematic diagram of a target area according to an embodiment of the present disclosure, and as shown in the drawing, in a possible implementation manner, in a case that the target area includes a head area and a shoulder area, a head-shoulder frame formed by connecting a head key point, a left-shoulder key point, and a right-shoulder key point may be used as the target area. In one example, the head-shoulder box may be a rectangle as shown in fig. 2, and as can be seen from fig. 2, the head-shoulder box may be obtained by connecting a head key point of a head vertex of the target object, a left shoulder key point at the left shoulder joint, and a right shoulder key point at the right shoulder joint. In one example, the head-shoulder frame may have other shapes, such as a polygon, a circle, or other irregular shapes.
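For illustration, assuming the rectangular head-shoulder frame is taken as the axis-aligned rectangle enclosing the head key point, the left shoulder key point and the right shoulder key point (one possible reading of fig. 2), such a frame may be obtained as in the following sketch:

```python
# Hedged sketch: forming a rectangular head-shoulder frame as the axis-aligned
# box enclosing the three key points; this is one possible construction, not
# necessarily the exact one used in the disclosure.
from typing import Tuple

Point = Tuple[float, float]

def head_shoulder_box(head: Point, left_shoulder: Point,
                      right_shoulder: Point) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the head-shoulder frame."""
    xs = (head[0], left_shoulder[0], right_shoulder[0])
    ys = (head[1], left_shoulder[1], right_shoulder[1])
    return min(xs), min(ys), max(xs), max(ys)
```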
The characteristic length of the target area can be used to represent size information of the target area in the target object. Because this size information reflects stable characteristics of the target area, the characteristic length representing it is little affected by the orientation or posture of the target object in the frame to be detected, and its value is stable.
The content included in the size information may be flexibly determined according to actual conditions, and is not limited to the following embodiments. In some possible implementations, the size information may include the size of the portions of the target object where certain features are stable, such as the size between the head and shoulder portions and/or the size between the shoulders, etc.; in some possible implementations, the size information may also be a larger value or a smaller value of the sizes of the plurality of feature-stable portions, and the like; in some possible implementations, the size information may further include a size ratio between the plurality of feature-stabilized portions, such as a size ratio between the head-shoulder portions and the shoulders, or a size ratio between the shoulders and the head-shoulder portions.
Depending on the size information to be characterized, the manner of determining the characteristic length from the key point detection result may also differ; this is described in the following embodiments and is not expanded here.
And step S14, determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area.
The information content contained in the depth information can be flexibly determined according to the actual situation, and any information which can reflect the depth situation of the target object in the three-dimensional space can be used as the implementation mode of the depth information. In one possible implementation, the depth information may include a depth distance and/or an offset angle.
The depth distance may be a distance between the target object and the acquisition device, and the acquisition device may be any device that acquires an image of the target object, and in some possible implementations, the acquisition device may be an acquisition device of a still image, such as a camera; in some possible implementations, the capturing device may also be a device that captures dynamic images, such as a video camera or a monitoring camera.
The depth distance may be a distance between the target object and the acquisition device, the distance may be a distance between the target object and the entire acquisition device, or a distance between the target object and a certain device component of the acquisition device, and in some possible implementations, a distance between the target object and an optical center of the acquisition device may be used as the depth distance.
The offset angle may be an offset angle of the target object with respect to the acquisition device, which in one possible implementation may be a spatial angle of the target object with respect to an optical axis of the acquisition device.
Since the characteristic length of the target region has a stable value, the actual length corresponding to the characteristic length can easily be determined as a prior estimated value of the characteristic length. This prior estimated value can be used as reference information and combined with the characteristic length to realize monocular distance measurement on the frame to be detected, thereby obtaining the depth information of the frame to be detected. The manner of determining the depth information based on the characteristic length may be flexibly chosen according to the actual situation, and any monocular distance measurement method may be used in the implementation of step S14; for example, the depth distance may be calculated from the characteristic length and its prior estimated value, combined with some relevant parameters of the acquisition process of the frame to be detected. The specific implementation of step S14 is detailed in the following embodiments of the disclosure and is not expanded here.
In the embodiment of the disclosure, key point detection is performed on the target object according to the frame to be detected containing the target object to obtain a key point detection result, the characteristic length of the target region in the target object is determined based on the key point detection result, and the depth information of the target object is determined according to the characteristic length. Because the characteristic length of the head region and/or the shoulder region is stable and little affected by the orientation or posture of the target object, accurate depth information can be obtained even from a single frame acquired by a monocular device.
In one possible implementation, step S12 may include:
and performing key point detection on the target object in the frame to be detected according to the position information of the target object in the reference frame to obtain a key point detection result.
The reference frame may be a video frame located before a frame to be detected in the target video, and the target video may be a video including the frame to be detected. In some possible implementation manners, the reference frame may be a previous frame of a frame to be detected in the target video, and in some possible implementation manners, the reference frame may also be a video frame that is located before the frame to be detected and has a distance from the frame to be detected that does not exceed a preset distance in the target video, where the number of the preset distances may be flexibly determined according to an actual situation, and may be one or more frames at intervals, and the like, which is not limited in the embodiment of the present disclosure.
Because the reference frame is located before the frame to be detected and is within the preset distance from it, the position of the target object in the reference frame is likely to be close to its position in the frame to be detected. In this case, the position of the target object in the frame to be detected can be roughly determined from its position information in the reference frame, so that more targeted key point detection can be performed on the target object in the frame to be detected. The amount of data to be processed is relatively small, a more accurate key point detection result can be obtained, and the efficiency of key point detection can be improved.
In some possible implementation manners, the manner of performing keypoint detection on the target object in the frame to be detected may be flexibly determined according to the position information of the target object in the reference frame, for example, the keypoint detection may be performed after the frame to be detected is cut according to the position information of the target object in the reference frame, or the keypoint detection may be directly performed on the image area at the corresponding position in the frame to be detected according to the position information of the target object in the reference frame, and the like.
Through the embodiment of the disclosure, more targeted key point detection can be performed on the frame to be detected according to the position information of the target object in the reference frame, which improves the efficiency and precision of key point detection and thus of the depth detection method.
In a possible implementation manner, performing a keypoint detection on a target object in a frame to be detected according to position information of the target object in a reference frame to obtain a keypoint detection result, including:
according to the first position of the target object in the reference frame, cutting a frame to be detected to obtain a cutting result;
and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
The first position may be a position coordinate of the whole target object in the reference frame, for example, in a case where the target object is a human object, the first position may be a position coordinate of a human frame of the target object in the reference frame.
The manner of cropping the frame to be detected according to the first position is also not limited in the embodiments of the present disclosure, and is not limited to the following embodiments of the present disclosure. In a possible implementation manner, a first coordinate of the body frame in the reference frame may be determined according to the first position, a second coordinate of the body frame of the target object in the frame to be detected is determined by combining a corresponding relationship of position coordinates between the reference frame and the frame to be detected, and the frame to be detected is clipped based on the second coordinate to obtain a clipping result.
In some possible implementation manners, a first coordinate of the body frame in the reference frame and the frame length of the body frame may also be determined according to the first position; combined with the correspondence between the reference frame and the frame to be detected, a second coordinate of the body frame of the target object in the frame to be detected is determined, and the frame to be detected is clipped based on the second coordinate and the frame length to obtain a clipping result. When clipping based on the second coordinate and the frame length, the position of the clipping endpoint may be determined by the second coordinate, and the frame length determines the length of the clipping result. In one example, the length of the clipping result may be consistent with the frame length; in another example, it may be proportional to the frame length, such as N times the frame length, where N may be any number not less than 1.
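A minimal sketch of this clipping step is given below; it assumes that the body frame coordinates from the reference frame are reused directly in the frame to be detected and enlarged by a factor N (N not less than 1), and all names are assumptions for illustration:

```python
# Hedged sketch of clipping the frame to be detected around the body frame
# taken from the reference frame, enlarged by a factor n; the direct reuse of
# the reference-frame coordinates and the parameter names are assumptions.
from typing import Tuple
import numpy as np

def clip_by_reference_box(frame: np.ndarray,
                          box: Tuple[float, float, float, float],  # (x_min, y_min, x_max, y_max) in the reference frame
                          n: float = 1.5) -> np.ndarray:
    h, w = frame.shape[:2]
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w, half_h = (x_max - x_min) * n / 2.0, (y_max - y_min) * n / 2.0
    # Clamp the enlarged box to the image bounds before clipping.
    x0, x1 = max(0, int(cx - half_w)), min(w, int(cx + half_w))
    y0, y1 = max(0, int(cy - half_h)), min(h, int(cy + half_h))
    return frame[y0:y1, x0:x1]
```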
The method for detecting the key points of the target object in the cutting result can be flexibly determined according to the actual situation, and is described in detail in the following disclosure embodiments, which are not expanded first.
Through the embodiment of the disclosure, the target object in the frame to be detected can be preliminarily located according to the first position of the target object in the reference frame to obtain the cutting result, and key point detection is performed based on the cutting result. On the one hand, this reduces the amount of data to be detected and improves detection efficiency; on the other hand, the target object occupies a larger proportion of the cutting result after cutting, so the precision of key point detection can be improved.
In a possible implementation manner, performing a keypoint detection on a target object in a frame to be detected according to position information of the target object in a reference frame to obtain a keypoint detection result, including:
acquiring a second position of a target area of the target object in the reference frame;
according to the second position, cutting the frame to be detected to obtain a cutting result;
and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
The second position may be a position coordinate of the target region of the target object in the reference frame, and as described in the above disclosed embodiments, the target region may include the head region and/or the shoulder region, so in one possible implementation, the second position may be a position coordinate of the head-shoulder frame of the target object in the reference frame.
How to determine the second position of the target region in the reference frame may be implemented in a flexible manner according to the actual situation, for example, the implementation may be implemented by performing a head-shoulder frame and/or a key point identification on the reference frame, which is described in the following disclosure embodiments and is not first expanded herein.
The manner of cutting the frame to be detected according to the second position may refer to the manner of cutting the frame to be detected according to the first position, which is not described herein again.
The manner of detecting the keypoints of the target object in the clipping result may be the same as or different from the manner of detecting the keypoints of the clipping result obtained according to the first position, which is described in the following disclosure embodiments and is not first expanded herein.
Since in step S14, the depth information of the target object is determined according to the characteristic length of the target area, in the embodiment of the present disclosure, the key point detection result may be obtained according to the second position where the target area of the target object is located in the reference frame, and this way may focus on the target area more specifically, thereby further reducing the processing amount of data, obtaining the characteristic length of the target area more accurately, and further improving the accuracy and efficiency of depth detection.
In one possible implementation manner, obtaining the second position of the target area of the target object in the reference frame may include:
identifying the target area in the reference frame through the first neural network to obtain a second position output by the first neural network; and/or,
obtaining a second position of the target area in the reference frame according to the key point detection result corresponding to the reference frame.
The first neural network may be any network for determining the second position, and the implementation form thereof is not limited in the embodiment of the present disclosure. In some possible implementations, the first neural network may be a target area detection network for identifying the second location of the target area directly from the reference frame, which may be a fast RCNN detection network in one example; in some possible implementations, the first neural network may also be a keypoint detection network, configured to identify one or more keypoints in the reference frame, and then determine a second position of the target region in the reference frame according to the identified keypoint positions.
In some possible implementations, the reference frame may also be used as a frame to be detected for depth detection, in which case the reference frame may have already undergone keypoint detection and obtained a corresponding keypoint detection result. Therefore, in some possible implementations, the second position of the target region in the reference frame may be obtained according to the detection result of the key point corresponding to the reference frame.
In some possible implementation manners, the keypoint detection may also be directly performed on the reference frame to obtain a keypoint detection result, and the keypoint detection manner may refer to other disclosed embodiments and is not described herein again.
According to the embodiment of the disclosure, the second position of the target area in the reference frame can be flexibly determined in various ways according to the actual situation of the reference frame, improving the flexibility and universality of depth detection. In some possible implementation manners, when the reference frame preceding the frame to be detected has already undergone depth detection, the second position may be determined directly based on an intermediate result obtained from the reference frame during that depth detection, which reduces repeated calculation and improves the efficiency and accuracy of depth detection.
In a possible implementation manner, performing a keypoint detection on the target object in the cropping result to obtain a keypoint detection result may include:
and performing key point detection on the target object in the cutting result through the second neural network to obtain a key point detection result.
The second neural network may be any neural network for implementing key point detection, and its implementation manner is not limited in the embodiment of the present disclosure; in the case that the first neural network is a key point detection network, the second neural network may have the same implementation as the first neural network or a different one.
In some possible implementation manners, key point detection may also be performed on the target object in the clipping result through a related key point identification algorithm; which key point identification algorithm is applied is not limited in the embodiments of the present disclosure.
In some possible implementations, the keypoint detection results may include a head keypoint, a left shoulder keypoint, and a right shoulder keypoint. Fig. 3 shows a flowchart of a depth detection method according to an embodiment of the present disclosure, and as shown in the figure, in one possible implementation, step S13 may include:
step S131, a first characteristic length of the target area is obtained according to the distance between the left shoulder key point and the right shoulder key point.
Step S132, a second characteristic length of the target area is obtained according to the distance between the head key point and the shoulder center point, wherein the shoulder center point is a middle point between the left shoulder key point and the right shoulder key point.
Step S133, determining the characteristic length of the target region according to the first characteristic length and/or the second characteristic length.
Wherein the first characteristic length may be a characteristic length reflecting a distance between shoulders of the target object, and in one possible implementation, the first characteristic length may be determined according to a distance between the left shoulder key point and the right shoulder key point.
The second characteristic length may be a characteristic length reflecting a distance between the head and the shoulder of the target object, and in one possible implementation, the second characteristic length may be determined according to a distance between the head key point and the shoulder center point.
The shoulder center point can reflect the center position of the shoulder of the target object, and in a possible implementation mode, the position of the shoulder center point can be determined according to the positions of the left shoulder key point and the right shoulder key point; in a possible implementation manner, the shoulder center point can also be directly used as the detected key point, and is directly obtained from the key point detection result.
In step S133, the method for determining the characteristic length of the target region may be flexibly determined according to the actual situation, and in a possible implementation manner, the larger value of the first characteristic length and the second characteristic length may be used as the characteristic length of the target region; in some possible implementations, the smaller of the first characteristic length and the second characteristic length, or the average of the first characteristic length and the second characteristic length, or the ratio of the first characteristic length and the second characteristic length, etc. may also be used as the characteristic length of the target area.
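The following sketch illustrates steps S131 to S133 under the assumption that the larger of the two characteristic lengths is used as the characteristic length of the target area (one of the options described above):

```python
# Illustrative sketch of steps S131-S133: the first characteristic length is
# the left/right shoulder distance, the second is the head-to-shoulder-center
# distance, and the target-area characteristic length is taken as the larger
# of the two (one of the options described above, not the only possibility).
import math
from typing import Tuple

Point = Tuple[float, float]

def _distance(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def characteristic_length(head: Point, left_shoulder: Point,
                          right_shoulder: Point) -> float:
    first = _distance(left_shoulder, right_shoulder)                 # step S131
    shoulder_center = ((left_shoulder[0] + right_shoulder[0]) / 2.0,
                       (left_shoulder[1] + right_shoulder[1]) / 2.0)
    second = _distance(head, shoulder_center)                        # step S132
    return max(first, second)                                        # step S133
```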
Through the embodiment of the disclosure, the characteristic length of the target area can be obtained based on the first characteristic length and the second characteristic length which are less interfered by the orientation or posture of the target object, so that under the condition of carrying out depth detection on the frames to be detected collected at any angle, a more accurate depth detection result can be obtained, and the stability, robustness and precision of the depth detection are improved.
In one possible implementation, step S14 may include:
acquiring a preset characteristic length of a target area and a preset device parameter of a collection device;
and determining the depth distance according to the proportional relation between the preset characteristic length and the characteristic length of the target area and the preset equipment parameters.
The preset feature length may be an actual feature length of the target region in a normal case, that is, an a priori estimated value of the feature length in the above-described disclosed embodiment. The preset feature length value can be flexibly changed according to different feature length definitions, and is not limited to the following disclosed embodiments. In one possible implementation, in the case where the characteristic length is a larger value between the first characteristic length and the second characteristic length, the preset characteristic length may be set to 25-40cm, and in one example, the preset characteristic length may be set to 32 cm.
The preset device parameters can be some calibration parameters of the acquisition device itself, and the types and kinds of the parameters contained in the preset device parameters can be flexibly determined according to the actual conditions of the acquisition device. In some possible implementations, the preset device parameters may include an internal reference matrix of the acquisition device, where the internal reference matrix may include one or more focal length parameters of the camera, a principal point position of one or more cameras, and the like.
The method for obtaining the preset device parameter is not limited in the embodiments of the present disclosure, and in some possible implementation manners, the preset device parameter may be directly obtained according to the actual condition of the acquisition device, and in some possible implementation manners, the preset device parameter may also be obtained by calibrating the acquisition device.
Based on the proportional relation between the preset characteristic length and the characteristic length, the proportional relation between the target object and the target object in the actual scene under the normal condition can be determined, and the depth distance of the target object in the actual scene can be determined by combining the preset equipment parameters. The process of calculating the depth distance can be flexibly selected according to actual conditions, and is not limited to the following disclosed embodiments. In one example, the process of determining the depth distance according to the preset feature length, the feature length and the preset device parameter can be represented by the following formulas (1) and (2):
d = (C / L) · f        (1)

f = (f_x + f_y) / 2        (2)

wherein d is the depth distance, C is the preset characteristic length, L is the characteristic length of the target area, f_x and f_y are the focal length parameters in the camera internal reference matrix
K = [ f_x  0    u_0
      0    f_y  v_0
      0    0    1  ],
and f is a parameter value determined according to the focal length parameters.
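A minimal numerical sketch of formulas (1) and (2) as reconstructed above is given below; the averaging of f_x and f_y and the variable names are assumptions for illustration:

```python
# Hedged sketch of formulas (1) and (2): the depth distance follows from the
# ratio between the preset characteristic length C (a physical length) and the
# characteristic length L (in pixels), scaled by a focal parameter f derived
# from f_x and f_y (taken here as their mean, which is an assumption).
def depth_distance(L_pixels: float, C_metres: float, fx: float, fy: float) -> float:
    f = (fx + fy) / 2.0                 # formula (2), assumed form of f
    return (C_metres / L_pixels) * f    # formula (1)

# Example: with C = 0.32 m (the 32 cm preset mentioned above), L = 80 pixels and
# fx = fy = 600 pixels, the depth distance is 0.32 / 80 * 600 = 2.4 m.
```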
Through the embodiment of the disclosure, the depth distance can be simply and conveniently determined by utilizing the proportional relation between the characteristic length and the stable preset characteristic length and combining the preset equipment parameters of the acquisition equipment, the calculation amount of the determination mode is small, the result is accurate, and the precision and the efficiency of depth detection can be improved.
In a possible implementation manner, the method provided by the embodiment of the present disclosure may further include:
acquiring preset equipment parameters of acquisition equipment;
and determining the offset angle according to preset equipment parameters and the key point detection result.
The implementation form and the obtaining mode of the preset device parameter may refer to the above disclosed embodiments, and are not described herein again.
According to preset device parameters and a key point detection result, the mode for determining the offset angle can be flexibly selected, and the method is not limited to the following disclosed embodiments. In some possible implementations, the offset angle may be determined according to preset device parameters and position coordinates of the head-shoulder center point in the key point detection result.
The head-shoulder central point may be the central point of the head-shoulder frame mentioned in the above-mentioned embodiments, and in some possible implementation manners, the position coordinates of the whole head-shoulder frame may be determined according to the position coordinates of the head key point, the left shoulder key point, and the right shoulder key point, and the position coordinates of the head-shoulder central point may be determined based on the position coordinates of the whole head-shoulder frame; in some possible implementation manners, the head-shoulder central point may also be directly used as a key point to be detected, so as to directly obtain the position coordinate of the head-shoulder central point in the key point detection result.
In one example, the process of determining the offset angle according to the preset device parameters and the position coordinates of the head-shoulder center point can be expressed by the following formulas (3) and (4):
θ_x = arctan((x − u_0) / f_x)        (3)

θ_y = arctan((y − v_0) / f_y)        (4)

wherein θ_x is the offset angle in the x-axis direction, θ_y is the offset angle in the y-axis direction, (x, y) is the position coordinate of the head-shoulder center point, f_x and f_y are the focal length parameters in the camera internal reference matrix K, and u_0 and v_0 are the principal point positions in the camera internal reference matrix K.
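The following sketch illustrates formulas (3) and (4) as reconstructed above, assuming the standard pinhole relation between the pixel offset from the principal point and the angle relative to the optical axis:

```python
# Hedged sketch of formulas (3) and (4): offset angles of the head-shoulder
# center point (x, y) relative to the optical axis, assuming the standard
# pinhole relation between pixel offset from the principal point and angle.
import math
from typing import Tuple

def offset_angles(x: float, y: float, fx: float, fy: float,
                  u0: float, v0: float) -> Tuple[float, float]:
    theta_x = math.atan((x - u0) / fx)   # formula (3)
    theta_y = math.atan((y - v0) / fy)   # formula (4)
    return theta_x, theta_y              # in radians
```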
Through the embodiment of the disclosure, the offset angle can be determined simply and conveniently by using the preset device parameters and the key point detection result obtained during depth detection. This manner of determination requires no additional data and is easy to compute, which can improve the efficiency and convenience of depth detection.
In a possible implementation manner, the method provided by the embodiment of the present disclosure may further include:
and determining the position of the target object in the three-dimensional space according to the depth information of the target object.
The position of the target object in the three-dimensional space may be a three-dimensional coordinate of the target object in the three-dimensional space. The method for determining the position in the three-dimensional space based on the depth information can be flexibly selected according to actual conditions, and in one possible implementation manner, the two-dimensional coordinates of the target object in the frame to be detected can be determined according to the detection result of the key point of the target object, and the two-dimensional coordinates are combined with the depth distance and/or the offset angle in the depth information, so that the three-dimensional coordinates of the target object in the three-dimensional space are determined.
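As an illustration of one possible implementation (an assumption, not necessarily the exact construction of this disclosure), the three-dimensional position may be obtained by back-projecting the head-shoulder center pixel through the camera internal reference matrix and scaling by the depth distance:

```python
# Hedged sketch: back-project the head-shoulder center pixel (x, y) through the
# camera intrinsics and scale by the depth distance d to obtain a 3-D position
# in camera coordinates; this particular construction is an illustration only.
import math
from typing import Tuple

def position_3d(x: float, y: float, d: float,
                fx: float, fy: float, u0: float, v0: float) -> Tuple[float, float, float]:
    # Unit ray through the pixel in camera coordinates.
    rx, ry, rz = (x - u0) / fx, (y - v0) / fy, 1.0
    norm = math.sqrt(rx * rx + ry * ry + rz * rz)
    # Scale the unit ray so the point lies at distance d from the optical center.
    return d * rx / norm, d * ry / norm, d * rz / norm
```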
After the position of the target object in the three-dimensional space is determined, the target object may be subjected to face recognition, living body recognition, or route tracking based on the three-dimensional position information, or applied to scenes such as Virtual Reality (VR) or Augmented Reality (AR). Through the embodiments of the disclosure, the target object can be located in three dimensions by means of the depth information, enabling various modes of interaction and other operations with the target object. For example, in some possible implementations, the distance and the angle between the target object and a smart air conditioner may be determined according to the position of the target object in the three-dimensional space, so as to dynamically adjust the wind direction and/or the wind speed of the smart air conditioner; in some possible implementation manners, the target object may also be positioned in a game scene on an AR game platform based on its position in the three-dimensional space, so that human-computer interaction in the AR scene can be realized more truly and naturally.
It can be understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the principle and logic; due to limited space, details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the methods of the specific embodiments above, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a depth detection apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the depth detection methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the methods section, which are not repeated here.
Fig. 4 shows a block diagram of a depth detection apparatus according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus 20 includes:
the obtaining module 21 is configured to obtain a frame to be detected, where the frame to be detected includes a target object.
The key point detection module 22 is configured to perform key point detection on the target object according to the frame to be detected, so as to obtain a key point detection result.
The characteristic length determining module 23 is configured to determine a characteristic length of a target region in the target object based on the key point detection result, where the target region includes a head region and/or a shoulder region, and the characteristic length is used to represent size information of the target region in the target object.
The depth detection module 24 is configured to determine depth information of the target object in the frame to be detected according to the characteristic length of the target region.
In one possible implementation, the key point detection module is configured to: perform key point detection on the target object in the frame to be detected according to the position information of the target object in a reference frame to obtain the key point detection result, where the reference frame is a video frame located before the frame to be detected in the target video to which the frame to be detected belongs.
In one possible implementation, the key point detection module is further configured to: according to the first position of the target object in the reference frame, cutting the frame to be detected to obtain a cutting result; and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
In one possible implementation, the key point detection module is further configured to: acquiring a second position of a target area of the target object in the reference frame; according to the second position, cutting the frame to be detected to obtain a cutting result; and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
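As an illustration of the cropping-then-detection flow described above, the following sketch crops the frame to be detected around a box taken from the reference frame and maps the detected key points back to full-frame coordinates; the margin value, the keypoint_net placeholder, and the box format are assumptions of this sketch rather than the disclosure's implementation.

def detect_keypoints_with_reference(frame, reference_box, keypoint_net, margin=0.2):
    """Crop the frame to be detected around a box taken from the reference frame,
    run key point detection on the crop, and map the result back to full-frame
    coordinates. keypoint_net stands in for any detector returning (u, v) points
    in crop coordinates; frame is an H x W x C image array."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = reference_box
    # Expand the box by a margin so the target stays covered even if it moved
    # slightly between the reference frame and the frame to be detected.
    dx, dy = (x2 - x1) * margin, (y2 - y1) * margin
    x1, y1 = max(0, int(x1 - dx)), max(0, int(y1 - dy))
    x2, y2 = min(w, int(x2 + dx)), min(h, int(y2 + dy))
    crop = frame[y1:y2, x1:x2]
    keypoints_in_crop = keypoint_net(crop)  # e.g. [(u, v), ...]
    # Translate crop coordinates back into the coordinate system of the full frame.
    return [(u + x1, v + y1) for (u, v) in keypoints_in_crop]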
In one possible implementation, the key point detection module is further configured to: identifying a target area in the reference frame through the first neural network to obtain a second position output by the first neural network; and/or obtaining a second position of the target area in the reference frame according to the detection result of the key point corresponding to the reference frame.
In one possible implementation, the key point detection result includes a head key point, a left shoulder key point, and a right shoulder key point; the characteristic length determination module is configured to: acquire a first characteristic length of the target area according to the distance between the left shoulder key point and the right shoulder key point; acquire a second characteristic length of the target area according to the distance between the head key point and a shoulder center point, where the shoulder center point is the middle point between the left shoulder key point and the right shoulder key point; and determine the characteristic length of the target area according to the first characteristic length and/or the second characteristic length.
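A minimal sketch of this computation is given below; taking the larger of the two candidate lengths is only one of the combination strategies permitted (either length could also be used alone), and the function name and point format are illustrative.

import math

def characteristic_length(head, left_shoulder, right_shoulder):
    """Characteristic length of the head-shoulder region from three key points,
    each given as an (x, y) pixel coordinate."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    # First characteristic length: left shoulder key point to right shoulder key point.
    first = dist(left_shoulder, right_shoulder)
    # Second characteristic length: head key point to the shoulder center point.
    shoulder_center = ((left_shoulder[0] + right_shoulder[0]) / 2,
                       (left_shoulder[1] + right_shoulder[1]) / 2)
    second = dist(head, shoulder_center)
    # One possible combination strategy: take the larger of the two lengths.
    return max(first, second)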
In one possible implementation, the depth information includes a depth distance, the depth distance includes a distance between the target object and an optical center of an acquisition device, and the acquisition device includes a device for performing image acquisition on the target object; the depth detection module is configured to: acquire a preset characteristic length of the target area and a preset device parameter of the acquisition device; and determine the depth distance according to the proportional relation between the preset characteristic length and the characteristic length of the target area, together with the preset device parameter.
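The proportional relation can be illustrated with a pinhole similar-triangles approximation; the disclosure's exact formulas are not reproduced here, and treating the focal length in pixels as the relevant preset device parameter is an assumption of this sketch.

def estimate_depth_distance(char_length_px, preset_length, focal_length_px):
    """Similar-triangles approximation: a segment of real-world length C imaged
    by a camera with focal length f (in pixels) appears with pixel length L at
    a distance of roughly d = f * C / L. The result has the unit of preset_length."""
    return focal_length_px * preset_length / char_length_px

# Illustrative values: a 32 cm preset length observed as 100 px with f = 1000 px
# gives a depth distance of about 320 cm.
print(estimate_depth_distance(100.0, 32.0, 1000.0))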
In one possible implementation, the depth information includes an offset angle, the offset angle includes a spatial angle of the target object with respect to an optical axis of an acquisition device, and the acquisition device includes a device for performing image acquisition on the target object; the apparatus is further configured to: acquire preset device parameters of the acquisition device; and determine the offset angle according to the preset device parameters and the key point detection result.
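One plausible way to obtain such an angle from a reference point of the target and the camera intrinsic parameters is sketched below; this is an illustrative formulation rather than the disclosure's own formulas.

import math

def estimate_offset_angles(center_xy, fx, fy, cx, cy):
    """Horizontal and vertical offset angles of the target relative to the optical
    axis, computed from a reference point (x, y) of the target (for example the
    center of the head-shoulder region obtained from the key points) and the
    camera intrinsic parameters fx, fy, cx, cy."""
    x, y = center_xy
    yaw = math.atan2(x - cx, fx)    # horizontal angle off the optical axis
    pitch = math.atan2(y - cy, fy)  # vertical angle off the optical axis
    return yaw, pitch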
In one possible implementation, the apparatus is further configured to: and determining the position of the target object in the three-dimensional space according to the depth information of the target object.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Application scenario example
Fig. 5 is a schematic diagram illustrating an application example according to the present disclosure, and as shown in the drawing, the application example of the present disclosure proposes a depth detection method, which may include the following processes:
and step S31, detecting the head and shoulder frames of the human body of the first frame of the target video by using a Faster RCNN neural network to obtain the position of the head and shoulder frames in the first frame.
Step S32, starting from the second frame of the target video, taking each video frame in turn as the frame to be detected and the previous frame of the frame to be detected as the reference frame; performing key point detection on the frame to be detected through a key point detection network according to the second position of the head-shoulder box in the reference frame, so as to obtain the position coordinates of three key points, namely a head key point, a left shoulder key point, and a right shoulder key point; and taking the circumscribed rectangle of the three key points as the head-shoulder box in the frame to be detected.
Step S33, determining a characteristic length L in the frame to be detected, where the characteristic length L may be the greater of a first characteristic length and a second characteristic length. The first characteristic length may be the length of the line segment between the left shoulder key point and the right shoulder key point, and the second characteristic length may be the length of the line segment between the shoulder center point and the head key point.
Step S34, determining the depth information of the target object according to one or more of the characteristic length L, the preset characteristic length C, the center point of the head-shoulder box in the frame to be detected, and the camera intrinsic matrix K:
For the characteristic length defined in step S33, the corresponding actual distance can be estimated a priori; for an adult, this length is generally about 32 cm, so the preset characteristic length may be set to C = 32 cm. Therefore, in the application example of the present disclosure, the depth distance d may be calculated from the characteristic length L obtained in step S33, the preset characteristic length C, and the camera intrinsic matrix K through formulas (1) and (2) in the above-described disclosed embodiments.
In one example, the offset angle in the depth information may also be calculated from the position (x, y) of the center point of the head-shoulder box in the frame to be detected and the camera intrinsic matrix K through formulas (3) and (4) in the above-described disclosed embodiments; a combined sketch of these calculations is provided after the steps below.
In one example, after the depth information of the target object in the frame to be detected is determined in step S34, the next frame of the frame to be detected in the target video may be taken as the new frame to be detected, and the process returns to step S32 to perform depth detection again.
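The following self-contained sketch combines steps S33 and S34 under the assumptions already noted, namely a pinhole similar-triangles relation for the depth distance, an arctangent formulation for the offset angles, and the focal length and principal point read from K; the numeric values in the usage lines are purely illustrative.

import numpy as np

def depth_from_keypoints(head, left_shoulder, right_shoulder, K, preset_length_cm=32.0):
    """Characteristic length L from the three key points, depth distance d by the
    proportional relation, and offset angles from the center of the circumscribed
    rectangle of the key points and the intrinsic matrix K."""
    head, ls, rs = map(np.asarray, (head, left_shoulder, right_shoulder))
    shoulder_center = (ls + rs) / 2.0
    L = max(np.linalg.norm(ls - rs), np.linalg.norm(head - shoulder_center))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    d = fx * preset_length_cm / L  # depth distance, here in centimeters
    # Center of the head-shoulder box, approximated by the center of the
    # circumscribed rectangle of the three key points.
    pts = np.stack([head, ls, rs])
    x, y = (pts.min(axis=0) + pts.max(axis=0)) / 2.0
    yaw, pitch = np.arctan2(x - cx, fx), np.arctan2(y - cy, fy)
    return d, yaw, pitch

# Illustrative usage: shoulders 100 px apart, fx = fy = 1000 px, principal point
# (640, 360); the depth distance comes out at roughly 320 cm.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
print(depth_from_keypoints((640, 200), (590, 300), (690, 300), K))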
Through this application example of the depth detection method, the characteristic length defined by the three key points of the head top, the left shoulder, and the right shoulder can be used as the basis for depth estimation. This characteristic length is only slightly affected by the orientation and posture of the human body, so depth detection can be performed more accurately and robustly even in complex scenes where the target object faces the camera sideways, faces away from the camera, or is partially occluded, which broadens the applicable scenes and makes the distance measurement results more stable.
It can be understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the underlying principle and logic; for brevity, details are not repeated in the present disclosure.
Those skilled in the art will also understand that, in the methods of the specific embodiments, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The embodiments of the present disclosure also provide a computer program product, which includes computer readable code, and when the computer readable code runs on a device, a processor in the device executes instructions for implementing the depth detection method provided in any one of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the depth detection method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure. As shown in fig. 6, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 shows a block diagram of an electronic device according to an embodiment of the disclosure. As shown in fig. 7, the electronic device 1900 may be provided as a server. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft Windows Server operating system (Windows Server™), the Apple graphical-user-interface operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and this electronic circuitry may execute the computer-readable program instructions so as to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A depth detection method, comprising:
acquiring a frame to be detected, wherein the frame to be detected comprises a target object;
performing key point detection on the target object according to the frame to be detected to obtain a key point detection result;
determining a characteristic length of a target region in the target object based on the key point detection result, wherein the target region comprises a head region and/or a shoulder region, and the characteristic length is used for representing size information of the target region in the target object;
and determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area.
2. The method according to claim 1, wherein the performing, according to the frame to be detected, the keypoint detection on the target object to obtain a keypoint detection result comprises:
and performing key point detection on the target object in the frame to be detected according to the position information of the target object in a reference frame to obtain a key point detection result, wherein the reference frame is a video frame which is positioned before the frame to be detected in the target video to which the frame to be detected belongs.
3. The method according to claim 2, wherein the performing, according to the position information of the target object in the reference frame, the keypoint detection on the target object in the frame to be detected to obtain a keypoint detection result comprises:
according to the first position of the target object in the reference frame, cutting the frame to be detected to obtain a cutting result;
and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
4. The method according to claim 2 or 3, wherein the performing the keypoint detection on the target object in the frame to be detected according to the position information of the target object in the reference frame to obtain a keypoint detection result comprises:
acquiring a second position of a target area of the target object in the reference frame;
according to the second position, cutting the frame to be detected to obtain a cutting result;
and carrying out key point detection on the target object in the cutting result to obtain a key point detection result.
5. The method of claim 4, wherein obtaining the second position of the target region of the target object in the reference frame comprises:
identifying a target area in the reference frame through a first neural network to obtain a second position output by the first neural network; and/or,
and obtaining a second position of the target area in the reference frame according to the detection result of the key point corresponding to the reference frame.
6. The method according to any one of claims 1 to 5, wherein the keypoint detection results comprise a head keypoint, a left shoulder keypoint, and a right shoulder keypoint;
the determining the characteristic length of the target region in the target object based on the key point detection result includes:
acquiring a first characteristic length of the target area according to the distance between the left shoulder key point and the right shoulder key point;
acquiring a second characteristic length of the target area according to the distance between the head key point and a shoulder central point, wherein the shoulder central point is a middle point between the left shoulder key point and the right shoulder key point;
and determining the characteristic length of the target area according to the first characteristic length and/or the second characteristic length.
7. The method according to any one of claims 1 to 6, wherein the depth information comprises a depth distance comprising a distance between the target object and an optical center of an acquisition device comprising a device for image acquisition of the target object;
the determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area includes:
acquiring a preset characteristic length of the target area and a preset device parameter of the acquisition device;
and determining the depth distance according to the proportional relation between the preset characteristic length and the characteristic length of the target area and the preset equipment parameter.
8. The method according to any one of claims 1 to 7, wherein the depth information comprises an offset angle comprising a spatial angle of the target object with respect to an optical axis of an acquisition device comprising a device for image acquisition of the target object;
the method further comprises the following steps:
acquiring preset equipment parameters of the acquisition equipment;
and determining the offset angle according to the preset equipment parameters and the key point detection result.
9. The method according to any one of claims 1 to 8, further comprising:
and determining the position of the target object in the three-dimensional space according to the depth information of the target object.
10. A depth detection device, comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a frame to be detected, and the frame to be detected comprises a target object;
the key point detection module is used for detecting key points of the target object according to the frame to be detected to obtain a key point detection result;
a characteristic length determination module, configured to determine a characteristic length of a target region in the target object based on the keypoint detection result, where the target region includes a head region and/or a shoulder region, and the characteristic length is used to characterize size information of the target region in the target object;
and the depth detection module is used for determining the depth information of the target object in the frame to be detected according to the characteristic length of the target area.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 9.
12. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 9.
CN202110719313.2A 2021-06-28 2021-06-28 Depth detection method and device, electronic equipment and storage medium Pending CN113344999A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110719313.2A CN113344999A (en) 2021-06-28 2021-06-28 Depth detection method and device, electronic equipment and storage medium
PCT/CN2022/085913 WO2023273498A1 (en) 2021-06-28 2022-04-08 Depth detection method and apparatus, electronic device, and storage medium
TW111122248A TW202301275A (en) 2021-06-28 2022-06-15 Depth detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110719313.2A CN113344999A (en) 2021-06-28 2021-06-28 Depth detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113344999A true CN113344999A (en) 2021-09-03

Family

ID=77479127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110719313.2A Pending CN113344999A (en) 2021-06-28 2021-06-28 Depth detection method and device, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN113344999A (en)
TW (1) TW202301275A (en)
WO (1) WO2023273498A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023273498A1 (en) * 2021-06-28 2023-01-05 上海商汤智能科技有限公司 Depth detection method and apparatus, electronic device, and storage medium
WO2023273499A1 (en) * 2021-06-28 2023-01-05 上海商汤智能科技有限公司 Depth measurement method and apparatus, electronic device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9600993B2 (en) * 2014-01-27 2017-03-21 Atlas5D, Inc. Method and system for behavior detection
CN112525352A (en) * 2020-11-24 2021-03-19 深圳市高巨创新科技开发有限公司 Infrared temperature measurement compensation method based on face recognition and terminal
CN113344999A (en) * 2021-06-28 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, electronic equipment and storage medium
CN113345000A (en) * 2021-06-28 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, electronic equipment and storage medium
CN113888619A (en) * 2021-10-29 2022-01-04 北京市商汤科技开发有限公司 Distance determination method and device, electronic equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102890785A (en) * 2011-07-19 2013-01-23 上海上大海润信息系统有限公司 Method for service robot to recognize and locate target
CN105007396A (en) * 2015-08-14 2015-10-28 山东诚海电子科技有限公司 Positioning method and positioning device for classroom teaching
CN109492608A (en) * 2018-11-27 2019-03-19 腾讯科技(深圳)有限公司 Image partition method, device, computer equipment and storage medium
CN110225400A (en) * 2019-07-08 2019-09-10 北京字节跳动网络技术有限公司 A kind of motion capture method, device, mobile terminal and storage medium
CN111084606A (en) * 2019-10-12 2020-05-01 深圳壹账通智能科技有限公司 Vision detection method and device based on image recognition and computer equipment
CN111561906A (en) * 2020-05-25 2020-08-21 北京洛必德科技有限公司 Robot monocular distance measuring method, system, electronic device and computer storage medium
CN112365586A (en) * 2020-11-25 2021-02-12 厦门瑞为信息技术有限公司 3D face modeling and stereo judging method and binocular 3D face modeling and stereo judging method of embedded platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SONG Chuanzeng (ed.): "Introduction to Intelligent Connected Vehicle Technology", China Machine Press, 30 September 2020, pages 35-37 *

Also Published As

Publication number Publication date
WO2023273498A1 (en) 2023-01-05
TW202301275A (en) 2023-01-01

Similar Documents

Publication Publication Date Title
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
CN110674719B (en) Target object matching method and device, electronic equipment and storage medium
CN110503689B (en) Pose prediction method, model training method and model training device
CN109948494B (en) Image processing method and device, electronic equipment and storage medium
US20210158560A1 (en) Method and device for obtaining localization information and storage medium
US20170083741A1 (en) Method and device for generating instruction
CN107944367B (en) Face key point detection method and device
CN114019473A (en) Object detection method and device, electronic equipment and storage medium
WO2023273498A1 (en) Depth detection method and apparatus, electronic device, and storage medium
CN111523485A (en) Pose recognition method and device, electronic equipment and storage medium
CN111860373B (en) Target detection method and device, electronic equipment and storage medium
EP2975574A2 (en) Method, apparatus and terminal for image retargeting
CN114170324A (en) Calibration method and device, electronic equipment and storage medium
CN113052919A (en) Calibration method and device of visual sensor, electronic equipment and storage medium
WO2023273499A1 (en) Depth measurement method and apparatus, electronic device, and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN112767288A (en) Image processing method and device, electronic equipment and storage medium
KR20220123218A (en) Target positioning method, apparatus, electronic device, storage medium and program
CN114187498A (en) Occlusion detection method and device, electronic equipment and storage medium
CN110930351A (en) Light spot detection method and device and electronic equipment
CN114581525A (en) Attitude determination method and apparatus, electronic device, and storage medium
CN114067085A (en) Virtual object display method and device, electronic equipment and storage medium
CN114519794A (en) Feature point matching method and device, electronic equipment and storage medium
CN113865481B (en) Object size measuring method, device and storage medium
CN114445298A (en) Image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code: Ref country code: HK; Ref legal event code: DE; Ref document number: 40052761; Country of ref document: HK