CN115345919B - Depth determination method and device, electronic equipment and storage medium - Google Patents
Depth determination method and device, electronic equipment and storage medium
- Publication number
- CN115345919B CN115345919B CN202211023306.XA CN202211023306A CN115345919B CN 115345919 B CN115345919 B CN 115345919B CN 202211023306 A CN202211023306 A CN 202211023306A CN 115345919 B CN115345919 B CN 115345919B
- Authority
- CN
- China
- Prior art keywords
- target detection
- detection frame
- depth
- area
- region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/66—Analysis of geometric attributes of image moments or centre of gravity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The disclosure provides a depth determination method, a depth determination device, electronic equipment and a storage medium, relating to the field of artificial intelligence, and in particular to intelligent transportation, autonomous driving, intelligent parking, and the like. The specific implementation scheme is as follows: performing target detection on the original image, and determining a target detection frame of a target object; determining a first region in the target detection frame, where the center point of the first region coincides with the center point of the target detection frame, and the ratio of the area of the first region to the area of the target detection frame is greater than or equal to a preset threshold; acquiring the depth of each pixel in the first region; and determining the depth of the target object using the depth of each pixel in the first region. In this way, the overall depth of the target object can be determined.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to the fields of intelligent transportation, autonomous driving, intelligent parking, and the like.
Background
Depth estimation refers to determining the distance (i.e., depth) from an image acquisition device to points in a scene, which can directly reflect the geometry of object surfaces in the scene. With the development of three-dimensional (3D) technology, depth estimation has a wide range of application requirements.
Disclosure of Invention
The present disclosure provides a depth determination method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided a depth determination method including:
performing target detection on the original image, and determining a target detection frame of a target object;
determining a first region in the target detection frame; the center point of the first area coincides with the center point of the target detection frame, and the ratio of the area of the first area to the area of the target detection frame is greater than or equal to a preset threshold;
acquiring the depth of each pixel in the first region;
the depth of the target object is determined using the depth of each pixel in the first region.
According to another aspect of the present disclosure, there is provided a depth determining apparatus including:
the target detection module is used for carrying out target detection on the original image and determining a target detection frame of a target object;
the area determining module is used for determining a first area in the target detection frame; the center point of the first area coincides with the center point of the target detection frame, and the ratio of the area of the first area to the area of the target detection frame is greater than or equal to a preset threshold;
an acquisition module, configured to acquire a depth of each pixel in the first area;
and the depth determining module is used for determining the depth of the target object by utilizing the depth of each pixel in the first area.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
By using the depth of each pixel in the first region within the target detection frame, the overall depth of the target object can be determined.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of one system 100 architecture to which the depth determination method of embodiments of the present disclosure may be applied;
FIG. 2 is a flow chart of an implementation of a depth determination method 200 according to an embodiment of the present disclosure;
FIG. 3A is a schematic illustration of an original image and a target detection frame according to an embodiment of the present disclosure;
FIG. 3B is a schematic illustration of a profile of a target object determined using a mask map in accordance with an embodiment of the present disclosure;
FIG. 4A is a schematic illustration of a first region in accordance with an embodiment of the present disclosure;
FIG. 4B is a schematic diagram II of a first region in an embodiment of the present disclosure;
FIG. 4C is a schematic diagram III of a first region in an embodiment in accordance with the disclosure;
FIG. 4D is a schematic diagram fourth of a first region in an embodiment in accordance with the disclosure;
FIG. 5 is a flow chart of a method 500 for determining the depth of each pixel in a first region according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram of the effect of determining the depth of a target object according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a depth prediction model according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a depth determination device 800 according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a depth determination device 900 according to an embodiment of the present disclosure;
fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
In the related art, depth estimation has wide application requirements. Depth estimation refers to determining the distance (i.e., depth) from an image acquisition device to a point in a scene, which can directly reflect the geometry of object surfaces in the scene. For example, in autonomous driving, intelligent transportation, and intelligent parking scenarios, it is desirable to determine the geometry of the surface of a target object (e.g., a vehicle or pedestrian) and the actual location of each of its parts.
Existing depth estimation approaches mainly include the following:
1. Determining depth using a lidar. Specifically, the laser emits a pulse, a timer records the emission time, the receiver receives the returned pulse, and the timer records the return time. Subtracting the two times gives the "time of flight" of the light; since the speed of light is constant, the distance can be calculated from this time, halved because the pulse travels to the target and back.
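The round-trip timing described above can be sketched as follows (an illustrative calculation, not part of the patent; the timestamps and function names are hypothetical):

```python
# Illustrative lidar ranging from a round-trip "time of flight".
C = 299_792_458.0  # speed of light, m/s

def lidar_distance(t_emit_s: float, t_return_s: float) -> float:
    """Distance to the target from emission and return timestamps (seconds)."""
    time_of_flight = t_return_s - t_emit_s
    # The pulse travels out and back, so halve the round-trip path length.
    return C * time_of_flight / 2.0
```

For example, a pulse returning 1 microsecond after emission corresponds to roughly 150 m.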
2. Determining depth using a binocular camera. This scheme requires calibrating the camera to obtain its intrinsic and extrinsic parameters. By computing the parallax between the two images, distance measurement is performed directly on the scene in front (the captured range), without needing to judge what type of obstacle appears ahead. The principle of a binocular camera is similar to that of human eyes: the eyes perceive the distance of an object from the difference between the images each eye sees of the same object, known as "parallax". The farther the object, the smaller the parallax; the closer the object, the greater the parallax.
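The parallax relation above can be sketched with the classical pinhole stereo formula Z = f·B/d (a minimal illustration with assumed focal length and baseline values, not taken from the patent):

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from the pinhole stereo relation Z = f * B / d."""
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Larger disparity means a closer object, matching the parallax rule above.
near = stereo_depth(700.0, 0.12, 40.0)  # 2.1 m
far = stereo_depth(700.0, 0.12, 4.0)    # 21.0 m
```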
3. Determining depth using a monocular camera and a depth algorithm. Specifically, a monocular camera captures a scene image, and a depth algorithm performs depth estimation on it, yielding the depth of each pixel in the scene image, which can be regarded as the actual distance between the position corresponding to the pixel and the lens of the image acquisition device. The depth algorithm may be implemented by a pre-trained depth detection model.
The above approaches each have drawbacks. In the first, lidar has high cost, power consumption, and failure rates. In the second, binocular cameras require extensive parameter calibration and therefore significant operation and maintenance effort. In the third, to determine the depth of each position in the scene, the processing device running the depth detection model must have strong computing power, and power consumption is also higher.
However, in some specific scenarios, it is not necessary to accurately identify the depth of each part of the target object; only the overall depth of the target object needs to be estimated. For example, in a parking scenario, the vehicle is typically stationary and usually in a fixed parking space. Compared with an autonomous driving scenario, such a scene does not require accurately identifying the depth of each part of the target object, but only its overall depth.
For scenarios with low requirements on depth recognition accuracy, the embodiments of the disclosure provide a depth determination method. Fig. 1 is a schematic diagram of a system 100 architecture to which the depth determination method of embodiments of the present disclosure may be applied. As shown in fig. 1, the system architecture includes: an image acquisition device 110, a network 120, and a depth determination device 130. The image acquisition device 110 and the depth determination device 130 may establish a communication connection through the network 120. The image acquisition device 110 transmits the original image to the depth determination device 130 through the network 120, and the depth determination device 130 determines the depth of the target object in the original image in response to receiving it. Finally, the depth determination device 130 returns the determined depth of the target object to the image acquisition device, or sends it to another server or terminal device. The depth determination device 130 may be a vision processing device or a remote server with depth estimation capabilities. The network 120 may be wired or wireless. When the depth determination device 130 is a vision processing device, the image acquisition device 110 may be communicatively connected to it by a wired connection, for example through a bus; when the depth determination device 130 is a remote server, the image acquisition device 110 may exchange data with the remote server through a wireless network. The image acquisition device 110 may be a vehicle-mounted camera, an intelligent traffic camera, or the like.
The parking scene is a broad term, covering parking lots, temporary roadside parking areas, vehicles parked in exhibition halls, vehicles that are nearly stationary near traffic lights, and the like. Moreover, the application scenarios of the embodiments of the disclosure are not limited to parking scenes; the method can be applied to any scene as long as the depth accuracy it determines meets the scene's requirements. For example, the embodiments of the disclosure may also be applied to warehouse and dock scenarios, for determining the location of goods and so on.
Fig. 2 is a flow chart of an implementation of a depth determination method 200 according to an embodiment of the present disclosure. In some embodiments of the present disclosure, the depth determination method may be performed by a terminal device, a server, or other processing device. In some embodiments of the present disclosure, the depth determination method may be implemented by way of a processor invoking computer readable instructions stored in a memory. As shown in fig. 2, the depth determination method includes the steps of:
s210: performing target detection on the original image, and determining a target detection frame of a target object;
s220: determining a first region in the target detection frame; the center point of the first area coincides with the center point of the target detection frame, and the ratio of the area of the first area to the area of the target detection frame is greater than or equal to a preset threshold value;
s230: acquiring the depth of each pixel in the first region;
s240: the depth of the target object is determined using the depth of each pixel in the first region.
The original image may be an image in a parking scene, for example, may be image data acquired by a vehicle-mounted image capturing device, an intelligent traffic image capturing device, or the like. The original image may be a still picture, a dynamic video image, or a video frame extracted from the video image, and the embodiment of the present disclosure does not limit the form and the acquisition path of the original image.
In some examples, the above-mentioned preset threshold may be set according to actual conditions, for example, the preset threshold is set to 50%.
In the disclosed embodiment, object Detection (Object Detection) refers to finding a target Object of interest in an original image and determining the position and class of the target Object. Specifically, the target detection frame for describing the spatial position of the target object is output through the target detection network, for example, information of the size, position, category, and the like of the target detection frame of the target object is output.
In some embodiments, the target objects may be vehicles and pedestrians in a parking scene, goods in a warehouse environment, furniture and appliances in a home scene, desks and chairs in an office scene, and so on. The embodiments do not limit the types of target objects; the method is applicable to any target object as long as the accuracy of the depth it determines meets the scene's requirements.
In some embodiments, determining the depth of the target object using the depth of each pixel in the first region in step S240 may include: an average value of the depths of all pixels in the first region is calculated, and the average value is taken as the depth of the target object.
Other ways of determining the depth of the target object from the depth of each pixel in the first region are also possible in embodiments of the present disclosure. For example, the median of the depths of the pixels in the first region may be taken as the depth of the target object.
The depth determination method provided by the embodiments of the disclosure can determine the depth of the target object from the depths of multiple pixels of the target object in the first region of the original image. Since the depths of multiple pixels are used to determine the overall depth of the target object, the accuracy requirement on the depth of any single pixel is low, and the depth of the target object can be confirmed at lower cost and with lower power consumption. For example, when single-pixel depth accuracy is low, some pixels in the depth detection result are deeper than the actual depth and some are shallower, and the over- and under-estimates are randomly distributed. If the average over many pixels is taken as the depth of the target object, then by basic statistics the high and low deviations largely cancel during averaging, so the accuracy of the resulting depth average (i.e., the depth of the target object) can be ensured.
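The cancellation argument can be checked numerically. In this sketch the ground-truth depth, noise level, and pixel count are all assumed for illustration; they are not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
true_depth = 12.0   # assumed ground-truth object depth, meters
n_pixels = 5000     # assumed number of pixels in the first region
# Per-pixel estimates with zero-mean noise: some deeper, some shallower.
noisy = true_depth + rng.normal(0.0, 1.5, size=n_pixels)

mean_estimate = float(noisy.mean())        # averaging, as in step S240
median_estimate = float(np.median(noisy))  # the median alternative from the text
# Individual pixels can be off by meters, yet the aggregate stays close:
# the standard error of the mean is 1.5 / sqrt(5000), about 0.02 m.
```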
How to determine the first region, i.e., which pixels' depths to use to calculate the depth of the target object, is a key consideration of embodiments of the present disclosure. This problem is analyzed below.
First, analyze whether the target detection frame itself is suitable as the first region. The target detection frame is a rectangular frame bounding the target object in the original image; since the target object appears as an irregular shape in the original image, the target detection frame contains other objects in addition to the target object. Taking fig. 3A as an example (a schematic diagram of an original image and a target detection frame), the rectangular frame in the middle of fig. 3A is the target detection frame of the target object (the vehicle in fig. 3A). If the target detection frame were used as the first region, the depth of each pixel in the target detection frame would be used to determine the overall depth of the vehicle; clearly, the determined depth would not be accurate enough, because the target detection frame contains pixels that do not belong to the vehicle.
Alternatively, if the region delineated by the outline of the target object in the original image is taken as the first region, then all pixels of the target object are contained in the first region and no pixels of any other object are. Determining the depth of the target object from the depths of the pixels in this region is therefore obviously accurate. However, determining the contour of the target object costs considerable time and computation. Fig. 3B is a schematic diagram of the outline of the target object determined using a mask map. Besides the mask map approach, there are other ways to determine the outline of the target object, such as instance segmentation. Either way, time and computation costs increase.
In view of the above analysis, the first region for determining the depth of the target object according to the embodiments of the present disclosure has the following features:
1. a first region is determined in the target detection frame, i.e. the first region is inside the target detection frame. This is because all pixels of the target object are within the target detection frame.
2. The center point of the first region coincides with the center point of the target detection frame. Since the target object is located in the middle of the target detection frame, and the center point of the first area coincides with the center point of the target detection frame, it is ensured that the target object is also located in the middle of the first area, so that most of pixels in the first area are pixels of the target object.
3. The ratio of the area of the first region to the area of the target detection frame is greater than or equal to a preset threshold. This is to ensure that the first region is able to contain a large fraction of the pixels of the target object. The preset threshold may be set according to practical situations, for example, set to 50%.
With the above characteristics, the depths of the pixels in the first region can be relied upon to determine the depth of the target object accurately. Further, since the first region has a fixed shape and a fixed position within the target detection frame, the pixels it contains can be identified easily. Therefore, the depth determination method provided by the embodiments of the disclosure not only ensures an accurate estimate of the depth of the target object, but also reduces time and computation costs and improves speed.
After determining the above-mentioned characteristics of the first region, it is then analyzed which regions are suitable as the first region.
Fig. 4A-4D are schematic diagrams of a first region in an embodiment of the present disclosure. It should be noted that the images shown in fig. 4A to 4D are images within the target detection frame in the original image, not the original image.
As shown in fig. 4A, the target detection frame is rectangular in shape;
the shape of the first region may be diamond or square, and the 4 vertices of the first region are located at midpoints of 4 sides of the target detection frame, respectively. The ratio of the area of the first region to the area of the target detection frame is 50%.
Taking fig. 4A as an example, in the case where the target detection frame is rectangular, the shape of the first region is diamond. In the case where the target detection frame is square, the shape of the first region is square.
As can be seen from fig. 4A, most of the pixels inside the first region belong to the target object (e.g., the vehicle in fig. 4A), and most of the pixels in the target object are within the first region. Through experimental statistics, in the first area shown in fig. 4A, the pixels of the target object account for 83% of all the pixels, and therefore, the pixels in the first area can reflect the depth condition of the target object to a great extent.
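The diamond region of FIG. 4A can be expressed as a pixel mask. The sketch below is illustrative (frame dimensions are assumed); the 50% area ratio follows from the diamond's vertices sitting at the midpoints of the frame's sides:

```python
import numpy as np

def diamond_mask(h: int, w: int) -> np.ndarray:
    """Boolean mask of a diamond whose 4 vertices sit at the midpoints of
    the sides of an h x w detection frame: |dx|/(w/2) + |dy|/(h/2) <= 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.abs(xs - cx) / (w / 2.0) + np.abs(ys - cy) / (h / 2.0) <= 1.0

mask = diamond_mask(400, 600)
ratio = mask.mean()  # fraction of frame pixels inside the first region
# The continuous diamond covers exactly half of the rectangle, so the
# discrete mask's ratio is close to the 50% stated above.
```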
As shown in fig. 4B, the target detection frame is rectangular in shape;
the shape of the first region may be circular or elliptical.
Taking fig. 4B as an example, in the case where the target detection frame is rectangular, the shape of the first region is an ellipse whose 4 vertices are located at the midpoints of the 4 sides of the target detection frame. In the case where the target detection frame is square, the first region is a circle, and the 4 sides of the target detection frame are all tangent to it. In the example of fig. 4B, the ratio of the area of the first region to the area of the target detection frame is π/4, approximately 78.5%.
As can be seen in fig. 4B, most of the pixels in the first region belong to the target object (e.g., the vehicle of fig. 4B), and most of the pixels in the target object are in the first region. Through experimental statistics, in the first area shown in fig. 4B, the pixels of the target object occupy 79% of all the pixels, and therefore, the pixels in the first area can reflect the depth condition of the target object to a great extent.
As shown in fig. 4C, the target detection frame is rectangular in shape;
the shape of the first region may be polygonal, and each vertex of the first region is located on a side of the target detection frame.
Taking fig. 4C as an example, in the case where the target detection frame is rectangular, the shape of the first region is a regular hexagon. In the example of fig. 4C, the ratio of the area of the first region to the target detection frame area is approximately 75%.
As can be seen in fig. 4C, most of the pixels in the first region belong to the target object (e.g., the vehicle of fig. 4C), and most of the pixels in the target object are in the first region. Through experimental statistics, in the first area shown in fig. 4C, the pixels of the target object occupy 76% of all the pixels, and thus, the pixels in the first area can reflect the depth condition of the target object to a great extent.
As shown in fig. 4D, the shape of the target detection frame is rectangular;
the first region may be irregularly shaped and include midpoints of respective sides of the target detection frame.
Taking fig. 4D as an example, in the case where the target detection frame is rectangular, the shape of the first region is a cross. As shown in fig. 4D, 4 of the cross's 12 sides lie on the 4 sides of the target detection frame, and the width of each arm is one third of the length of the corresponding side of the target detection frame. In the example of fig. 4D, the ratio of the area of the first region to the area of the target detection frame is 5/9.
As can be seen in fig. 4D, most of the pixels in the first region belong to the target object (e.g., the vehicle of fig. 4D), and most of the pixels in the target object are in the first region. Through experimental statistics, in the first area shown in fig. 4D, the pixels of the target object occupy 77% of all the pixels, so the pixels in the first area can reflect the depth condition of the target object to a great extent.
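The 5/9 ratio for the cross of FIG. 4D can be verified directly: the four corner rectangles cut away from the frame each span one third of the frame per axis, so the cross keeps 1 − 4·(1/3)² = 5/9 of the area. A numeric sketch (the function is illustrative, not from the patent):

```python
def cross_area_ratio(arm_fraction: float = 1.0 / 3.0) -> float:
    """Area ratio of a cross-shaped region whose arm widths are a fixed
    fraction of the frame sides: the frame minus the 4 corner rectangles."""
    corner = (1.0 - arm_fraction) / 2.0  # per-axis extent of each corner cut
    return 1.0 - 4.0 * corner * corner

# With arms one third of each side, the ratio is 5/9, about 0.556.
```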
The disclosed embodiments are particularly suitable for scenes shot from an elevated viewpoint, in which the image capture device is mounted higher than the target object (e.g., a vehicle). Taking an elevated parking scene as an example, as shown in figs. 4A-4D, most vehicles in the image face left or right at some angle, and regions such as the roof, front, and body are all visible; using the pixel depths at these positions, the depth of the whole vehicle body can be determined more accurately.
In the embodiments of the present disclosure, the first region determined based on the center point of the target detection frame can cover all or most of the target object. Determining the depth of the target object from the depths of the pixels in the first region therefore reflects the true depth of the target object to the greatest extent, with lower computational cost and higher speed. Determining the depth of the target object from the pixels in the first region also removes the influence of a large amount of background.
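As an illustration only (not part of the patent disclosure), the diamond-shaped variant of the first region and the averaging step can be sketched as follows; the array sizes and depth values are hypothetical.

```python
import numpy as np

def diamond_mask(h, w):
    """Boolean mask of the diamond whose 4 vertices sit at the midpoints
    of the 4 sides of an h x w detection frame, i.e. the set of pixels with
    |x - cx| / (w/2) + |y - cy| / (h/2) <= 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return (np.abs(xs - cx) / (w / 2.0) + np.abs(ys - cy) / (h / 2.0)) <= 1.0

def object_depth(depth_crop):
    """Depth of the target object: mean pixel depth inside the first region
    (here a diamond), which suppresses background pixels near the corners."""
    mask = diamond_mask(*depth_crop.shape)
    return float(depth_crop[mask].mean())

# Toy depth crop: object at 10 m in the middle, background at 50 m.
depth = np.full((90, 120), 50.0)
depth[20:70, 20:100] = 10.0
print(object_depth(depth))  # lower than the plain crop average, since the
                            # corner background falls outside the diamond
```

A continuous diamond inscribed this way covers exactly half the frame area, which is why it meets the 50% threshold used in the claims.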
Fig. 5 is a flow chart of a method 500 for determining the depth of each pixel in the first region according to an embodiment of the present disclosure, including:
S510: inputting the original image into a pre-trained depth prediction model to obtain the depth of each pixel in the original image;
S520: determining the depth of each pixel in the first region by using the depth of each pixel in the original image, the position of the target detection frame in the original image, and the position of the first region in the target detection frame.
Fig. 6 is an effect diagram of determining the depth of a target object according to an embodiment of the present disclosure. As shown in fig. 6, a target detection frame of the target object is determined in the original image, and the first region is a diamond-shaped region within the detection frame. From the position of the target detection frame (such as the coordinates of its center point and its width and height) and the position of the first region in the target detection frame, the position of the first region in the whole original image can be determined; the depth of each pixel in the first region is then determined from the depth of each pixel in the original image. Finally, the depth of the target object may be determined using the depths of the pixels in the first region.
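The coordinate bookkeeping in this step is straightforward; the sketch below is illustrative only, with hypothetical function and variable names, and assumes the detection frame is given as a center point plus width and height.

```python
import numpy as np

def region_depths(depth_map, box_cx, box_cy, box_w, box_h, region_mask_fn):
    """Extract the per-pixel depths of the first region.

    depth_map      : H x W array of per-pixel depths for the original image
    box_*          : detection frame as center point plus width/height
    region_mask_fn : builds a boolean first-region mask for an h x w frame
    """
    h, w = depth_map.shape
    # Detection-frame position in the original image (clipped to bounds).
    x0 = max(int(round(box_cx - box_w / 2)), 0)
    y0 = max(int(round(box_cy - box_h / 2)), 0)
    x1 = min(x0 + int(round(box_w)), w)
    y1 = min(y0 + int(round(box_h)), h)
    crop = depth_map[y0:y1, x0:x1]
    # First-region position inside the frame, mapped back to image pixels.
    mask = region_mask_fn(*crop.shape)
    return crop[mask]

def center_block_mask(h, w):
    # Hypothetical first region: an axis-aligned block around the center.
    m = np.zeros((h, w), dtype=bool)
    m[h // 6: h - h // 6, w // 6: w - w // 6] = True
    return m

depth_map = np.arange(100.0).reshape(10, 10)
vals = region_depths(depth_map, box_cx=5, box_cy=5, box_w=6, box_h=6,
                     region_mask_fn=center_block_mask)
print(vals.mean())  # mean depth over the first region only
```

The returned values can then be averaged, as in step S520 and the claims, to obtain the depth of the target object.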
In the embodiment of the disclosure, the depth of each pixel in the original image can be obtained by using the depth map output by the depth prediction model. Fig. 7 is a schematic structural diagram of a depth prediction model according to an embodiment of the disclosure, as shown in fig. 7, including:
1. inputting an original image of size H×W×C, wherein H is the height of the original image, W is the width of the original image, and C is the number of channels of the original image;
2. the original image is subjected to n convolution downsampling operations to obtain a feature map of size (H/2^n) × (W/2^n), wherein n is any positive integer;
3. the obtained feature map is subjected to n convolution upsampling operations, followed by a feature fusion operation, to obtain a feature map of size H×W×2^n;
4. and carrying out 1×1 convolution operation on the obtained feature map to obtain a depth map with the size of H×W×1.
In the example shown in fig. 7, the numbers of upsampling and downsampling operations are both 4. The coding layer of the depth prediction model shown in fig. 7 includes 4 downsampling operations; for clarity of illustration, their specific process is not shown in the figure. In the embodiments of the present disclosure, the number of sampling operations is not limited, provided that the accuracy of the depth map output by the depth prediction model reaches a preset condition. The size of the original image input to the depth prediction model is also not limited; for example, the input size (length of original image × width of original image) may be 768×448 or 1920×1080.
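The spatial bookkeeping of the encoder-decoder above can be checked with simple arithmetic. The sketch below is illustrative only and assumes each downsampling halves, and each upsampling doubles, the spatial resolution; it traces the sizes for the n = 4 example and the 768×448 input mentioned in the text.

```python
def trace_sizes(h, w, n):
    """Spatial sizes through n stride-2 downsamples then n upsamples,
    assuming both dimensions are divisible by 2**n (as 768 x 448 is
    for n = 4)."""
    sizes = [(h, w)]
    for _ in range(n):          # encoder: each step halves H and W
        h, w = h // 2, w // 2
        sizes.append((h, w))
    for _ in range(n):          # decoder: each step doubles H and W
        h, w = h * 2, w * 2
        sizes.append((h, w))
    return sizes

sizes = trace_sizes(768, 448, n=4)
print(sizes[4])    # bottleneck: (48, 28), i.e. (768 / 2**4, 448 / 2**4)
print(sizes[-1])   # restored to the input size (768, 448); a final 1x1
                   # conv then maps H x W x 2**n features to an H x W x 1
                   # depth map, as in step 4 above
```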
The embodiment of the present disclosure further proposes a depth determining apparatus, and fig. 8 is a schematic structural diagram of a depth determining apparatus 800 according to an embodiment of the present disclosure, including:
the target detection module 810 is configured to perform target detection on the original image, and determine a target detection frame of the target object;
a region determination module 820 for determining a first region in the target detection frame; the center point of the first area coincides with the center point of the target detection frame, and the ratio of the area of the first area to the area of the target detection frame is greater than or equal to a preset threshold;
an acquiring module 830, configured to acquire a depth of each pixel in the first area;
a depth determination module 840 is configured to determine a depth of the target object using the depths of the pixels in the first region.
In some embodiments, the target detection frame is rectangular in shape;
the first region is diamond or square in shape, and 4 vertexes of the first region are respectively positioned at midpoints of 4 sides of the target detection frame.
In some embodiments, the target detection frame is rectangular in shape;
the first region is elliptical in shape, and 4 vertices of the first region are located at midpoints of 4 sides of the target detection frame, respectively.
In some embodiments, the target detection frame is square in shape;
the first region is circular in shape, and 4 sides of the target detection frame are tangent lines of the first region.
In some embodiments, the target detection frame is rectangular in shape;
the first region is polygonal in shape, and each vertex of the first region is located on a side of the target detection frame.
In some embodiments, the target detection frame is rectangular in shape;
the first region is irregularly shaped and includes midpoints of respective sides of the target detection frame.
In some embodiments, the depth determination module is configured to calculate an average value of the depths of all pixels in the first region, and take the average value as the depth of the target object.
Fig. 9 is a schematic structural view of a depth determining apparatus 900 according to an embodiment of the present disclosure. As shown in fig. 9, the depth determining apparatus 900 includes a target detection module 910, a region determining module 920, an acquisition module 930, and a depth determining module 940. In some embodiments, the acquisition module 930 includes:
an input sub-module 931, configured to input the original image into a pre-trained depth prediction model, to obtain a depth of each pixel in the original image;
a pixel depth determination submodule 932 is configured to determine a depth of each pixel in the first region using a depth of each pixel in the original image, a position of the target detection frame in the original image, and a position of the first region in the target detection frame.
For descriptions of specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the related descriptions of corresponding steps in the foregoing method embodiments, which are not repeated herein.
In the technical solutions of the present disclosure, the acquisition, storage, application, and the like of the user personal information involved all conform to the provisions of relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in device 1000 are connected to I/O interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, for example, the depth determination method. For example, in some embodiments, the depth determination method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM 1002 and/or communication unit 1009. When a computer program is loaded into RAM 1003 and executed by computing unit 1001, one or more steps of the depth determination method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the depth determination method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (16)
1. A depth determination method, comprising:
performing target detection on the original image, and determining a target detection frame of a target object; the target detection frame is used for describing the space position of the target object;
determining a first area in the target detection frame; the center point of the first area coincides with the center point of the target detection frame, and the ratio of the area of the first area to the area of the target detection frame is greater than or equal to a preset threshold; the preset threshold value is 50%;
acquiring the depth of each pixel in the first region;
and calculating an average value of the depths of all pixels in the first area, and taking the average value as the depth of the target object.
2. The method of claim 1, wherein the target detection frame is rectangular in shape;
the first area is diamond or square in shape, and 4 vertexes of the first area are respectively positioned at midpoints of 4 sides of the target detection frame.
3. The method of claim 1, wherein the target detection frame is rectangular in shape;
the first region is elliptical in shape, and 4 vertexes of the first region are respectively positioned at midpoints of 4 sides of the target detection frame.
4. The method of claim 1, wherein the target detection frame is square in shape;
the first region is circular in shape, and 4 sides of the target detection frame are tangent lines of the first region.
5. The method of claim 1, wherein the target detection frame is rectangular in shape;
the first region is polygonal in shape, and each vertex of the first region is located on a side of the target detection frame.
6. The method of claim 1, wherein the target detection frame is rectangular in shape;
the first region is irregularly shaped and includes midpoints of respective sides of the target detection frame.
7. The method of any of claims 1-6, wherein the acquiring the depth of each pixel in the first region comprises:
inputting the original image into a pre-trained depth prediction model to obtain the depth of each pixel in the original image;
and determining the depth of each pixel in the first area by using the depth of each pixel in the original image, the position of the target detection frame in the original image and the position of the first area in the target detection frame.
8. A depth determining apparatus comprising:
the target detection module is used for carrying out target detection on the original image and determining a target detection frame of a target object; the target detection frame is used for describing the space position of the target object;
the area determining module is used for determining a first area in the target detection frame; the center point of the first area coincides with the center point of the target detection frame, and the ratio of the area of the first area to the area of the target detection frame is greater than or equal to a preset threshold; the preset threshold value is 50%;
an acquisition module, configured to acquire a depth of each pixel in the first area;
and the depth determining module is used for calculating an average value of the depths of all pixels in the first area, and taking the average value as the depth of the target object.
9. The apparatus of claim 8, wherein the target detection frame is rectangular in shape;
the first area is diamond or square in shape, and 4 vertexes of the first area are respectively positioned at midpoints of 4 sides of the target detection frame.
10. The apparatus of claim 8, wherein the target detection frame is rectangular in shape;
the first region is elliptical in shape, and 4 vertexes of the first region are respectively positioned at midpoints of 4 sides of the target detection frame.
11. The apparatus of claim 8, wherein the target detection frame is square in shape;
the first region is circular in shape, and 4 sides of the target detection frame are tangent lines of the first region.
12. The apparatus of claim 8, wherein the target detection frame is rectangular in shape;
the first region is polygonal in shape, and each vertex of the first region is located on a side of the target detection frame.
13. The apparatus of claim 8, wherein the target detection frame is rectangular in shape;
the first region is irregularly shaped and includes midpoints of respective sides of the target detection frame.
14. The apparatus of any of claims 8-13, wherein the acquisition module comprises:
the input sub-module is used for inputting the original image into a pre-trained depth prediction model to obtain the depth of each pixel in the original image;
and the pixel depth determining submodule is used for determining the depth of each pixel in the first area by utilizing the depth of each pixel in the original image, the position of the target detection frame in the original image and the position of the first area in the target detection frame.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211023306.XA CN115345919B (en) | 2022-08-25 | 2022-08-25 | Depth determination method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115345919A CN115345919A (en) | 2022-11-15 |
CN115345919B true CN115345919B (en) | 2024-04-12 |
Family
ID=83954532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211023306.XA Active CN115345919B (en) | 2022-08-25 | 2022-08-25 | Depth determination method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115345919B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033972A (en) * | 2018-06-27 | 2018-12-18 | 上海数迹智能科技有限公司 | A kind of object detection method, device, equipment and storage medium |
CN111950543A (en) * | 2019-05-14 | 2020-11-17 | 北京京东尚科信息技术有限公司 | Target detection method and device |
CN114066958A (en) * | 2021-11-17 | 2022-02-18 | 智道网联科技(北京)有限公司 | Method and device for predicting depth information of target, electronic device and storage medium |
CN114764814A (en) * | 2021-01-12 | 2022-07-19 | 富泰华工业(深圳)有限公司 | Plant height determination method and device, electronic equipment and medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110826357B (en) * | 2018-08-07 | 2022-07-26 | 北京市商汤科技开发有限公司 | Method, device, medium and equipment for three-dimensional detection and intelligent driving control of object |
CN112132829A (en) * | 2020-10-23 | 2020-12-25 | 北京百度网讯科技有限公司 | Vehicle information detection method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115345919A (en) | 2022-11-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110826357B (en) | Method, device, medium and equipment for three-dimensional detection and intelligent driving control of object | |
CN113902897B (en) | Training of target detection model, target detection method, device, equipment and medium | |
CN113012210B (en) | Method and device for generating depth map, electronic equipment and storage medium | |
CN113378760A (en) | Training target detection model and method and device for detecting target | |
CN111209770A (en) | Lane line identification method and device | |
KR20210043628A (en) | Obstacle detection method, intelligent driving control method, device, medium, and device | |
WO2020083307A1 (en) | Method, apparatus, and storage medium for obtaining depth image | |
CN112097732A (en) | Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium | |
CN112017239B (en) | Method for determining orientation of target object, intelligent driving control method, device and equipment | |
CN116993817B (en) | Pose determining method and device of target vehicle, computer equipment and storage medium | |
CN112509126A (en) | Method, device, equipment and storage medium for detecting three-dimensional object | |
CN112733678A (en) | Ranging method, ranging device, computer equipment and storage medium | |
CN113256709A (en) | Target detection method, target detection device, computer equipment and storage medium | |
CN115345919B (en) | Depth determination method and device, electronic equipment and storage medium | |
Hasegawa et al. | Real-time interpolation method for sparse lidar point cloud using rgb camera | |
CN113920273B (en) | Image processing method, device, electronic equipment and storage medium | |
CN116129422A (en) | Monocular 3D target detection method, monocular 3D target detection device, electronic equipment and storage medium | |
CN114757824A (en) | Image splicing method, device, equipment and storage medium | |
CN115346194A (en) | Three-dimensional detection method and device, electronic equipment and storage medium | |
CN112529011A (en) | Target detection method and related device | |
Jaspers et al. | Fast and robust b-spline terrain estimation for off-road navigation with stereo vision | |
GB2583774A (en) | Stereo image processing | |
CN114612544B (en) | Image processing method, device, equipment and storage medium | |
CN117647852B (en) | Weather state detection method and device, electronic equipment and storage medium | |
CN114495042B (en) | Target detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||