WO2022267275A1 - Depth detection method, apparatus, device, storage medium, computer program and product - Google Patents

Depth detection method, apparatus, device, storage medium, computer program and product

Info

Publication number
WO2022267275A1
WO2022267275A1 · PCT/CN2021/125278 · CN2021125278W
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
coordinate system
feature map
detection frame
Prior art date
Application number
PCT/CN2021/125278
Other languages
English (en)
French (fr)
Inventor
张胤民
马新柱
伊帅
侯军
欧阳万里
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Publication of WO2022267275A1 publication Critical patent/WO2022267275A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20068 Projection on vertical or horizontal image axis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular to a depth detection method, device, equipment, storage medium, computer program and product.
  • 3D object detection is an important and challenging problem in the field of computer vision, which plays an important role in computer vision applications such as autonomous driving, robotics, augmented or virtual reality.
  • Monocular 3D target detection can use the monocular image acquired by a monocular camera to achieve 3D detection of the target object in the monocular image.
  • Embodiments of the present disclosure at least provide a depth detection method, device, equipment, storage medium, computer program and product.
  • In a first aspect, an embodiment of the present disclosure provides a depth detection method, including: acquiring an image to be processed; based on the image to be processed, determining two-dimensional position information of a two-dimensional detection frame of a target object in an image coordinate system corresponding to the image to be processed, and projection position information, in the image coordinate system, of a three-dimensional detection frame of the target object in a camera coordinate system corresponding to the image to be processed; based on the two-dimensional position information, the projection position information, and projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame, obtaining an intermediate depth value of the center point of the target object in the camera coordinate system; and based on the intermediate depth value and the image to be processed, obtaining a target depth value of the center point of the target object in the camera coordinate system.
  • The determining, based on the image to be processed, the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed includes: performing feature extraction on the image to be processed to obtain a feature map of the image to be processed; based on the feature map, obtaining the probability that each feature point in the feature map belongs to the center point of the target object, the first position offset corresponding to each feature point, and the downsampled size information of the downsampled two-dimensional detection frame with each feature point as the center point; and obtaining the two-dimensional position information based on the probability, the first position offset, and the downsampled size information; wherein the downsampled two-dimensional detection frame is a detection frame formed by scaling down the two-dimensional detection frame of the target object in accordance with the downsampling of the image to be processed.
  • the two-dimensional position information includes: first coordinate information of a center point of the two-dimensional detection frame in the image coordinate system, and size information of the two-dimensional detection frame.
  • The obtaining the two-dimensional position information based on the probability, the first position offset and the downsampled size information includes: determining the target feature point from the feature map based on the probability that each feature point belongs to the center point of the target object; determining the first coordinate information of the center point of the two-dimensional detection frame in the image coordinate system based on the position information of the target feature point in the feature map, the first position offset of the target feature point, and a downsampling rate; and determining the size information of the two-dimensional detection frame based on the downsampled size information corresponding to the target feature point and the downsampling rate.
  • The performing feature extraction on the image to be processed to obtain the feature map of the image to be processed includes: using a pre-trained backbone neural network to perform feature extraction on the image to be processed to obtain the feature map of the image to be processed. The obtaining, based on the feature map, the probability that each feature point in the feature map belongs to the center point of the target object includes: using a pre-trained center point prediction neural network to perform center point prediction processing on the feature map to obtain the probability that each feature point in the feature map belongs to the center point of the target object.
  • The center point prediction neural network is trained in the following manner: acquiring a sample image and position annotation information of the center point of a sample object in the sample image, wherein the center point of the sample object is the projection point, in the sample image, of the center point of the three-dimensional detection frame of the sample object in the camera coordinate system corresponding to the sample image; and training the backbone neural network to be trained and the center point prediction neural network to be trained by using the sample image and the position annotation information, to obtain the trained center point prediction neural network.
  • The determining the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed includes: obtaining, based on the feature map of the image to be processed, a second position offset corresponding to each feature point in the feature map; and obtaining the projection position information of the three-dimensional detection frame in the image coordinate system based on the probability that each feature point in the feature map belongs to the center point of the target object, the second position offset, and the downsampling rate.
  • the projection location information includes at least one of the following: second coordinate information of a projection point of a center point of the three-dimensional detection frame in the image coordinate system.
  • The obtaining the projection position information of the three-dimensional detection frame in the image coordinate system based on the probability that each feature point in the feature map belongs to the center point of the target object, the second position offset, and the downsampling rate includes: determining the target feature point from the feature map based on the probability that each feature point in the feature map belongs to the center point of the target object; and determining the second coordinate information of the projection point of the center point of the three-dimensional detection frame in the image coordinate system based on the position information of the target feature point in the feature map, the second position offset corresponding to the target feature point, and the downsampling rate.
  • The obtaining the intermediate depth value of the target object in the camera coordinate system based on the two-dimensional position information, the projection position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame includes: obtaining the intermediate depth value of the target object in the camera coordinate system based on the two-dimensional position information, the projection position information, the actual size information of the target object, the orientation information of the target object, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame.
  • The method further includes: performing size prediction processing on the target object based on the feature map of the image to be processed to obtain the actual size information of the target object; and/or performing orientation prediction processing on the target object based on the feature map of the image to be processed to obtain the orientation information of the target object in the camera coordinate system.
  • The projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame is established based on the size information and position information of the projection of the three-dimensional detection frame in the image coordinate system, and the size information and position information of the two-dimensional detection frame.
  • The obtaining the target depth value of the center point of the target object in the camera coordinate system based on the intermediate depth value includes: performing a nonlinear transformation on the depth image formed by the intermediate depth value of the center point of the target object in the camera coordinate system to obtain a depth feature map; and obtaining the target depth value of the center point of the target object in the camera coordinate system based on the depth feature map and the feature map of the image to be processed.
  • The obtaining the target depth value of the center point of the target object in the camera coordinate system based on the depth feature map and the feature map of the image to be processed includes: superimposing the depth feature map and the feature map of the image to be processed to form a target feature map; performing depth prediction processing on the target feature map by using a pre-trained depth value prediction neural network to obtain the target depth value of each feature point in the feature map; and obtaining the target depth value of the center point of the target object in the camera coordinate system based on the probability that each feature point in the feature map belongs to the center point of the target object and the target depth value corresponding to each feature point.
  • The method further includes: obtaining a three-dimensional detection result of the target object in the camera coordinate system based on the target depth value of the center point of the target object in the camera coordinate system and the actual size information of the target object.
  • An embodiment of the present disclosure further provides a depth detection device, including: an acquisition module configured to acquire an image to be processed; a first processing module configured to determine, based on the image to be processed, the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, and the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed; a second processing module configured to obtain the intermediate depth value of the center point of the target object in the camera coordinate system based on the two-dimensional position information, the projection position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame; and a prediction module configured to obtain the target depth value of the center point of the target object in the camera coordinate system based on the intermediate depth value.
  • An optional implementation of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor and the processor is configured to execute the stored machine-readable instructions; when the machine-readable instructions are executed by the processor, the method in the above first aspect, or in any possible implementation manner of the first aspect, is implemented.
  • An optional implementation of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored; when the computer program is executed, the steps in the above first aspect, or in any possible implementation manner of the first aspect, are performed.
  • The present disclosure further provides a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the steps in the above first aspect, or in any possible implementation manner of the first aspect.
  • The present disclosure also provides a computer program product, the computer program product including one or more instructions, the one or more instructions being suitable for being loaded by a processor to perform the steps in the above first aspect, or in any possible implementation manner of the first aspect.
  • FIG. 1 shows a flow chart of a depth detection method provided by an embodiment of the present disclosure
  • FIG. 2 shows a flow chart of a method for determining the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed provided by an embodiment of the present disclosure
  • FIG. 3 shows a flow chart of a method for determining projection position information of a three-dimensional detection frame in an image coordinate system provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic structural diagram of a target neural network for depth detection provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic structural diagram of a depth detection device provided by an embodiment of the present disclosure
  • Fig. 6 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • the neural network is usually trained by using the sample monocular images and the annotation information generated by three-dimensional marking of the target objects in the monocular images.
  • the obtained neural network can directly predict the depth value of the center point of the 3D detection frame of the target object in the camera coordinate system corresponding to the monocular image, and the size information of the 3D detection frame. This method of predicting the depth of a target object in a monocular image has the problem of low prediction accuracy.
  • The present disclosure provides a depth detection method, apparatus, device, storage medium, computer program and product, which, by establishing projection relationship information between the two-dimensional position of the target object in the image coordinate system and its three-dimensional position in the corresponding camera coordinate system, and using the projection relationship information as feature information for predicting the depth of the target object in the target space, can improve the confidence of the predicted depth information of the target object in the camera coordinate system.
  • the execution subject of the depth detection method provided in the embodiment of the present disclosure is generally a computer device with a certain computing power.
  • The computer device includes, for example, a terminal device, a server, or other processing device. The terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the depth detection method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • the depth detection method provided by the embodiments of the present disclosure will be described below.
  • the embodiments of the present disclosure can be used, for example, to perform target detection on a monocular two-dimensional image, to obtain two-dimensional position information of the target object in the two-dimensional image, and three-dimensional position information of the target object in the camera coordinate system corresponding to the two-dimensional image.
  • FIG. 1 is a flowchart of a depth detection method provided by an embodiment of the present disclosure
  • The method includes steps S101 to S104, wherein:
  • S101: Acquire an image to be processed;
  • S102: Based on the image to be processed, determine the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, and the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed;
  • S103: Based on the two-dimensional position information, the projection position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame, obtain an intermediate depth value of the center point of the target object in the camera coordinate system;
  • S104: Based on the intermediate depth value and the image to be processed, obtain a target depth value of the center point of the target object in the camera coordinate system.
  • In the embodiment of the present disclosure, the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, and the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed, are determined based on the image to be processed; then, based on the two-dimensional position information, the projection position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame, the intermediate depth value of the target object in the camera coordinate system is obtained; and based on the intermediate depth value and the image to be processed, the target depth value of the target object in the camera coordinate system is obtained. In this way, the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame is used as feature information for predicting the depth of the target object, which improves the accuracy of the final target depth value of the target object in the camera coordinate system.
  • the images to be processed contain different target objects in different application scenarios.
  • When the depth detection method provided by the embodiments of the present disclosure is applied to an automatic driving scene, the target object includes, for example, vehicles, pedestrians, obstacles in the road, etc.; when the depth detection method is applied to the field of object recognition, the target object includes, for example, the object to be recognized; when the depth detection method is applied to the field of camera positioning, the target object includes, for example, various objects in the target scene.
  • the detailed process of the depth detection method is illustrated by taking the application of the depth detection method in an automatic driving scene as an example.
  • The image coordinate system corresponding to the image to be processed is, for example, a two-dimensional coordinate system established with the pixel in the upper left corner of the image to be processed as the origin.
  • the position of each pixel on the image to be processed in the image to be processed can be represented by the coordinate value in the image coordinate system.
  • an embodiment of the present disclosure provides a method for determining the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, including:
  • S201 Perform feature extraction on the image to be processed, and acquire a feature map of the image to be processed.
  • Here, the backbone neural network can be used to perform feature extraction on the image to be processed to obtain a feature map. The process of performing feature extraction on the image to be processed is a process of downsampling the image to be processed, that is, the image to be processed is downsampled according to a certain downsampling rate to obtain the feature map of the image to be processed; when the downsampling rate is R, the ratio of the size of the image to be processed to the size of the feature map is R.
  • At least one level of convolution processing may be performed on the image to be processed to obtain a feature map of the image to be processed.
  • For example, at least one convolution kernel can be used to convolve the output result of the previous level of convolution processing, or the image to be processed, to obtain the result of the current level of convolution processing; the result of the last level of convolution processing is used as the feature map of the image to be processed.
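  • The following is a minimal sketch of such a downsampling feature extractor, written in PyTorch under the assumption of two stride-2 convolution stages (a downsampling rate R = 4); the layer widths and kernel sizes are illustrative assumptions, not the backbone used in the embodiment.

```python
# Sketch only: a tiny convolutional backbone that downsamples the image to be
# processed by R = 4, so an (H, W) image yields an (H/4, W/4) feature map.
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    def __init__(self, in_channels: int = 3, feat_channels: int = 64):
        super().__init__()
        self.stages = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),   # 1/2 resolution
            nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_channels, kernel_size=3, stride=2, padding=1), # 1/4 resolution
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (N, 3, H, W) -> feature map: (N, feat_channels, H/4, W/4)
        return self.stages(image)

if __name__ == "__main__":
    feat = TinyBackbone()(torch.randn(1, 3, 384, 1280))
    print(feat.shape)  # torch.Size([1, 64, 96, 320])
```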
  • each feature point has a position mapping relationship with a pixel in the image to be processed.
  • The downsampled two-dimensional detection frame is a detection frame formed by scaling down the two-dimensional detection frame of the target object in accordance with the downsampling of the image to be processed.
  • In the first aspect, when the probability that each feature point in the feature map belongs to the center point of the target object is obtained based on the feature map, for example, a pre-trained center point prediction neural network can be used to perform center point prediction processing on the feature map to obtain the probability that each feature point in the feature map belongs to the center point of the target object.
  • The center point prediction neural network can be, for example, an extended branch of the backbone neural network; that is, the center point prediction neural network and the backbone neural network belong to the same neural network. After the backbone neural network performs feature extraction on the image to be processed, the feature map corresponding to the image to be processed is transmitted to the center point prediction neural network; the center point prediction neural network predicts, based on the feature map, the probability that each feature point in the feature map belongs to the center point of the target object.
  • The center point prediction neural network can be trained, for example, in the following manner: acquire a sample image and position annotation information of the center point of the sample object in the sample image, where the center point of the sample object is the projection point, in the sample image, of the center point of the three-dimensional detection frame of the sample object in the camera coordinate system corresponding to the sample image; then use the sample image and the position annotation information to train the backbone neural network to be trained and the center point prediction neural network to be trained, obtaining the trained center point prediction neural network.
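  • As an illustration of how such position annotation information can be turned into a training target, the sketch below projects the center point of a sample object's three-dimensional detection frame into the sample image through assumed pinhole intrinsics (fu, fv, cu, cv), downsamples it by R, and writes a peak into a label heat map; the Gaussian spread around the peak is a common convention and an assumption here, not something the embodiment specifies.

```python
# Sketch of center-point label generation for one sample object (all intrinsics
# and the Gaussian sigma are illustrative assumptions).
import numpy as np

def center_heatmap_label(center_cam, fu, fv, cu, cv, R, feat_h, feat_w, sigma=2.0):
    x, y, z = center_cam                  # 3D box center in the camera coordinate system
    u = fu * x / z + cu                   # projection point in the image coordinate system
    v = fv * y / z + cv
    cx, cy = int(u // R), int(v // R)     # feature-map cell containing the projection
    ys, xs = np.mgrid[0:feat_h, 0:feat_w]
    heatmap = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return heatmap.astype(np.float32)

label = center_heatmap_label((1.5, 0.2, 20.0), 720.0, 720.0, 640.0, 192.0,
                             R=4, feat_h=96, feat_w=320)
```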
  • Since the center point prediction neural network is a branch extended from the backbone neural network, the backbone neural network to be trained and the center point prediction neural network to be trained can be trained together.
  • In the second aspect, assume that the coordinate value of any feature point in the feature map output by the neural network is (x1, y1); its physical meaning is the coordinate obtained after the position, in the image, of the projected point of the object is downsampled and rounded down. Adding the first position offset to (x1, y1) yields the coordinate of the center of the two-dimensional detection frame after downsampling.
  • A pre-trained first position offset prediction neural network can be used to predict the first position offset corresponding to each feature point.
  • The first position offset prediction neural network may also be, for example, a branch extended from the backbone neural network, and is a different branch from the center point prediction neural network in the first aspect above.
  • The position p1 of a certain pixel in the image to be processed and the position p2 of the corresponding feature point in the feature map satisfy the following formula (1): p2 = floor(p1/R) (1); where floor(·) indicates rounding down, and R indicates the downsampling rate.
  • the feature points in the feature map may not be matched with the pixels in the image to be processed at the pixel level, but with the pixels in the image to be processed at the sub-pixel level.
  • mod( ⁇ ) means to take the remainder.
  • the first pixel (or sub-pixel) corresponding to each feature point in the image to be processed can be obtained based on the above formula (2).
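  • The sketch below illustrates one plausible reading of the relationship described around formulas (1) and (2): a pixel position p1 maps to the feature point floor(p1/R), and the residue mod(p1, R)/R serves as the sub-pixel offset that recovers p1 exactly; the exact form of formula (2) is not reproduced in the text, so this is an assumption.

```python
# Sketch of the downsampling correspondence between pixels and feature points.
def pixel_to_feature(p1: float, R: int):
    p2 = int(p1 // R)          # formula (1): feature-point index after rounding down
    offset = (p1 % R) / R      # sub-pixel residue lost by the rounding
    return p2, offset

def feature_to_pixel(p2: int, offset: float, R: int) -> float:
    # Recover the (sub-pixel) position in the image to be processed.
    return (p2 + offset) * R

u_center = 437.0               # x-coordinate of a 2D box center, in pixels
x1, dx = pixel_to_feature(u_center, R=4)
assert feature_to_pixel(x1, dx, R=4) == u_center   # 437.0 is recovered exactly
```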
  • the sample image and the labeled image corresponding to the sample image have been obtained.
  • In this way, the first offset between the center point of the two-dimensional detection frame, after the image to be processed is downsampled, and the corresponding feature point can be obtained; the obtained first offset is used as the first offset annotation information of the sample image, and the first offset prediction neural network to be trained is trained by using the sample image to obtain the trained first offset prediction neural network.
  • Since the first offset prediction neural network is a branch extended from the backbone neural network, the above sample image and the corresponding first offset annotation information can be used to train the backbone neural network to be trained and the first offset prediction neural network to be trained, to obtain the trained first offset prediction neural network.
  • In the third aspect, when determining, based on the feature map, the downsampled size information of the downsampled two-dimensional detection frame with each feature point in the feature map as the center point, for example, a pre-trained two-dimensional detection frame prediction neural network can be used to perform detection frame prediction on the image to be processed, to obtain the downsampled size information of the downsampled two-dimensional detection frame corresponding to each feature point in the feature map.
  • The two-dimensional detection frame prediction neural network can also be, for example, a branch extended from the backbone neural network.
  • The downsampled detection frame can be regarded as the detection frame formed by scaling down, according to the downsampling rate, the two-dimensional detection frame of the target object in the image to be processed. Therefore, the size s1 of the two-dimensional detection frame of the target object in the image to be processed and the size s2 of the downsampled two-dimensional detection frame of the target object in the feature map satisfy the following formula (3): s1 = R × s2 (3). Based on the above formula (3), the size information of the two-dimensional detection frame in the image to be processed can be obtained.
  • During training, the two-dimensional detection frame annotation information of the sample image is generated based on the projection, in the sample image, of the three-dimensional detection frame of the sample object in the camera coordinate system corresponding to the sample image; using the sample image and the two-dimensional detection frame annotation information corresponding to the sample image, the backbone neural network to be trained and the two-dimensional detection frame prediction neural network to be trained are trained, and the trained two-dimensional detection frame prediction neural network is obtained.
  • In the embodiment of the present disclosure, the projection relationship between the two-dimensional detection frame and the three-dimensional detection frame is used as feature data, so that the finally determined target depth value of the target object in the camera coordinate system corresponding to the image to be processed can have higher confidence. However, there is a certain difference between the real two-dimensional detection frame annotated in the image and the two-dimensional detection frame formed by projecting the three-dimensional detection frame, so if the projection relationship between the two is generated from the annotated two-dimensional detection frame and the three-dimensional detection frame, the projection relationship will contain a certain error. Therefore, in the embodiment of the present disclosure, the two-dimensional detection frame annotation information is generated from the projection, in the sample image, of the three-dimensional detection frame of the sample object in the camera coordinate system corresponding to the sample image, so as to eliminate this difference.
  • The center point prediction neural network in the first aspect, the first position offset prediction neural network in the second aspect, and the two-dimensional detection frame prediction neural network in the third aspect can all be branches of the backbone neural network. Therefore, the same batch of sample images can be used to simultaneously train the above backbone neural network, center point prediction neural network, first position offset prediction neural network, and two-dimensional detection frame prediction neural network. Alternatively, different sample images may be used to train the above three branches respectively.
  • S203 Obtain the two-dimensional location information based on the probability, the first location offset, and the downsampling size information.
  • The two-dimensional position information of the target object in the image coordinate system includes: the first coordinate information (2D Center) of the center point of the two-dimensional detection frame in the image coordinate system, and the size information (2D Size) of the two-dimensional detection frame.
  • the two-dimensional position information is obtained based on the probability, the first position offset and the downsampling size information, for example, the following manner may be adopted:
  • Determine the target feature point from the feature map based on the probability that each feature point belongs to the center point of the target object; then, based on the position information of the target feature point in the feature map, the first position offset of the target feature point, and the downsampling rate, determine the first coordinate information of the center point of the two-dimensional detection frame in the image coordinate system; and, based on the downsampled size information corresponding to the target feature point and the downsampling rate, determine the size information of the two-dimensional detection frame.
  • When determining the target feature point, the probability corresponding to each feature point can be compared with a preset probability threshold; when the probability corresponding to a feature point is greater than the preset probability threshold, the feature point is taken as the target feature point.
  • Dx_offset represents the first position offset between the feature point and the corresponding first pixel point in the X-axis direction of the image coordinate system; Dy_offset represents the first position offset between the feature point and the corresponding first pixel point in the Y-axis direction of the image coordinate system.
  • the feature point is the target feature point, that is, the pixel point corresponding to the feature point (x, y) is the first central point of the two-dimensional detection frame of the target object.
  • The above formula (5) can be used to obtain the first coordinate information of the center point of the two-dimensional detection frame of the target object in the image coordinate system.
  • the downsampling size information corresponding to the target feature point has been obtained based on the prediction in S202 above
  • the size information of the two-dimensional detection frame of the target object in the image to be processed can be obtained based on the above formula (3).
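  • Putting the above together, a decoding step along the lines of formulas (3) to (5) can be sketched as follows: feature points whose heat-map probability exceeds a threshold are taken as target feature points, the first position offset is added, and the result is scaled back by the downsampling rate R. The 0.5 threshold and the tensor layouts are assumptions for illustration.

```python
# Sketch of decoding the two-dimensional position information from the network outputs.
import numpy as np

def decode_2d_boxes(heatmap, offset_xy, size_wh, R=4, thresh=0.5):
    # heatmap: (Hf, Wf); offset_xy, size_wh: (2, Hf, Wf) in feature-map units.
    boxes = []
    ys, xs = np.where(heatmap > thresh)                 # target feature points
    for y, x in zip(ys, xs):
        dx, dy = offset_xy[:, y, x]                     # first position offset
        cx, cy = (x + dx) * R, (y + dy) * R             # 2D box center in image coordinates
        w, h = size_wh[:, y, x] * R                     # formula (3): s1 = R * s2
        boxes.append((cx, cy, w, h, float(heatmap[y, x])))
    return boxes
```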
  • In the camera coordinate system corresponding to the image to be processed, the optical axis of the camera is the Z-axis, and the X-axis and the Y-axis lie in the plane that passes through the optical center of the camera and is perpendicular to the optical axis of the camera.
  • the Z-axis direction is referred to as the depth direction.
  • The embodiment of the present disclosure also provides a method for determining, based on the image to be processed, the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed, including:
  • S301: Based on the feature map of the image to be processed, obtain a second position offset corresponding to each feature point in the feature map.
  • Here, the coordinate value of any feature point in the feature map is (x1, y1); its physical meaning is the coordinate obtained after the position, in the image, of the projected point of the object is downsampled and rounded down. Adding the second position offset to (x1, y1) yields the coordinate obtained after the three-dimensional center of the object is projected onto the image to form a projection point and the projection point is downsampled.
  • The second position offset corresponding to each feature point is used to represent the position offset, formed after downsampling, between each feature point and the second pixel point corresponding to that feature point; the second pixel point is the pixel point, in the image to be processed, corresponding to the projection point of the center point of the three-dimensional detection frame in the image to be processed.
  • the manner of acquiring the feature map of the image to be processed is the same as the manner of acquiring the feature map in S201 above.
  • A pre-trained second position offset prediction neural network can be used to perform second position offset prediction processing on the feature map, to obtain the second position offset corresponding to each feature point in the feature map.
  • The second position offset prediction neural network may also be, for example, an extended branch of the backbone neural network. The image to be processed is input into the backbone neural network; the backbone neural network downsamples the image to be processed to obtain the feature map of the image to be processed; after the feature map enters the second position offset prediction neural network, the second position offset corresponding to each feature point in the feature map is obtained.
  • the sample image and the corresponding labeled image of the sample image have been obtained.
  • For example, a two-dimensional annotation frame and a three-dimensional annotation frame can be annotated on the sample image; then, based on the annotated three-dimensional annotation frame, the coordinate value s1 of the projection point, in the image to be processed, of the center point of the annotated three-dimensional annotation frame is obtained, and the coordinate value of the center point of the annotated two-dimensional annotation frame in the image to be processed is s1'.
  • The coordinate value s1 of the projection point, in the image to be processed, of the center point of the annotated three-dimensional annotation frame and the coordinate value s2 obtained by using formula (1) are substituted into the above formula (2), to obtain the offset between the feature point in the feature map corresponding to the sample object and the downsampled center point of the corresponding sample object in the sample image.
  • In this way, the second offset between the projection point, after the image to be processed is downsampled, of the center point of the three-dimensional detection frame of the sample object and the corresponding feature point can be obtained; the obtained second offset is used as the second offset annotation information of the sample image, and the second offset prediction neural network to be trained is trained by using the sample image to obtain the trained second offset prediction neural network.
  • S302: Based on the probability that each feature point in the feature map belongs to the center point of the target object, the second position offset, and the downsampling rate, obtain the projection position information of the three-dimensional detection frame in the image coordinate system.
  • the projected position information includes at least one of the following: second coordinate information of a projected point of the center point of the three-dimensional detection frame in the image coordinate system.
  • the projection position information of the three-dimensional detection frame in the image coordinate system can be obtained in the following manner:
  • Determine the target feature point from the feature map based on the probability that each feature point in the feature map belongs to the center point of the target object; then, based on the position information of the target feature point in the feature map, the second position offset corresponding to the target feature point, and the downsampling rate, determine the second coordinate information of the projection point of the center point of the three-dimensional detection frame in the image coordinate system.
  • the manner of determining the target feature point is similar to the manner of determining the target feature point in S203 above.
  • After the target feature point is determined, for example, the position information of the target feature point in the feature map, the second position offset corresponding to the target feature point, and the downsampling rate can be substituted into the above formula (5) to obtain the second coordinate information of the projection point of the center point of the three-dimensional detection frame in the image coordinate system.
  • The intermediate depth value of the target object in the camera coordinate system is then obtained based on the two-dimensional position information, the projection position information, the actual size information of the target object, the orientation information of the target object, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame.
  • the depth detection method provided by the embodiment of the present disclosure further includes:
  • a pre-trained size prediction neural network may be used to perform size prediction processing on the feature map of the image to be processed to obtain actual size information of the target object.
  • the actual size information of the target object is, for example, size information of a three-dimensional bounding box of the target object in the camera coordinate system corresponding to the image to be processed.
  • the pre-trained orientation prediction neural network can also be used to perform orientation prediction processing on the feature map of the image to be processed to obtain the orientation information of the target object in the camera coordinate system.
  • The size prediction neural network and the orientation prediction neural network can be different branches extended from the backbone neural network, and can be trained synchronously with the center point prediction neural network, the first position offset prediction neural network, the two-dimensional detection frame prediction neural network, the second position offset prediction neural network, and the backbone neural network described in the above embodiments.
  • it further includes: establishing projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame.
  • The projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame is established based on the size information and position information of the projection of the three-dimensional detection frame in the image coordinate system, and the size information and position information of the two-dimensional detection frame.
  • the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame can be established in the following manner:
  • The 3D bounding box of any target object is expressed as a seven-tuple (W, H, L, x, y, z, r_y), where W, H, L respectively represent the length, width, and height of the 3D bounding box; (x, y, z) represents the coordinates of the center point of the 3D bounding box; and r_y represents the rotation angle of the target object around the Y-axis in the camera coordinate system, with a range of [-π, π].
  • the two-dimensional bounding box of any target object in the corresponding image coordinate system is represented as a quadruple: (w, h, u, v); where, w, h represent the width and height of the two-dimensional bounding box, (u ,v) represents the coordinate value of the center point of the two-dimensional bounding box in the image coordinate system.
  • In the corresponding formula, one term represents the coordinate value of the center point of the three-dimensional bounding box in the camera coordinate system, and the other term represents the coordinate values of the corner points of the 3D bounding box in the camera coordinate system.
  • The corner points can be projected from the camera coordinate system to the image coordinate system, and the coordinates of the projection point of each corner point in the image coordinate system satisfy the following formula (9):
  • z_c represents the depth value of the c-th corner point in the camera coordinate system; u_c and v_c respectively represent the x-axis coordinate value and the y-axis coordinate value, in the image coordinate system, of the projection point of the c-th corner point.
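  • A worked sketch of the corner construction and projection that formulas (8) and (9) describe is given below, assuming the usual convention that the box extends L/2 along its heading, W/2 sideways, and H/2 vertically, and that r_y is a rotation about the camera Y-axis; the intrinsics (fu, fv, cu, cv) are illustrative.

```python
# Sketch: build the 8 corner points of a 3D bounding box in the camera coordinate
# system and project them into the image coordinate system with a pinhole model.
import numpy as np

def box_corners_cam(W, H, L, x, y, z, ry):
    dx = np.array([ L,  L,  L,  L, -L, -L, -L, -L]) / 2.0   # along the heading
    dy = np.array([ H,  H, -H, -H,  H,  H, -H, -H]) / 2.0   # vertical
    dz = np.array([ W, -W,  W, -W,  W, -W,  W, -W]) / 2.0   # sideways
    c, s = np.cos(ry), np.sin(ry)
    xc = c * dx + s * dz + x          # rotate about the Y-axis, then translate
    zc = -s * dx + c * dz + z
    yc = dy + y
    return np.stack([xc, yc, zc], axis=0)                   # shape (3, 8)

def project_to_image(pts_cam, fu, fv, cu, cv):
    x, y, z = pts_cam
    return fu * x / z + cu, fv * y / z + cv                 # (u_c, v_c) per corner

corners = box_corners_cam(W=1.8, H=1.5, L=4.2, x=2.0, y=1.0, z=25.0, ry=0.3)
u_c, v_c = project_to_image(corners, fu=720.0, fv=720.0, cu=640.0, cv=192.0)
h_proj = v_c.max() - v_c.min()        # projected box height, used to estimate h below
```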
  • The projection height h of the two-dimensional bounding box can be estimated based on the vertical distance, in the image coordinate system, between the uppermost corner point max_c{v_c} and the lowermost corner point min_c{v_c}, and h satisfies the following formula (10):
  • v_c is derived from the above formula (9); Δz_max indicates the maximum depth difference between the corner points and the center point of the three-dimensional bounding box; z indicates the depth value of the center point; Δy_max indicates the maximum value of the coordinate difference, on the Y-axis of the camera coordinate system, between the corner points and the center point of the three-dimensional bounding box; Δy_min indicates the corresponding minimum value on the Y-axis; f_v represents the focal length of the camera.
  • (u_o, v_o) represents the coordinate value, in the image coordinate system, of the projection point of the center point of the three-dimensional bounding box.
  • c v represents the principal point offset of the camera.
  • the depth value of the center point of the three-dimensional bounding box can be determined.
  • the above formula (12) is the projection relationship information between the two-dimensional bounding box and the three-dimensional bounding box in the embodiment of the present disclosure.
  • f_v represents the focal length of the camera, which can be obtained by reading the attribute information of the image to be processed.
  • h represents the height of the two-dimensional detection frame of the target object in the image coordinate system, which can be obtained based on the above two-dimensional position information, that is, based on the above two-dimensional detection frame size information.
  • ⁇ z max represents the maximum value of the depth difference between the 8 corner points of the 3D detection frame of the target object and the depth of the center point of the 3D detection frame.
  • the depth difference ⁇ zc between the c -th corner point of the 8 corner points of the three-dimensional detection frame and the center point of the three-dimensional detection frame satisfies the following formula (15):
  • L and W are derived from the actual size information of the target object, and respectively represent the length and width of the target object.
  • r y is the orientation information of the target object.
  • The depth differences between the eight corner points of the three-dimensional detection frame of the target object and the center point of the three-dimensional detection frame are calculated respectively, and then the maximum value of these depth differences is taken, that is, Δz_max.
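  • The sketch below follows the same yaw geometry for the corner depth offsets (one plausible reading of formula (15)) and then estimates an intermediate depth with the simplified pinhole relation z ≈ f_v · (Δy_max - Δy_min) / h; the exact formulas (10) to (14) are not reproduced in the text, so this last step is an approximation rather than the embodiment's exact relation.

```python
# Sketch: maximum corner-to-center depth offset and a rough intermediate depth.
import numpy as np

def max_corner_depth_offset(L, W, ry):
    # Depth offset of each corner relative to the box center, for the 4 horizontal
    # sign combinations (height does not change depth, so 4 values cover 8 corners).
    signs = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]
    dz = [-sl * (L / 2.0) * np.sin(ry) + sw * (W / 2.0) * np.cos(ry) for sl, sw in signs]
    return max(dz)                                     # delta_z_max

def intermediate_depth(h_2d, H_3d, fv):
    # For an upright object, dy_max - dy_min equals the 3D height H_3d, so the
    # projected box height satisfies h_2d ~= fv * H_3d / z; solve for z.
    return fv * H_3d / h_2d

dz_max = max_corner_depth_offset(L=4.2, W=1.8, ry=0.3)
z0 = intermediate_depth(h_2d=45.0, H_3d=1.5, fv=720.0)   # roughly 24 m for this box
```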
  • Based on the intermediate depth value and the image to be processed, a target depth value of the center point of the target object in the camera coordinate system is obtained.
  • For example, the depth image composed of the intermediate depth values of the center points of the target objects in the camera coordinate system is nonlinearly transformed to obtain a depth feature map, the purpose of which is to remove noise from the depth feature map. The depth feature map is then used as a part of the features of the image to be processed, and the depth feature map and the feature map of the image to be processed are superimposed to form the target feature map corresponding to the image to be processed; the pre-trained depth value prediction neural network then performs depth prediction processing on the target feature map to obtain the target depth value of each feature point in the feature map.
  • A nonlinear transformation module can be used to nonlinearly transform the intermediate depth values of the center point of the target object in the camera coordinate system to obtain the depth feature map.
  • In this way, the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame is used to generate a depth feature map that can constrain the depth prediction; the depth feature map is then used as feature data for depth prediction and combined with the features of the image to be processed to obtain the target feature map of the image to be processed, and the depth prediction neural network performs depth prediction processing on the target feature map, so that the obtained depth value of the center point of the target object has higher confidence and accuracy.
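  • A sketch of this fusion step is shown below; the concrete layers (a 1x1 convolution with ReLU as the nonlinear transformation, a small two-layer depth head) are assumptions, since the text only specifies a nonlinear transformation followed by superposition with the feature map and depth prediction.

```python
# Sketch: transform the geometric depth map, superimpose it on the feature map,
# and predict per-feature-point target depth values.
import torch
import torch.nn as nn

class DepthFusionHead(nn.Module):
    def __init__(self, feat_channels: int = 64, geo_channels: int = 16):
        super().__init__()
        # Nonlinear transformation module: depth map -> depth feature map.
        self.geo_transform = nn.Sequential(nn.Conv2d(1, geo_channels, kernel_size=1),
                                           nn.ReLU(inplace=True))
        # Depth value prediction on the superimposed target feature map.
        self.depth_head = nn.Sequential(
            nn.Conv2d(feat_channels + geo_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),
        )

    def forward(self, feat: torch.Tensor, depth_map: torch.Tensor) -> torch.Tensor:
        geo = self.geo_transform(depth_map)           # depth feature map (Geometric map)
        target_feat = torch.cat([feat, geo], dim=1)   # superposition -> target feature map
        return self.depth_head(target_feat)           # per-feature-point target depth values
```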
  • The method further includes: obtaining a three-dimensional detection result of the target object in the camera coordinate system based on the target depth value of the center point of the target object in the camera coordinate system and the actual size information of the target object.
  • the automatic driving process of the automatic driving vehicle can be controlled based on the three-dimensional detection results.
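  • As an illustration of assembling such a result, the sketch below back-projects the projected center (u_o, v_o) through assumed pinhole intrinsics using the target depth value, and combines it with the predicted size and orientation into the seven-tuple used above; this is standard pinhole geometry rather than a formula quoted from the embodiment.

```python
# Sketch: turn the target depth value, projected center, size and orientation
# into a three-dimensional detection result (W, H, L, x, y, z, r_y).
def assemble_3d_result(u_o, v_o, depth, size_whl, ry, fu, fv, cu, cv):
    x = (u_o - cu) * depth / fu     # back-project the center point
    y = (v_o - cv) * depth / fv
    z = depth
    W, H, L = size_whl              # actual size information of the target object
    return (W, H, L, x, y, z, ry)
```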
  • the embodiment of the present disclosure provides an example of using the target neural network to process the image to be processed to obtain the depth value of the target object in the camera coordinate system corresponding to the image to be processed.
  • The target neural network includes: a backbone neural network 401, and, respectively connected to the backbone neural network, a center point prediction neural network 402, a first position offset prediction neural network 403, a two-dimensional detection frame prediction neural network 404, a second position offset prediction neural network 405, a size prediction neural network 406, and an orientation prediction neural network 407.
  • Input the image to be processed to the backbone neural network 401 to obtain a feature map.
  • The feature map is input to the center point prediction neural network 402 to obtain a heat map (Heatmap), wherein the pixel value of each pixel in the heat map represents the probability that the feature point in the feature map corresponding to the pixel belongs to the center point of the target object.
  • The feature map is input to the first position offset prediction neural network 403 to obtain the first position offset, and input to the second position offset prediction neural network 405 to obtain the second position offset.
  • the feature map is input to the 2D detection frame prediction neural network 404 to obtain the downsampled size information of the downsampled 2D detection frame with each feature point as the center point, that is, the size information of the 2D detection frame.
  • The feature map is input to the size prediction neural network 406 to obtain the actual size information (3D dimension) of the target object in the camera coordinate system.
  • the feature map is input to the orientation prediction neural network 407 to obtain orientation information (Orientation) of the target object.
  • the target neural network further includes: a first processing module 408 connected to the central point prediction neural network 402 , the first position offset prediction neural network 403 , and the two-dimensional detection frame prediction neural network 404 .
  • the heat map, the first position offset, and the two-dimensional detection frame size information enter the first processing module 408, and the first processing module 408 uses the heat map, the first position offset, and the two-dimensional detection frame size information to generate Two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed.
  • the target neural network further includes: a second processing module 409 connected to the central point prediction neural network 402 and the second position offset prediction neural network 405 .
  • The heat map and the second position offset enter the second processing module 409, and the second processing module uses the heat map and the second position offset to generate the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed.
  • the target neural network further includes: a third processing module 410 connected to the first processing module 408 , the second processing module 409 , the size prediction neural network 406 , and the orientation prediction neural network 407 .
  • The two-dimensional position information, projection position information, actual size information, and orientation information are input to the third processing module 410, and the third processing module 410, based on the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame (that is, the above formulas (12), (13), and (14)), uses the two-dimensional position information, projection position information, actual size information, and orientation information to obtain the intermediate depth value of the center point of the target object in the camera coordinate system, that is, a depth map.
  • the target neural network further includes: a nonlinear transformation module 411 connected to the third processing module 410 .
  • the depth map enters the nonlinear transformation module 411, and the nonlinear transformation module 411 performs nonlinear transformation on the depth map to obtain a depth feature map (Geometric map).
  • the target neural network further includes: a fourth processing module 412 connected to the backbone network 401 and the nonlinear transformation module 411 .
  • the depth feature map and the feature map are input to the fourth processing module 412, and the fourth processing module 412 performs superposition processing on the depth feature map and the feature map to obtain a target feature map of the image to be processed.
  • the target neural network further includes: a deep prediction neural network 413 connected to the fourth processing module 412 .
  • the target feature map is input to the depth prediction neural network 413, and the depth prediction neural network 413 performs depth prediction processing on the target feature map to obtain the target depth value of the center point of the target object in the camera coordinate system.
  • In this way, the target depth value of the center point of the target object in the image to be processed, in the camera coordinate system, can be obtained.
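  • The structure of FIG. 4 can be summarized with the following sketch: one backbone with several lightweight prediction branches and a depth head fed by the fused target feature map. All branch widths are placeholder assumptions, and the geometric computation of the intermediate depth map (modules 408 to 410) is abbreviated to a callable hook rather than implemented.

```python
# Structural sketch of the target neural network (401-413); not the embodiment's
# exact architecture.
import torch
import torch.nn as nn

def head(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(64, out_ch, 1))

class TargetNetworkSketch(nn.Module):
    def __init__(self, feat_ch=64, geo_fn=None):
        super().__init__()
        self.backbone = nn.Sequential(                        # 401
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.center = head(feat_ch, 1)                        # 402: heat map
        self.offset_2d = head(feat_ch, 2)                     # 403: first position offset
        self.size_2d = head(feat_ch, 2)                       # 404: downsampled 2D size
        self.offset_3d = head(feat_ch, 2)                     # 405: second position offset
        self.size_3d = head(feat_ch, 3)                       # 406: actual size (3D dimension)
        self.orientation = head(feat_ch, 1)                   # 407: orientation
        self.geo_fn = geo_fn                                  # 408-410: geometric depth map
        self.geo_transform = nn.Sequential(nn.Conv2d(1, 16, 1), nn.ReLU(inplace=True))  # 411
        self.depth_head = head(feat_ch + 16, 1)               # 412-413: target depth values

    def forward(self, image):
        feat = self.backbone(image)
        outs = {
            "heatmap": torch.sigmoid(self.center(feat)),
            "offset_2d": self.offset_2d(feat),
            "size_2d": self.size_2d(feat),
            "offset_3d": self.offset_3d(feat),
            "size_3d": self.size_3d(feat),
            "orientation": self.orientation(feat),
        }
        # Intermediate depth map from the projection relationship; a zero placeholder
        # is used when no geometric function is supplied.
        depth_map = self.geo_fn(outs) if self.geo_fn else torch.zeros_like(outs["heatmap"])
        target_feat = torch.cat([feat, self.geo_transform(depth_map)], dim=1)
        outs["depth"] = self.depth_head(target_feat)
        return outs
```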
  • The embodiment of the present disclosure also provides a depth detection device corresponding to the depth detection method. Since the problem-solving principle of the device in the embodiment of the present disclosure is similar to that of the above depth detection method in the embodiment of the present disclosure, for the implementation of the device, reference may be made to the implementation of the method.
  • FIG. 5 it is a schematic diagram of a depth detection device provided by an embodiment of the present disclosure, and the device includes:
  • An acquisition module 51 configured to acquire an image to be processed
  • The first processing module 52 is configured to determine, based on the image to be processed, the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, and the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed;
  • the second processing module 53 is configured to obtain the center point of the target object based on the two-dimensional position information, the projected position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame an intermediate depth value in the camera coordinate system;
  • the prediction module 54 is configured to obtain a target depth value of the center point of the target object in the camera coordinate system based on the intermediate depth value and the image to be processed.
  • in a possible implementation, when determining, based on the image to be processed, the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, the first processing module 52 is configured to: perform feature extraction on the image to be processed to obtain a feature map of the image to be processed; obtain, based on the feature map, the probability that each feature point in the feature map belongs to the center point of the target object, the first position offset corresponding to each feature point, and the downsampling size information of the downsampled two-dimensional detection frame taking each feature point as its center point; and obtain the two-dimensional position information based on the probability, the first position offset, and the downsampling size information.
  • here, the downsampled two-dimensional detection frame is a detection frame formed by narrowing the two-dimensional detection frame of the target object after the image to be processed is downsampled.
  • the two-dimensional position information includes: first coordinate information of a center point of the two-dimensional detection frame in the image coordinate system, and size information of the two-dimensional detection frame.
  • in a possible implementation, when obtaining the two-dimensional position information based on the probability, the first position offset, and the downsampling size information, the first processing module 52 is configured to: determine a target feature point from the feature map based on the probability that each feature point in the feature map belongs to the center point of the target object; determine the first coordinate information of the center point of the two-dimensional detection frame in the image coordinate system based on the position information of the target feature point in the feature map, the first position offset of the target feature point, and the downsampling rate; and determine the size information of the two-dimensional detection frame based on the downsampling size information corresponding to the target feature point and the downsampling rate.
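  • as an illustration only, the decoding step just described could look like the sketch below; it assumes the relations stated in the method embodiment (image-coordinate center ≈ downsampling rate × (feature-point position + first position offset), image-coordinate size ≈ downsampling rate × downsampled size), and all argument names and the probability threshold are assumptions of the sketch.

```python
# Sketch: recovering the 2D detection frame in the image coordinate system
# from the feature-map predictions (assumed shapes and names).
import numpy as np

def decode_2d_box(heatmap, offset_2d, size_2d, R=4, prob_thresh=0.5):
    """heatmap: (H, W) center-point probabilities; offset_2d: (H, W, 2) first
    position offsets; size_2d: (H, W, 2) downsampled box sizes; R: downsampling rate."""
    boxes = []
    ys, xs = np.where(heatmap > prob_thresh)             # target feature points
    for y, x in zip(ys, xs):
        dx, dy = offset_2d[y, x]                         # first position offset
        cx, cy = R * (x + dx), R * (y + dy)              # first coordinate information
        w, h = R * size_2d[y, x]                         # undo the downsampling of the size
        boxes.append((cx, cy, w, h, float(heatmap[y, x])))
    return boxes
```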
  • in a possible implementation, when performing feature extraction on the image to be processed and acquiring the feature map of the image to be processed, the first processing module 52 is configured to: use a pre-trained backbone neural network to perform feature extraction on the image to be processed to obtain the feature map of the image to be processed.
  • in a possible implementation, when obtaining, based on the feature map, the probability that each feature point in the feature map belongs to the center point of the target object, the first processing module 52 is configured to: use a pre-trained center point prediction neural network to perform center point prediction processing on the feature map to obtain the probability that each feature point in the feature map belongs to the center point of the target object.
  • in a possible implementation, a training module 55 is also included, configured to train the center point prediction neural network in the following manner: acquiring a sample image and annotated position information of the center point of a sample object in the sample image, where the center point of the sample object is the projected point, in the sample image, of the center point of the three-dimensional detection frame of the sample object in the camera coordinate system corresponding to the sample image; and training the backbone neural network to be trained and the center point prediction neural network to be trained by using the sample image and the annotated position information, to obtain the trained center point prediction neural network.
  • in a possible implementation, when determining, based on the image to be processed, the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed, the first processing module 52 is configured to: obtain, based on the feature map of the image to be processed, the second position offset corresponding to each feature point in the feature map; and obtain the projection position information of the three-dimensional detection frame in the image coordinate system based on the probability that each feature point in the feature map belongs to the center point of the target object, the second position offset, and the downsampling rate.
  • in a possible implementation, the projection position information includes at least one of the following: the second coordinate information of the projected point of the center point of the three-dimensional detection frame in the image coordinate system.
  • in a possible implementation, when obtaining the projection position information of the three-dimensional detection frame in the image coordinate system based on the probability that each feature point in the feature map belongs to the center point of the target object, the second position offset, and the downsampling rate, the first processing module 52 is configured to: determine a target feature point from the feature map based on the probability that each feature point in the feature map belongs to the center point of the target object; and determine the second coordinate information of the projected point of the center point of the three-dimensional detection frame in the image coordinate system based on the position information of the target feature point in the feature map, the second position offset corresponding to the target feature point, and the downsampling rate.
  • in a possible implementation, when obtaining the intermediate depth value of the target object in the camera coordinate system based on the two-dimensional position information, the projection position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame, the second processing module 53 is configured to: obtain the intermediate depth value of the target object in the camera coordinate system based on the two-dimensional position information, the projection position information, the actual size information of the target object, the orientation information of the target object, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame.
  • in a possible implementation, the first processing module 52 is further configured to: perform size prediction processing on the target object based on the feature map of the image to be processed to obtain the actual size information of the target object; and/or perform orientation prediction processing on the target object based on the feature map of the image to be processed to obtain the orientation information of the target object in the camera coordinate system.
  • in a possible implementation, the projection relationship information of the two-dimensional detection frame and the three-dimensional detection frame is established based on the size information and position information of the projection of the three-dimensional detection frame in the image coordinate system, and the size information and position information of the two-dimensional detection frame.
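  • purely as an illustration of how such a projection relationship can constrain the depth, the sketch below gives a simplified stand-in for formulas (12) to (14): it keeps only the first-order pinhole relation between the projected box height, the actual object height, and the depth, plus a rough corner-depth-spread term; the exact terms of the disclosure (the parameter b, tanβ, and the principal-point offset) are not reproduced here, and every name in the sketch is an assumption.

```python
# Sketch: a simplified geometric depth estimate from the 2D/3D projection
# relationship (illustrative stand-in for formulas (12)-(14), not the exact ones).
import math

def approximate_intermediate_depth(box_h_px, height_m, f_v):
    """box_h_px: height h of the 2D detection frame in pixels; height_m: predicted
    actual height H of the target object; f_v: vertical focal length in pixels."""
    # first-order pinhole relation: h ≈ f_v * H / z  =>  z ≈ f_v * H / h
    return f_v * height_m / max(box_h_px, 1e-6)

def corner_depth_spread(length_m, width_m, yaw):
    """Rough maximum depth offset of the ground-plane corners of the 3D detection
    frame relative to its center, for a rotation `yaw` about the Y axis; the
    disclosure uses a term of this kind (delta-z max) to refine the estimate above."""
    corners = [(sx * length_m / 2, sz * width_m / 2)
               for sx in (-1, 1) for sz in (-1, 1)]
    return max(abs(-x * math.sin(yaw) + z * math.cos(yaw)) for x, z in corners)
```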
  • in a possible implementation, when obtaining the target depth value of the center point of the target object in the camera coordinate system based on the intermediate depth value, the prediction module 54 is configured to: perform nonlinear transformation on the depth image formed by the intermediate depth values of the center point of the target object in the camera coordinate system to obtain a depth feature map; and obtain the target depth value of the center point of the target object in the camera coordinate system based on the depth feature map and the feature map of the image to be processed.
  • in a possible implementation, when obtaining the target depth value of the center point of the target object in the camera coordinate system based on the depth feature map and the feature map of the image to be processed, the prediction module 54 is configured to: superimpose the depth feature map and the feature map of the image to be processed to form a target feature map; perform depth prediction processing on the target feature map by using a pre-trained depth value prediction neural network to obtain the target depth value of each feature point in the feature map; and obtain the target depth value of the center point of the target object in the camera coordinate system based on the probability that each feature point in the feature map belongs to the center point of the target object and the target depth value corresponding to each feature point.
  • in a possible implementation, a third processing module 56 is further included, configured to obtain the three-dimensional detection result of the target object in the camera coordinate system based on the target depth value of the center point of the target object in the camera coordinate system and the actual size information of the target object.
  • referring to FIG. 6, which is a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, the computer device includes:
  • a processor 61 and a memory 62; the memory 62 stores machine-readable instructions executable by the processor 61, the processor 61 is configured to execute the machine-readable instructions stored in the memory 62, and when the machine-readable instructions are executed by the processor 61, the processor 61 performs the following steps:
  • acquiring an image to be processed; determining, based on the image to be processed, the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, and the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed; obtaining the intermediate depth value of the center point of the target object in the camera coordinate system based on the two-dimensional position information, the projection position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame;
  • obtaining the target depth value of the center point of the target object in the camera coordinate system based on the intermediate depth value.
  • the above memory 62 includes an internal memory 621 and an external memory 622; the internal memory 621, also called internal storage, is used to temporarily store operation data in the processor 61 and data exchanged with the external memory 622 such as a hard disk, and the processor 61 exchanges data with the external memory 622 through the internal memory 621.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored. When the computer program is run by a processor, the steps of the depth detection method described in the above-mentioned method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the embodiment of the present disclosure also provides a computer program product, the computer program product carries a program code, and the instructions included in the program code can be used to execute the steps of the depth detection method described in the above method embodiments; for details, reference may be made to the above method embodiments, which will not be repeated here.
  • the above-mentioned computer program product may be specifically implemented by means of hardware, software or a combination thereof.
  • in an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and so on.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include various media that can store program codes, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
  • in the embodiments of the present disclosure, an image to be processed is acquired; based on the image to be processed, the two-dimensional position information of the two-dimensional detection frame of the target object in the image coordinate system corresponding to the image to be processed, and the projection position information, in the image coordinate system, of the three-dimensional detection frame of the target object in the camera coordinate system corresponding to the image to be processed are determined; the intermediate depth value of the center point of the target object in the camera coordinate system is obtained based on the two-dimensional position information, the projection position information, and the projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame; and the target depth value of the center point of the target object in the camera coordinate system is obtained based on the intermediate depth value.
  • the embodiments of the present disclosure can improve the accuracy of the depth information of the target object predicted in the camera coordinate system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

A depth detection method, apparatus and device, a storage medium, a computer program and a product, wherein the method comprises: acquiring an image to be processed (S101); determining, on the basis of the image to be processed, two-dimensional position information of a two-dimensional detection frame of a target object in an image coordinate system corresponding to the image to be processed, and projection position information, in the image coordinate system, of a three-dimensional detection frame of the target object in a camera coordinate system corresponding to the image to be processed (S102); obtaining an intermediate depth value of the center point of the target object in the camera coordinate system on the basis of the two-dimensional position information, the projection position information, and projection relationship information between the two-dimensional detection frame and the three-dimensional detection frame (S103); and obtaining a target depth value of the center point of the target object in the camera coordinate system on the basis of the intermediate depth value (S104).

Description

深度检测方法、装置、设备、存储介质、计算机程序及产品
相关申请的交叉引用
本公开基于申请号为202110713298.0、申请日为2021年06月25日、申请名称为“深度检测方法、装置、计算机设备及存储介质”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以全文引入的方式引入本公开。
技术领域
本公开涉及图像处理技术领域,尤其涉及一种深度检测方法、装置、设备、存储介质、计算机程序及产品。
背景技术
三维目标检测是计算机视觉领域的一个重要而具有挑战性的问题,在自动驾驶、机器人技术、增强或虚拟现实等计算机视觉应用中发挥着重要作用。单目三维目标检测能够利用单目摄像机获取的单目图像,实现对弹幕图像中的目标对象进行三维检测的目的。
在对单目图像进行三维目标检测时,需要得到目标对象的中心点在单目图像对应的相机坐标系中的深度值;当前确定目标对象中心点在单目图像对应的相机坐标系中的深度值时,存在深度值精度置信度较差的问题。
发明内容
本公开实施例至少提供一种深度检测方法、装置、设备、存储介质、计算机程序及产品。
第一方面,本公开实施例提供了一种深度检测方法,包括:获取待处理图像;基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;基于所述中间深度值和所述待处理图像,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
这样,通过在获取待处理图像后,基于待处理图像,确定目标对象的二维检测框在待处理图像对应的图像坐标系中的二维位置信息、以及目标对象在待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息,然后基于二维位置信息、投影位置信息、以及二维检测框和三维检测框之间的投影关系信息,得到目标对象在相机坐标系中国的中间深度值,并基于该中间深度值,得到目标对象在相机坐标系中的目标深度值,从而将二维检测框和三维检测框之间的投影关系信息作为约束,提升最终所得到的目标对象在相机坐标系中的目标深度值的置信度。
一种可能的实施方式中,所述基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息,包括:对所述待处理图像进行特征提取,获取待处理图像的特征图;基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率、与各个特征点对应的第一位置偏移量、和以各个特征点为中心点的下采样二维检测框的下采样尺寸信息;基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息;其中,下采样二维检测框,为对待处理图像进行下采样后,所述目标对象二维检测框产生限缩形成的检测框。
一种可能的实施方式中,所述二维位置信息包括:所述二维检测框的中心点在所述图像坐标系中的第一坐标信息、以及所述二维检测框的尺寸信息。
一种可能的实施方式中,所述基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息,包括:基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中确定目标特征点;基于所述目标特征点在所述特征图中的位置信息、所述目标特征点的第一位置偏移量、以及下采样率,确定所述二维检测框的中心点在所述图像坐标系中的第一坐标信息;以 及,基于所述目标特征点对应的下采样尺寸信息、以及所述下采样率,确定所述二维检测框的尺寸信息。
一种可能的实施方式中,所述对所述待处理图像进行特征提取,获取待处理图像的特征图,包括:利用预先训练的骨干神经网络对所述待处理图像进行特征提取,得到所述待处理图像的特征图;所述基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率,包括:利用预先训练的中心点预测神经网络对特征图进行中心点预测处理,得到特征图中的各个特征点属于目标对象的中心点的概率。
一种可能的实施方式中,采用下述方式训练所述中心点预测神经网络:获取样本图像,以及样本对象的中心点在所述样本图像中的标注位置信息;其中,所述样本对象的中心点为样本对象在所述样本图像对应的相机坐标系中的三维检测框的中心点在所述样本图像中的投影点;利用所述样本图像、以及所述位置标注信息,对待训练的骨干神经网络、以及待训练的中心点预测神经网络进行训练,得到训练好的所述中心点预测神经网络。
一种可能的实施方式中,基于所述待处理图像,所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息,包括:基于所述待处理图像的特征图,得到与所述特征图中的每个特征点对应的第二位置偏移量;基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息。
一种可能的实施方式中,所述投影位置信息包括下述至少一种:所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
一种可能的实施方式中,所述基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息,包括:基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中,确定目标特征点;基于所述目标特征点在所述特征图中的位置信息、所述目标特征点对应的第二位置偏移量、以及所述下采样率,确定所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
一种可能的实施方式中,所述基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值,包括:基于所述二维位置信息、所述投影位置信息、所述目标对象的实际尺寸信息、所述目标对象的朝向信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值。
一种可能的实施方式中,还包括:基于所述待处理图像的特征图,对所述目标对象进行尺寸预测处理,得到所述目标对象的实际尺寸信息;和/或,基于所述待处理图像的特征图,对所述目标对象进行朝向预测处理,得到所述目标对象在所述相机坐标系中的朝向信息。
一种可能的实施方式中,所述二维检测框和三维检测框的投影关系信息,是基于所述三维检测框在图像坐标系中的投影的尺寸信息和位置信息、与所述二维检测框的尺寸信息和位置信息建立的。
一种可能的实施方式中,所述基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值,包括:对所述目标对象的中心点在所述相机坐标系中的中间深度值构成的深度图像进行非线性变换,得到深度特征图;基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
一种可能的实施方式中,所述基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值,包括:将所述深度特征图、以及所述待处理图像的特征图进行叠加,形成目标特征图;利用预先训练的深度值预测神经网络对所述目标特征图进行深度预测处理,得到所述特征图中各个特征点的目标深度值;基于所述特征图中各个特征点属于目标对象的中心点的概率、以及所述各个特征点分别对应的目标深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
一种可能的实施方式中,还包括:基于所述目标对象的中心点在所述相机坐标系中的目标深度值、 以及所述目标对象的实际尺寸信息,得到所述目标对象在所述相机坐标系中的三维检测结果。
第二方面,本公开实施例还提供一种深度检测装置,包括:获取模块,配置为获取待处理图像;第一处理模块,配置为基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;第二处理模块,配置为基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;预测模块,配置为基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
第三方面,本公开可选实现方式还提供一种计算机设备,包括处理器和存储器,所述存储器存储有所述处理器可执行的机器可读指令,所述处理器用于执行所述存储器中存储的机器可读指令,所述机器可读指令被所述处理器执行时实现上述第一方面,或第一方面中任一种可能的实施方式中的方法。
第四方面,本公开可选实现方式还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被运行时执行上述第一方面,或第一方面中任一种可能的实施方式中的步骤。
第五方面,本公开还提供一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现上述第一方面,或第一方面中任一种可能的实施方式中的步骤。
第六方面,本公开还提供一种计算机程序产品,所述计算机程序产品包括一条或多条指令,所述一条或多条指令适于由处理器加载并执行上述第一方面,或第一方面中任一种可能的实施方式中的步骤。
关于上述深度检测装置、计算机设备、及计算机可读存储介质的效果描述参见上述深度检测方法的说明。
为使本公开的上述目的、特征和优点能更明显易懂,根据下面参考附图对本公开实施例进行详细说明,本公开的其它特征及方面将变得清楚。应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1示出了本公开实施例所提供的一种深度检测方法的流程图;
图2示出了本公开实施例所提供的确定目标对象的二维检测框在待处理图像对应的图像坐标系中的二维位置信息的方法的流程图;
图3示出了本公开实施例所提供的确定三维检测框在图像坐标系中的投影位置信息的方法的流程图;
图4示出了本公开实施例所提供的一种用于进行深度检测的目标神经网络的结构示意图;
图5示出了本公开实施例所提供的深度检测装置的结构示意图;
图6示出了本公开实施例所提供的一种计算机设备的示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。通常在此处描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此,以下对本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围,而是仅仅表 示本公开的选定实施例。基于本公开的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
经研究发现,在基于单目图像的三维深度检测方法中,通常利用样本单目图像、以及对单目图像中的目标对象进行三维标记生成的标注信息,对神经网络进行训练。得到的神经网络能够直接预测得到目标对象的三维检测框的中心点在单目图像对应的相机坐标系中的深度值、以及三维检测框的尺寸信息。这种对单目图像中的目标对象的深度进行预测的方法存在预测精度较低的问题。
基于上述研究,本公开提供了一种深度检测方法、装置、设备、存储介质、计算机程序及产品,通过建立目标对象在图像坐标系中的二维位置、和在对应的相机坐标系中的三维位置之间的投影关系信息,并将投影关系信息作为目标对象在目标空间中的深度的特征信息,能够提升预测得到的目标对象在相机坐标系中深度信息的置信度。
针对以上方案所存在的缺陷,均是发明人在经过实践并仔细研究后得出的结果,因此,上述问题的发现过程以及下文中本公开针对上述问题所提出的解决方案,都应该是发明人在本公开过程中对本公开做出的贡献。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。
为便于对本实施例进行理解,首先对本公开实施例所公开的一种深度检测方法进行详细介绍,本公开实施例所提供的深度检测方法的执行主体一般为具有一定计算能力的计算机设备,该计算机设备例如包括:终端设备或服务器或其它处理设备,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字助理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中,该深度检测方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
下面对本公开实施例提供的深度检测方法加以说明。本公开实施例例如可以用于对单目二维图像进行目标检测,得到目标对象在二维图像中的二维位置信息、以及目标对象在二维图像对应的相机坐标系中的三维位置信息。
参见图1所示,为本公开实施例提供的深度检测方法的流程图,所述方法包括步骤S101至S104,其中:
S101:获取待处理图像;
S102:基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;
S103:基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;
S104:基于所述中间深度值和所述待处理图像,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
本公开实施例在获取待处理图像后,基于待处理图像,确定目标对象的二维检测框在待处理图像对应的图像坐标系中的二维位置信息、以及目标对象在待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息,然后基于二维位置信息、投影位置信息、以及二维检测框和三维检测框之间的投影关系信息,得到目标对象在相机坐标系中的中间深度值,并基于该中间深度值和待处理图像,得到目标对象在相机坐标系中的目标深度值,从而将二维检测框和三维检测框之间的投影关系信息作为特征信息,利用该投影关系信息对目标对象的深度进行预测,提升最终所得到的目标对象在相机坐标系中的目标深度值的精度。
在上述S101中,待处理图像在不同的应用场景下,所包括的目标对象不同。例如在将本公开实施例提供的深度检测方法应用于自动驾驶场景下时,目标对象例如包括车辆、行人、道路中的障碍物等;在将该深度检测方法应用于物体识别领域时,目标对象例如包括要识别的物体;在将该深度检测方法应用于相机定位领域时,目标对象例如包括目标场景中的各种物体。
本公开实施例以将该深度检测方法应用于自动驾驶场景为例,对深度检测方法的详细过程加以举例说明。
在上述S102中,待处理图像对应的图像坐标系,例如是以待处理图像中的左上角的像素点所在位置为原点建立的二维坐标系。待处理图像上的各个像素点在待处理图像中的位置均能够利用该图像坐标系中的坐标值来表征。
参见图2所示,本公开实施例提供一种确定目标对象的二维检测框在待处理图像对应的图像坐标系中的二维位置信息的方式,包括:
S201:对所述待处理图像进行特征提取,获取待处理图像的特征图。
此处,例如可以利用骨干神经网络对待处理图像进行特征提取,得到特征图;其中,对待处理图像进行特征提取的过程,即对待处理图像进行下采样的过程,也即按照一定的下采样率对待处理图像进行下采样,得到待处理图像的特征图;在下采样率为R的情况下,得到的待处理图像的尺寸与特征图的尺寸的比值为R。
在对待处理图像进行特征提取的过程中,例如可以对待处理图像进行至少一级卷积处理,得到待处理图像的特征图。在每级卷积处理过程中,例如可以利用至少一个卷积核对上一级卷积处理输出的结果或者待处理图像进行卷积,得到与本级卷积处理对应的结果,并将最后一级卷积处理的结果作为待处理图像的特征图。
在待处理图像的特征图中,每一个特征点与待处理图像中的像素点具有位置映射关系。
S202:基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率、与各个特征点对应的第一位置偏移量、和以各个特征点为中心点的下采样二维检测框的下采样尺寸信息。
其中,下采样二维检测框,为对待处理图像进行下采样后,所述目标对象二维检测框产生限缩形成的检测框。
在实施中,第一方面:在基于特征图得到特征图中的每个特征点属于目标对象的中心点的概率时,例如可以利用预先训练的中心点预测神经网络对特征图进行中心点预测处理,得到特征图中的各个特征点属于目标对象的中心点的概率。
此处,中心点预测神经网络例如可以是骨干神经网络延伸的一个分支;也即中心点预测神经网络与骨干神经网络属于同一神经网络;骨干神经网络对待处理图像进行特征提取后,将待处理图像对应的特征图传输给中心点预测神经网络;中心点预测神经网络基于特征图,预测特征图中的各个特征点属于目标对象的中心点的概率。
此处,中心点预测神经网络例如可以利用下述方式训练得到:获取样本图像,以及样本对象的中心点在所述样本图像中的标注位置信息;其中,所述样本对象的中心点为样本对象在所述样本图像对应的相机坐标系中的三维检测框的中心点在所述样本图像中的投影点;利用所述样本图像、以及所述位置标注信息,对待训练的骨干神经网络、以及待训练的中心点预测神经网络进行训练,得到训练好的所述中心点预测神经网络。
此处,在中心点预测神经网络是骨干神经网络延伸的一个分支的情况下,可以将骨干神经网络和待训练的中心点预测神经网络一起进行训练。
第二方面:假设神经网络输出的特征图中,任一特征点在特征图中坐标值为:(x1,y1),其物理含义为:物体的在图像中的投影点在图像中的位置,经过下采样、以及进行下取整后得到的坐标。则(x1,y1)和第一位置偏移量相加,得到的坐标值,为二维检测框的中心在经过下采样后得到的坐标。
在基于特征图,确定特征图中的各个特征点对应的第一位置偏移量时,例如可以利用预先训练的第一位置偏移量预测神经网络对各个特征点对应的第一位置偏移量进行预测。
此处,第一位置偏移量预测神经网络例如也可以是骨干神经网络延伸的一个分支,其与上述第一方面中的中心点预测神经网络分别为骨干神经网络延伸的不同分支。
在一些实现方式中,待处理图像的中某个像素点的位置p1、和特征图中某个特征点的位置p2满足下述公式(1):
Figure PCTCN2021125278-appb-000001
floor(·)表示下取整;R表示下采样率。可见,在
Figure PCTCN2021125278-appb-000002
并非整数的情况下,特征图中的特征点可能无法与待处理图像中的像素点进行像素级别的位置匹配,而是与待处理图像中的像素点进行亚像素级别的位置匹配关系。
此时,与特征图中特征点对应的第一位置偏移量D offset满足下述公式(2):
Figure PCTCN2021125278-appb-000003
其中,mod(·)表示取余数。
则在D offset通过位置偏移量预测神经网络预测得到后,即可以基于上述公式(2),得到各个特征点分别在待处理图像中对应的第一像素点(或者亚像素点)。
在训练第一位置偏移量预测神经网络的时候,例如在上述第一方面中,对中心点预测神经网络进行训练的过程中,已经得到了样本图像,以及样本图像对应的标注图像。
可以基于上述第一方面中样本对象的中心点在样本图像中的标注位置信息、以及上述公式(2),得到对待处理图像进行了下采样后,二维检测框在待处理图像中的中心点、与对应的特征点之间的第一偏移量;将得到的该第一偏移量作为样本图像的第一偏移量标注信息,利用样本图像对待训练的第一偏移量预测神经网络进行训练,得到训练后的第一偏移量预测神经网络。
这里,在第一偏移量预测神经网络为骨干神经网络延伸的一个分支的情况下,例如可以利用上述样本图像、和对应的第一偏移量标注信息,对待训练的骨干神经网络和待训练的第一偏移量预测神经网络进行训练,得到训练后的第一偏移量预测神经网络。
第三方面:在基于特征图,确定以特征图中的每个特征点为中心点的下采样二维检测框的下采样尺寸信息时,例如可以利用预先训练的二维检测框预测神经网络对待处理图像进行检测框预测处理,得到特征图中各个特征点分别对应的下采样二维检测框的下采样尺寸信息。
此处,二维检测框预测神经网络例如也可以作为骨干神经网络延伸的一个分支。
这里,由于下采样检测框,可以视作是利用下采样率对目标对象在待处理图像中的二维检测框进行限缩后所形成的检测框,因此,目标对象在待处理图像中的二维检测框的尺寸s1,与目标对象在特征图中的下采样二维检测框的尺寸s2满足下述公式(3):
Figure PCTCN2021125278-appb-000004
因此,在预测得到待处理图像中各个特征点分别对应的下采样二维检测框的下采样尺寸信息后,即可基于上述公式(3)得到待处理图像中二维检测框在待处理图像中的尺寸信息。
在一种可能的实施方式中,在对待训练的二维检测框预测神经网络进行训练时,例如可以采用下述方式:
获取样本图像、以及所述样本图像对应的二维检测框标注信息;其中,所述二维检测框标注信息基于样本对象在所述样本图像对应的相机坐标系中的三维检测框在所述样本图像中的投影生成;利用所述样本图像、以及所述样本图像对应的二维检测框标注信息,对待训练的骨干神经网络、以及待训练的二维检测框预测神经网络进行训练,得到训练后的二维检测框预测神经网络。
在本公开实施例中,是利用了二维检测框和三维检测框之间的投影关系作为特征数据,使得最终确定的目标对象在待处理图像对应的相机坐标系中的目标深度值,能够具有更高的置信度,但在图像中标注的真实二维检测框和基于三维检测框投影形成的二维检测框之间是具有一定差异的,这个差异 会导致基于真实二维检测框和真实标注的三维检测框在生成两者投影关系时,投影关系会存在一定的误差。因此,本公开实施例中利用样本对象在所述样本图像对应的相机坐标系中的三维检测框,在所述样本图像中的投影生成二维检测框标注信息,以消除这种差异。
这里需要注意的是,由于上述第一方面中的中心点预测神经网络、第二方面中的第一位置偏移量预测神经网络、以及第三方面中的二维检测框预测神经网络均可以是骨干神经网络的一个分支,因此,可以采用同一批样本图像,同步训练上述骨干神经网络、中心点预测神经网络、第一位置偏移量预测神经网络、二维检测框预测神经网络。另外,可以采用不同的样本图像,分别训练上述三个不同的分支。
S203:基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息。
在实施中,目标对象在图像坐标系中的二维位置信息包括:所述二维检测框的中心点在所述图像坐标系中的第一坐标信息(2D Center)、以及所述二维检测框的尺寸信息(2D Size)。
在实施中,在基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息时,例如可以采用下述方式:
基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中确定目标特征点;
基于所述目标特征点在所述特征图中的位置信息、所述目标特征点的第一位置偏移量、以及下采样率,确定所述二维检测框的中心点在所述图像坐标系中的第一坐标信息;
以及,基于所述目标特征点对应的下采样尺寸信息、以及所述下采样率,确定所述二维检测框的尺寸信息。
在实施中,在从基于特征图中的每个特征点属于目标对象中心点的概率,从特征图中确定目标特征点时,例如可以将各个特征点对应的概率分别和预设的概率阈值进行比较;在某个特征点对应的概率大于预设的概率阈值的情况下,将该特征点作为目标特征点。
针对特征图中的任一特征点,其在特征图中的位置信息表示为(x,y),与其对应的待处理图像中的像素点在待处理图像中的位置信息表示为(x′,y′),(x,y)和(x′,y′)之间的位置关系满足下述公式(4):
Figure PCTCN2021125278-appb-000005
D x offset表示该特征点与对应的第一像素点在图像坐标系X轴方向的第一位置偏移量。D y offset表示该特征点与对应的第一像素点在图像坐标系Y轴方向的第一位置偏移量。
因此,该特征点为目标特征点的情况下,也即与该特征点(x,y)对应的像素点为目标对象的二维检测框的第一中心点。
此时,第一中心点(x′,y′)的坐标值满足下述公式(5):
Figure PCTCN2021125278-appb-000006
进而,在目标特征点确定了,且目标特征点的第一位置偏移量已经基于上述S202预测得知的情况下,可以利用上述公式(5)得到目标对象的二维检测框的中心点在所述图像坐标系中的第一坐标信息。
在基于所述目标特征点对应的下采样尺寸信息、以及所述下采样率,确定所述二维检测框的尺寸 信息时,在目标特征点对应的下采样尺寸信息已经基于上述S202预测得到的情况下,可以基于上述公式(3)到目标对象的在待处理图像中二维检测框在待处理图像中的尺寸信息。
B:待处理图像对应的相机坐标系,例如是以拍摄待处理图像的相机的光轴为z轴,以相机的光心所在、且垂直于相机的光轴的平面为X轴和Y轴所在平面建立的三维坐标系。其中,Z轴方向称深度方向。
参见图3所示,本公开实施例还提供一种基于所述待处理图像,所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在图像坐标系中的投影位置信息的方法,包括:
S301:基于所述待处理图像的特征图,得到与所述特征图中的每个特征点对应的第二位置偏移量。
其中,假设神经网络输出的特征图中,任一特征点在特征图中坐标值为:(x1,y1),其物理含义为:物体的在图像中的投影点在图像中的位置,经过下采样、以及进行下取整后得到的坐标。
则(x1,y1)和第二位置偏移量相加,得到的坐标值,为物体的三维中心投影在在图像上形成投影点,并对投影点进行下采样后得到的坐标。
其中,各个特征点对应的第二位置偏移量,用于表征各个特征点和与各个特征点对应的第二像素点在经过下采样后形成的位置偏移;所述第二像素点为所述三维检测框的中心点在所述待处理图像中的投影点在所述待处理图像中对应的像素点。
待处理图像的特征图的获取方式与上述S201中特征图的获取方式相同。
在基于待处理图像的特诊图,得到与特征图中的每个特征点对应的第二位置偏移量时,例如可以利用预先训练的第二位置偏移量预测神经网络,对特征图进行第二位置偏移量预测处理,得到特征图中的各个特征点分别对应的第二位置偏移量。
这里,第二位置偏移量预测神经网络例如也可以是骨干神经网络的延伸的分支网络。将待处理图像输入至骨干神经网络;骨干神经网络对待处理图像进行下采样,得到待处理图像的特征图;特征图进入到第二位置偏移量预测神经网络后,得到特征图中各个特征点分别对应的第二位置偏移量。
这里,在训练第二位置偏移量预测神经网络的时候,例如在上述(1)中,对中心点预测神经网络进行训练的过程中,已经得到了样本图像,以及样本图像对应的标注图像。
例如可以对样本图像分别标注二维标注框、并标注三维标注框,然后基于标注的三维标注框,得到标注的三维标注框的中心点在待处理图像中的投影点在待处理图像中的坐标值s1;标注的二维标注框的中心点在待处理图像中的坐标值为s1’。
然后将标注的二维标注框的中心点在待处理图像中的坐标值s1’,利用上述公式(1)得到与s1对应的特征点,在特征图中的位置s2。
然后将标注的三维标注框的中心点在待处理图像中的投影点在待处理图像中的坐标值s1、以及利用公式(1)得到的s2,代入到上述公式(2),即得样本对象的中心在样本对象对应的特征图中的特征点、与对应样本对象中心在样本图像中的中心点经过下采样后形成的位置偏移。
可以基于上述(1)中样本对象的中心点在样本图像中的标注位置信息,得到对待处理图像进行了下采样后,样本对象的三维检测框在待处理图像中的投影的中心点,的中心点、与对应的特征点之间的第一偏移量;将得到的该第一偏移量作为样本图像的第一偏移量标注信息,利用样本图像对待训练的第一偏移量预测神经网络进行训练,得到训练后的第一偏移量预测神经网络。
S302:基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息。
此处,所述投影位置信息包括下述至少一种:所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
示例性的,可以采用下述方式得到所述三维检测框在所述图像坐标系中的投影位置信息:
基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中,确定目标特征点;
基于所述目标特征点在所述特征图中的位置信息、所述目标特征点对应的第二位置偏移量、以及所述下采样率,确定所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
这里,确定目标特征点的方式,与上述S203中确定目标特征点的方式相似。
在确定了目标特征点后,例如可以将目标特征点在特征图中的位置信息、目标特征点对应的第二位置偏移量、以及所述下采样率,代入到上述公式(5)中,得到三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
针对上述S103:在基于上述S102中的二维位置信息、投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值时,例如可以采用下述方式:
基于所述二维位置信息、所述投影位置信息、所述目标对象的实际尺寸信息、所述目标对象的朝向信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值。
在该种实施方式中,本公开实施例提供的深度检测方法还包括:
基于所述待处理图像的特征图,对所述目标对象进行尺寸预测处理,得到所述目标对象的实际尺寸信息;
和/或,基于所述待处理图像的特征图,对所述目标对象进行朝向预测处理,得到所述目标对象在所述相机坐标系中的朝向信息。
本公开实施例中,例如可以利用预先训练的尺寸预测神经网络,对待处理图像的特征图进行尺寸预测处理,得到目标对象的实际尺寸信息。此处,目标对象的实际尺寸信息,例如为目标对象在待处理图像对应的相机坐标系中的三维包围框的尺寸信息。
另外,也可以利用预先训练的朝向预测神经网络对待处理图像的特征图进行朝向预测处理,得到目标对象在相机坐标系中的朝向信息。
此处,尺寸预测神经网络、以及朝向预测神经网络可以为骨干申请网络延伸的不同分支。其可以与上述实施例中所述的中心点预测神经网络、第一位置偏移量预测神经网络、二维检测框预测神经网络、第二位置偏移量预测神经网络、以及骨干神经网络进行同步训练。
在本公开实施例中,还包括:建立二维检测框和所述三维检测框之间的投影关系信息。
示例性的,二维检测框和三维检测框的投影关系信息,是基于所述三维检测框在图像坐标系中的投影的尺寸信息和位置信息、与所述二维检测框的尺寸信息和位置信息建立的。例如可以采用下述方式建立二维检测框和三维检测框之间的投影关系信息:
在相机坐标系中,任一目标对象的三维包围框被表示为一个七元组:(W、H、L、x、y、z、r y);其中,W、H、L分别表示三维包围框的长度、宽度、以及高度;(x,y,z)表示三维包围框的中心点坐标;r y表示目标对象在相机坐标系中绕Y周旋转的角度,范围为[-π,π]。任一目标对象在对应图像坐标系中的二维包围框被表示为一个四元组:(w,h,u,v);其中,w,h表示二维包围框的宽度和高度,(u,v)表示二维包围框的中心点在图像坐标系中的坐标值。
三维包围框的第c个角(c=1,…,8)的在相机坐标系中的坐标记为
Figure PCTCN2021125278-appb-000007
其中,
Figure PCTCN2021125278-appb-000008
满足下述公式(6):
Figure PCTCN2021125278-appb-000009
其中:
Figure PCTCN2021125278-appb-000010
满足下述公式(7):
Figure PCTCN2021125278-appb-000011
Figure PCTCN2021125278-appb-000012
分别表示三维包围框的角点与三维包围框的中心点在相机坐标系的X、Y、Z方向上的坐标差,i∈{1,2},表示不同的Δ值的正负。则三维包围框的第c个角在相机坐标系中的坐标表示为下述公式(8):
Figure PCTCN2021125278-appb-000013
其中,Ρ obj表示三维包围框的中心点做包在相机坐标系中的坐标值;
Figure PCTCN2021125278-appb-000014
表示三维包围框的角点在相机坐标系中的坐标值。
基于相机的内参矩阵,可以将角点从相机坐标系中投影到图像坐标系中,角点在图像坐标系中的投影点的坐标
Figure PCTCN2021125278-appb-000015
满足下述公式(9):
Figure PCTCN2021125278-appb-000016
其中,z c表示第c个角点在相机坐标系中的深度值,u c,v c分表表示第c个角点在图像坐标系中的投影点在图像坐标系中的x轴的坐标值、以及y轴的坐标值。
在给定了目标对象的三维包围框在相机坐标系中的8个角点后,可以基于图像坐标系中的最上角max c{v c}、与最下角min c{v c}之间的垂直距离,估算得到二维包围框的投影高度h,满足下述公式(10):
Figure PCTCN2021125278-appb-000017
v c来源于上述公式(9),
Figure PCTCN2021125278-appb-000018
表示三维包围框中各个角点与中心点点的最大深度差值;z表示中心点点的深度值;Δy max表示三维包围框中各个角点与中心点在相机坐标系的Y轴上的坐标差的最大值;Δy min表示三维包围框中各个角点与中心点点之间在Y周上的坐标差的最小值;f v表示相机的焦距。
三维包围框的中心点与水平面的夹角β满足下述公式(11):
Figure PCTCN2021125278-appb-000019
其中,(u o,v o)表示三维包围框的中心点在图像坐标系中投影点在图像坐标系中的坐标值。c v表示相机的主点偏移。
结合上述公式(10)和公式(11),三维包围框的中心点在相机坐标系中的深度z满足下述公式(12):
Figure PCTCN2021125278-appb-000020
其中,参量b满足(13):
Figure PCTCN2021125278-appb-000021
tanβ满足下述公式(14):
Figure PCTCN2021125278-appb-000022
进而,在确定上述公式(12)、(13)、以及(14)的参量的情况下,可以确定三维包围框的中心点的深度值。
上述公式(12)即为本公开实施例中所述二维包围框和三维包围框之间的投影关系信息。
在将上述公式(12)、(13)以及(14)作为投影关系信息用于本公开实施例提供的深度检测方法中时,f v表示相机的焦距,可以基于待处理图像的属性信息读取得到;h表示目标对象在图像坐标系中的二维检测框的高度,可以根据上述二维位置信息得到,也即基于上述二维检测框尺寸信息得到。
Δz max表示目标对象的三维检测框的8个角点与三维检测框中心点深度之间深度差的最大值。其中,三维检测框的8个角点中的第c个角点与三维检测框的中心点的深度差Δz c满足下述公式(15):
Figure PCTCN2021125278-appb-000023
其中,L和W分别来源于目标对象的实际尺寸信息,分别表示目标对象的高度和宽度。r y为目标对象的朝向信息。
基于目标对象的实际尺寸信息中的宽度值W和长度值L、以及目标对象的朝向信息、以及上述公式(15),计算目标对象的三维检测框的8个角点分别与三维检测框的中心点的深度差,然后将然后取8个角点分别与三维检测框的中心点之间的深度差的最大值,即Δz max
然后利用上述公式(14),确定三维包围框的中心点与水平面的夹角β的正切值即tan(β)。
然后,利用目标对象的实际尺寸信息中的高度值H,tan(β)、Δz max、相机的焦距、以及待处理 图像对应的二维检测框的高度h,代入到公式(12)、和(13),得到目标对象的中心点的中间深度值。
针对S104:在得到目标对象的中心点在相机坐标系中的深度值后,例如可以采用下述方式得到所述目标对象的中心点在所述相机坐标系中的目标深度值:
对所述目标对象的中心点在所述相机坐标系中的中间深度值构成的深度图像进行非线性变换,得到深度特征图;
基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
在实施中,对所述目标对象的中心点在所述相机坐标系中的中间深度值构成的深度图像进行非线性变换,得到深度特征图,其目的是为了去除深度特征图的噪声,进而能够将深度特征图作为待处理图像的特征的一部分,将深度特征图和待处理图像的特征图叠加起来,构成待处理图像对应的目标特征图,然后利用预先训练的深度值预测神经网络对所述目标特征图进行深度预测处理,得到所述特征图中各个特征点的目标深度值;
基于所述特征图中各个特征点属于目标对象的中心点的概率、以及所述各个特征点分别对应的目标深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
此处,对目标对象的中心点在所述相机坐标系中的中间深度值构成的深度图像进行非线性变换时,例如可以利用非线性变换模块,对目标对象的中心点在所述相机坐标系中的中间深度值进行非线性变换,以得到深度特征图。
这样,利用二维检测框和三维检测框之间的投影关系信息,生成能够对深度预测进行限制的深度特征图,然后利用深度特征图作为深度预测的特征数据,将之与待处理图像的特征图叠加后,得到待处理图像的目标特征图,然后利用深度预测神经网络对目标特征图进行深度预测处理,得到的目标对象的中心点的深度值,具有更高的置信度和准确度。
本公开另一实施例中,还包括:基于所述目标对象的中心点在所述相机坐标系中的目标深度值、以及所述目标对象的实际尺寸信息,得到所述目标对象在所述相机坐标系中的三维检测结果。
这样,可以基于三维检测结果进行后续的处理,例如在将本公开实施例应用于自动驾驶领域时,可以基于三维检测结果,控制自动驾驶车辆的自动驾驶过程。
参见图4所示,本公开实施例提供一种利用目标神经网络对待处理图像进行处理,得到目标对象在待处理图像对应的相机坐标系中的深度值的示例。包括:
该目标神经网络包括:骨干神经网络401、与骨干神经网络分别连接的中心点预测神经网络402、第一位置偏移量预测神经网络403、二维检测框预测神经网络404、第二位置偏移量预测神经网络405、尺寸预测神经网络406、朝向预测神经网络407。
将待处理图像输入至骨干神经网络401,得到特征图(Feature map)。
将特征图输入到中心点预测神经网络402,得到热图(Heatmap),其中,热图中各个像素点的像素值,表征与该像素点对应的特征图中的特征点属于目标对象的中心点的概率。
将特征图输入到第一位置偏移量预测神经网络403,得到每个特征点对应的第一位置偏移量(2D offset)。
将特征图输入到二维检测框预测神经网络404,得到以各个特征点为中心点的下采样二维检测框的下采样尺寸信息即二维检测框尺寸信息。
将特征图输入到第二位置偏移量预测神经网络405,得到特征图中的每个特征点对应的第二位置偏移量(3D offset)。
将特征图输入到尺寸预测神经网络406,得到目标对象在图像坐标系中的实际尺寸信息(3D dimension)。
将特征图输入到朝向预测神经网络407,得到目标对象的朝向信息(Orientation)。
在目标神经网络中,还包括:与中心点预测神经网络402、第一位置偏移量预测神经网络403、 二维检测框预测神经网络404连接的第一处理模块408。
热图、第一位置偏移量、以及二维检测框尺寸信息进入到第一处理模块408,第一处理模块408利用热图、第一位置偏移量、以及二维检测框尺寸信息,生成目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息。
在目标神经网络中,还包括:与中心点预测神经网络402、第二位置偏移量预测神经网络405连接的第二处理模块409。
热图、第二位置偏移量进入到第二处理模块409,第二处理模块利用热图、第二位置偏移量,生成目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息投影位置信息。
在目标神经网络中,还包括:与第一处理模块408、第二处理模块409、尺寸预测神经网络406、朝向预测神经网络407连接的第三处理模块410。
二维位置信息、投影位置信息、实际尺寸信息、朝向信息被输入至第三处理模块410,第三处理模块410基于二维检测框和三维检测框之间的投影关系信息(也即上述公式(12)、(13)、和(14)),利用二维位置信息、投影位置信息、实际尺寸信息、朝向信息,得到目标对象的中心点在所述相机坐标系中的中间深度值所构成的深度图(Depth map)。
在目标神经网络中,还包括:与第三处理模块410连接的非线性变换模块411。
深度图进入到非线性变换模块411,非线性变换模块411对深度图进行非线性变换,得到深度特征图(Geometric map)。
在目标神经网络中,还包括:与骨干网络401和非线性变换模块411连接的第四处理模块412。
深度特征图和特征图输入至第四处理模块412,第四处理模块412对深度特征图和特征图进行叠加处理,得到待处理图像的目标特征图。
在目标神经网络中,还包括:与第四处理模块412连接的深度预测神经网络413。
将目标特征图输入至深度预测神经网络413,深度预测神经网络413对目标特征图进行深度预测处理,得到目标对象的中心点在相机坐标系中的目标深度值。
通过上述目标神经网络,能够得到待处理图像的中心点在相机坐标系中的目标深度值。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的执行顺序应当以其功能和可能的内在逻辑确定。
基于同一发明构思,本公开实施例中还提供了与深度检测方法对应的深度检测装置,由于本公开实施例中的装置解决问题的原理与本公开实施例上述深度检测方法相似,因此装置的实施可以参见方法的实施。
参照图5所示,为本公开实施例提供的一种深度检测装置的示意图,所述装置包括:
获取模块51,配置为获取待处理图像;
第一处理模块52,配置为基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;
第二处理模块53,配置为基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;
预测模块54,配置为基于所述中间深度值和所述待处理图像,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
一种可能的实施方式中,所述第一处理模块52,在基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息时,配置为:
对所述待处理图像进行特征提取,获取待处理图像的特征图;
基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率、与各个特征点对应的第一位置偏移量、和以各个特征点为中心点的下采样二维检测框的下采样尺寸信息;
基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息;
其中,下采样二维检测框,为对待处理图像进行下采样后,所述目标对象二维检测框产生限缩形成的检测框。
一种可能的实施方式中,所述二维位置信息包括:所述二维检测框的中心点在所述图像坐标系中的第一坐标信息、以及所述二维检测框的尺寸信息。
一种可能的实施方式中,所述第一处理模块52,在基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息时,配置为:
基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中确定目标特征点;
基于所述目标特征点在所述特征图中的位置信息、所述目标特征点的第一位置偏移量、以及下采样率,确定所述二维检测框的中心点在所述图像坐标系中的第一坐标信息;
以及,
基于所述目标特征点对应的下采样尺寸信息、以及所述下采样率,确定所述二维检测框的尺寸信息。
一种可能的实施方式中,所述第一处理模块52,在对所述待处理图像进行特征提取,获取待处理图像的特征图时,配置为:
利用预先训练的骨干神经网络对所述待处理图像进行特征提取,得到所述待处理图像的特征图;
一种可能的实施方式中,所述第一处理模块52,在基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率时,配置为:
利用预先训练的中心点预测神经网络对特征图进行中心点预测处理,得到特征图中的各个特征点属于目标对象的中心点的概率。
一种可能的实施方式中,还包括训练模块55,配置为采用下述方式训练所述中心点预测神经网络:
获取样本图像,以及样本对象的中心点在所述样本图像中的标注位置信息;其中,所述样本对象的中心点为样本对象在所述样本图像对应的相机坐标系中的三维检测框的中心点在所述样本图像中的投影点;
利用所述样本图像、以及所述位置标注信息,对待训练的骨干神经网络、以及待训练的中心点预测神经网络进行训练,得到训练好的所述中心点预测神经网络。
一种可能的实施方式中,所述第一处理模块52,在基于所述待处理图像,所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息时,配置为:
基于所述待处理图像的特征图,得到与所述特征图中的每个特征点对应的第二位置偏移量;
基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息。
一种可能的实施方式中,所述第一处理模块52,在投影位置信息包括下述至少一种:所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
一种可能的实施方式中,所述第一处理模块52,在基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息时,配置为:
基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中,确定目标特征点;
基于所述目标特征点在所述特征图中的位置信息、所述目标特征点对应的第二位置偏移量、以及所述下采样率,确定所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
一种可能的实施方式中,所述第二处理模块53,在基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值时,配置为:
基于所述二维位置信息、所述投影位置信息、所述目标对象的实际尺寸信息、所述目标对象的朝向信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值。
一种可能的实施方式中,所述第一处理模块52,还配置为:
基于所述待处理图像的特征图,对所述目标对象进行尺寸预测处理,得到所述目标对象的实际尺寸信息;
和/或,基于所述待处理图像的特征图,对所述目标对象进行朝向预测处理,得到所述目标对象在所述相机坐标系中的朝向信息。
一种可能的实施方式中,所述二维检测框和三维检测框的投影关系信息,是基于所述三维检测框在图像坐标系中的投影的尺寸信息和位置信息、与所述二维检测框的尺寸信息和位置信息建立的。
一种可能的实施方式中,所述预测模块54,在所述基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值时,配置为:
对所述目标对象的中心点在所述相机坐标系中的中间深度值构成的深度图像进行非线性变换,得到深度特征图;
基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
一种可能的实施方式中,所述预测模块54,在基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值时,配置为:
将所述深度特征图、以及所述待处理图像的特征图进行叠加,形成目标特征图;
利用预先训练的深度值预测神经网络对所述目标特征图进行深度预测处理,得到所述特征图中各个特征点的目标深度值;
基于所述特征图中各个特征点属于目标对象的中心点的概率、以及所述各个特征点分别对应的目标深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
一种可能的实施方式中,还包括第三处理模块56,配置为基于所述目标对象的中心点在所述相机坐标系中的目标深度值、以及所述目标对象的实际尺寸信息,得到所述目标对象在所述相机坐标系中的三维检测结果。
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。
本公开实施例还提供了一种计算机设备,如图6所示,为本公开实施例提供的计算机设备结构示意图,包括:
处理器61和存储器62;所述存储器62存储有处理器61可执行的机器可读指令,处理器61用于执行存储器62中存储的机器可读指令,所述机器可读指令被处理器61执行时,处理器61执行下述步骤:
获取待处理图像;
基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;
基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;
基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
上述存储器62包括内存621和外部存储器622;这里的内存621也称内存储器,用于暂时存放处理器61中的运算数据,以及与硬盘等外部存储器622交换的数据,处理器61通过内存621与外部存储器622进行数据交换。
上述指令的执行过程可以参考本公开实施例中所述的深度检测方法的步骤。
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该 计算机程序被处理器运行时执行上述方法实施例中所述的深度检测方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。
本公开实施例还提供一种计算机程序产品,该计算机程序产品承载有程序代码,所述程序代码包括的指令可用于执行上述方法实施例中所述的深度检测方法的步骤,具体可参见上述方法实施例,在此不再赘述。
其中,上述计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品具体体现为计算机存储介质,在另一个可选实施例中,计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的具体工作过程,可以参考前述方法实施例中的对应过程。在本公开所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上所述实施例,仅为本公开的具体实施方式,用以说明本公开的技术方案,而非对其限制,本公开的保护范围并不局限于此,尽管参照前述实施例对本公开进行了详细的说明,本领域的普通技术人员应当理解:任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化,或者对其中部分技术特征进行等同替换;而这些修改、变化或者替换,并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应所述以权利要求的保护范围为准。
工业实用性
本公开实施例中,获取待处理图像;基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。本公开实施例能够提升预测得到的目标对象在相机坐标系中深度信息的准确度。

Claims (34)

  1. 一种深度检测方法,包括:
    获取待处理图像;
    基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;
    基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;
    基于所述中间深度值和所述待处理图像,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
  2. 根据权利要求1所述的深度检测方法,其中,所述基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息,包括:
    对所述待处理图像进行特征提取,获取待处理图像的特征图;
    基于所述特征图,得到所述特征图中的各个特征点属于目标对象的中心点的概率、与各个特征点对应的第一位置偏移量、和以各特征点为中心点的下采样二维检测框的下采样尺寸信息;
    基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息;
    其中,下采样二维检测框,为对待处理图像进行下采样后,所述目标对象二维检测框产生限缩形成的检测框。
  3. 根据权利要求2所述的深度检测方法,其中,所述二维位置信息包括:所述二维检测框的中心点在所述图像坐标系中的第一坐标信息、以及所述二维检测框的尺寸信息。
  4. 根据权利要求3所述的深度检测方法,其中,所述基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息,包括:
    基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中确定目标特征点;
    基于所述目标特征点在所述特征图中的位置信息、所述目标特征点的第一位置偏移量、以及下采样率,确定所述二维检测框的中心点在所述图像坐标系中的第一坐标信息;
    以及,
    基于所述目标特征点对应的下采样尺寸信息、以及所述下采样率,确定所述二维检测框的尺寸信息。
  5. 根据权利要求2至4任一项所述的深度检测方法,其中,所述对所述待处理图像进行特征提取,获取待处理图像的特征图,包括:
    利用预先训练的骨干神经网络对所述待处理图像进行特征提取,得到所述待处理图像的特征图;
    所述基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率,包括:
    利用预先训练的中心点预测神经网络对特征图进行中心点预测处理,得到特征图中的各个特征点属于目标对象的中心点的概率。
  6. 根据权利要求5所述的深度检测方法,其中,采用下述方式训练所述中心点预测神经网络:
    获取样本图像,以及样本对象的中心点在所述样本图像中的标注位置信息;其中,所述样本对象的中心点为样本对象在所述样本图像对应的相机坐标系中的三维检测框的中心点在所述样本图像中的投影点;
    利用所述样本图像、以及所述位置标注信息,对待训练的骨干神经网络、以及待训练的中心点预测神经网络进行训练,得到训练好的所述中心点预测神经网络。
  7. 根据权利要求1至6任一项所述的深度检测方法,其中,所述基于所述待处理图像,确定所 述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息,包括:
    基于所述待处理图像的特征图,得到与所述特征图中的每个特征点对应的第二位置偏移量;
    基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息。
  8. 根据权利要求7所述的深度检测方法,其中,所述投影位置信息包括下述至少一种:所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
  9. 根据权利要求8所述的深度检测方法,其中,所述基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息,包括:
    基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中,确定目标特征点;
    基于所述目标特征点在所述特征图中的位置信息、所述目标特征点对应的第二位置偏移量、以及所述下采样率,确定所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
  10. 根据权利要求1至9任一项所述的深度检测方法,其中,所述基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值,包括:
    基于所述二维位置信息、所述投影位置信息、所述目标对象的实际尺寸信息、所述目标对象的朝向信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值。
  11. 根据权利要求10所述的深度检测方法,其中,还包括:
    基于所述待处理图像的特征图,对所述目标对象进行尺寸预测处理,得到所述目标对象的实际尺寸信息;
    和/或,基于所述待处理图像的特征图,对所述目标对象进行朝向预测处理,得到所述目标对象在所述相机坐标系中的朝向信息。
  12. 根据权利要求1至11任一项所述的深度检测方法,其中,所述二维检测框和三维检测框的投影关系信息,是基于所述三维检测框在图像坐标系中的投影的尺寸信息和位置信息、与所述二维检测框的尺寸信息和位置信息建立的。
  13. 根据权利要求1至12任一项所述的深度检测方法,其中,所述基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值,包括:
    对所述目标对象的中心点在所述相机坐标系中的中间深度值构成的深度图像进行非线性变换,得到深度特征图;
    基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
  14. 根据权利要求13所述的深度检测方法,其中,所述基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值,包括:
    将所述深度特征图、以及所述待处理图像的特征图进行叠加,形成目标特征图;
    利用预先训练的深度值预测神经网络对所述目标特征图进行深度预测处理,得到所述特征图中各个特征点的目标深度值;
    基于所述特征图中各个特征点属于目标对象的中心点的概率、以及所述各个特征点分别对应的目标深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
  15. 根据权利要求1至14任一项所述的深度检测方法,其中,还包括:基于所述目标对象的中心点在所述相机坐标系中的目标深度值、以及所述目标对象的实际尺寸信息,得到所述目标对象在所述相机坐标系中的三维检测结果。
  16. 一种深度检测装置,包括:
    获取模块,配置为获取待处理图像;
    第一处理模块,配置为基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息、以及所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息;
    第二处理模块,配置为基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象的中心点在所述相机坐标系中的中间深度值;
    预测模块,配置为基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
  17. 根据权利要求16所述的装置,其中,所述第一处理模块,在基于所述待处理图像,确定目标对象的二维检测框在所述待处理图像对应的图像坐标系中的二维位置信息时,配置为:对所述待处理图像进行特征提取,获取待处理图像的特征图;基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率、与各个特征点对应的第一位置偏移量、和以各个特征点为中心点的下采样二维检测框的下采样尺寸信息;基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息;其中,下采样二维检测框,为对待处理图像进行下采样后,所述目标对象二维检测框产生限缩形成的检测框。
  18. 根据权利要求17所述的装置,其中,所述二维位置信息包括:所述二维检测框的中心点在所述图像坐标系中的第一坐标信息、以及所述二维检测框的尺寸信息。
  19. 根据权利要求18所述的装置,其中,所述所述第一处理模块,在基于所述概率、所述第一位置偏移量以及所述下采样尺寸信息,得到所述二维位置信息时,配置为:基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中确定目标特征点;基于所述目标特征点在所述特征图中的位置信息、所述目标特征点的第一位置偏移量、以及下采样率,确定所述二维检测框的中心点在所述图像坐标系中的第一坐标信息;以及,基于所述目标特征点对应的下采样尺寸信息、以及所述下采样率,确定所述二维检测框的尺寸信息。
  20. 根据权利要求17至19任一项所述的装置,其中,所述第一处理模块,在对所述待处理图像进行特征提取,获取待处理图像的特征图时,配置为:利用预先训练的骨干神经网络对所述待处理图像进行特征提取,得到所述待处理图像的特征图;所述第一处理模块,在基于所述特征图,得到所述特征图中的每个特征点属于目标对象的中心点的概率时,配置为:利用预先训练的中心点预测神经网络对特征图进行中心点预测处理,得到特征图中的各个特征点属于目标对象的中心点的概率。
  21. 根据权利要求20所述的装置,其中,还包括训练模块,配置为采用下述方式训练所述中心点预测神经网络:获取样本图像,以及样本对象的中心点在所述样本图像中的标注位置信息;其中,所述样本对象的中心点为样本对象在所述样本图像对应的相机坐标系中的三维检测框的中心点在所述样本图像中的投影点;利用所述样本图像、以及所述位置标注信息,对待训练的骨干神经网络、以及待训练的中心点预测神经网络进行训练,得到训练好的所述中心点预测神经网络。
  22. 根据权利要求16至20任一项所述的装置,其中,所述第一处理模块,在基于所述待处理图像,确定所述目标对象在所述待处理图像对应的相机坐标系中的三维检测框在所述图像坐标系中的投影位置信息时,配置为:基于所述待处理图像的特征图,得到与所述特征图中的每个特征点对应的第二位置偏移量;基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息。
  23. 根据权利要求22所述的装置,其中,所述第一处理模块,在投影位置信息包括下述至少一种:所述三维检测框的中心点在所述图像坐标系中投影点的第二坐标信息。
  24. 根据权利要求23所述的装置,其中,所述第一处理模块,在基于所述特征图中的每个特征点属于目标对象的中心点的概率、所述第二位置偏移量、以及下采样率,得到所述三维检测框在所述图像坐标系中的投影位置信息时,配置为:基于所述特征图中的每个特征点属于目标对象的中心点的概率,从所述特征图中,确定目标特征点;基于所述目标特征点在所述特征图中的位置信息、所述目标特征点对应的第二位置偏移量、以及所述下采样率,确定所述三维检测框的中心点在所述图像坐标 系中投影点的第二坐标信息。
  25. 根据权利要求16至24任一项所述的装置,其中,所述所述第二处理模块,在基于所述二维位置信息、所述投影位置信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值时,配置为:基于所述二维位置信息、所述投影位置信息、所述目标对象的实际尺寸信息、所述目标对象的朝向信息、以及所述二维检测框和所述三维检测框之间的投影关系信息,得到所述目标对象在所述相机坐标系中的中间深度值。
  26. 根据权利要求25所述的装置,其中,所述第一处理模块,还配置为:基于所述待处理图像的特征图,对所述目标对象进行尺寸预测处理,得到所述目标对象的实际尺寸信息;和/或,基于所述待处理图像的特征图,对所述目标对象进行朝向预测处理,得到所述目标对象在所述相机坐标系中的朝向信息。
  27. 根据权利要求16至26任一项所述的装置,其中,所述二维检测框和三维检测框的投影关系信息,是基于所述三维检测框在图像坐标系中的投影的尺寸信息和位置信息、与所述二维检测框的尺寸信息和位置信息建立的。
  28. 根据权利要求16至27任一项所述的装置,其中,所述预测模块,在所述基于所述中间深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值时,配置为:对所述目标对象的中心点在所述相机坐标系中的中间深度值构成的深度图像进行非线性变换,得到深度特征图;基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
  29. 根据权利要求28所述的装置,其中,所述预测模块,在基于所述深度特征图、以及所述待处理图像的特征图,得到所述目标对象的中心点在所述相机坐标系中的目标深度值时,配置为:将所述深度特征图、以及所述待处理图像的特征图进行叠加,形成目标特征图;利用预先训练的深度值预测神经网络对所述目标特征图进行深度预测处理,得到所述特征图中各个特征点的目标深度值;基于所述特征图中各个特征点属于目标对象的中心点的概率、以及所述各个特征点分别对应的目标深度值,得到所述目标对象的中心点在所述相机坐标系中的目标深度值。
  30. 根据权利要求16至29任一项所述的装置,其中,还包括第三处理模块,配置为基于所述目标对象的中心点在所述相机坐标系中的目标深度值、以及所述目标对象的实际尺寸信息,得到所述目标对象在所述相机坐标系中的三维检测结果。
  31. 一种计算机设备,包括:处理器和存储器,所述存储器存储有所述处理器可执行的机器可读指令,所述处理器用于执行所述存储器中存储的机器可读指令,所述机器可读指令被所述处理器执行时,所述处理器执行如权利要求1至15任一项所述的深度检测方法的步骤。
  32. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被计算机设备运行时,所述计算机设备执行如权利要求1至15任一项所述的深度检测方法的步骤。
  33. 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现权利要求1至15中任一项所述的深度检测方法。
  34. 一种计算机程序产品,所述计算机程序产品包括一条或多条指令,所述一条或多条指令适于由处理器加载并执行如权利要求1至15任一项所述深度检测方法中的步骤。
PCT/CN2021/125278 2021-06-25 2021-10-21 深度检测方法、装置、设备、存储介质、计算机程序及产品 WO2022267275A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110713298.0 2021-06-25
CN202110713298.0A CN113344998B (zh) 2021-06-25 2021-06-25 深度检测方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022267275A1 true WO2022267275A1 (zh) 2022-12-29

Family

ID=77478780

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/125278 WO2022267275A1 (zh) 2021-06-25 2021-10-21 深度检测方法、装置、设备、存储介质、计算机程序及产品

Country Status (2)

Country Link
CN (1) CN113344998B (zh)
WO (1) WO2022267275A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880470A (zh) * 2023-03-08 2023-03-31 深圳佑驾创新科技有限公司 3d图像数据的生成方法、装置、设备及存储介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344998B (zh) * 2021-06-25 2022-04-29 北京市商汤科技开发有限公司 深度检测方法、装置、计算机设备及存储介质
CN114842287B (zh) * 2022-03-25 2022-12-06 中国科学院自动化研究所 深度引导变形器的单目三维目标检测模型训练方法及装置
CN115546216B (zh) * 2022-12-02 2023-03-31 深圳海星智驾科技有限公司 一种托盘检测方法、装置、设备及存储介质
CN116189150B (zh) * 2023-03-02 2024-05-17 吉咖智能机器人有限公司 基于融合输出的单目3d目标检测方法、装置、设备和介质
CN116362318B (zh) * 2023-03-30 2024-02-06 复旦大学 基于自适应深度修正的纯视觉三维目标检测方法和系统
CN116386016B (zh) * 2023-05-22 2023-10-10 杭州睿影科技有限公司 一种异物处理方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009097714A1 (zh) * 2008-02-03 2009-08-13 Panovasic Technology Co., Ltd. 多视角视频图像深度搜索方法及深度估计方法
CN109035320A (zh) * 2018-08-12 2018-12-18 浙江农林大学 基于单目视觉的深度提取方法
CN111582207A (zh) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
CN111857111A (zh) * 2019-04-09 2020-10-30 商汤集团有限公司 对象三维检测及智能驾驶控制方法、装置、介质及设备
CN113344998A (zh) * 2021-06-25 2021-09-03 北京市商汤科技开发有限公司 深度检测方法、装置、计算机设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035319B (zh) * 2018-07-27 2021-04-30 深圳市商汤科技有限公司 单目图像深度估计方法及装置、设备、程序及存储介质
CN111340864B (zh) * 2020-02-26 2023-12-12 浙江大华技术股份有限公司 基于单目估计的三维场景融合方法及装置
CN112419385B (zh) * 2021-01-25 2021-04-09 国汽智控(北京)科技有限公司 一种3d深度信息估计方法、装置及计算机设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009097714A1 (zh) * 2008-02-03 2009-08-13 Panovasic Technology Co., Ltd. 多视角视频图像深度搜索方法及深度估计方法
CN109035320A (zh) * 2018-08-12 2018-12-18 浙江农林大学 基于单目视觉的深度提取方法
CN111857111A (zh) * 2019-04-09 2020-10-30 商汤集团有限公司 对象三维检测及智能驾驶控制方法、装置、介质及设备
CN111582207A (zh) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
CN113344998A (zh) * 2021-06-25 2021-09-03 北京市商汤科技开发有限公司 深度检测方法、装置、计算机设备及存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880470A (zh) * 2023-03-08 2023-03-31 深圳佑驾创新科技有限公司 3d图像数据的生成方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN113344998B (zh) 2022-04-29
CN113344998A (zh) 2021-09-03

Similar Documents

Publication Publication Date Title
WO2022267275A1 (zh) 深度检测方法、装置、设备、存储介质、计算机程序及产品
EP3680808A1 (en) Augmented reality scene processing method and apparatus, and computer storage medium
WO2018119889A1 (zh) 三维场景定位方法和装置
TW202143100A (zh) 圖像處理方法、電子設備及電腦可讀儲存介質
CN110276829B (zh) 通过多尺度体素哈希处理的三维表示
US9454851B2 (en) Efficient approach to estimate disparity map
EP3786900A2 (en) Markerless multi-user multi-object augmented reality on mobile devices
JP6491517B2 (ja) 画像認識ar装置並びにその姿勢推定装置及び姿勢追跡装置
CN108230384B (zh) 图像深度计算方法、装置、存储介质和电子设备
CN106815869B (zh) 鱼眼相机的光心确定方法及装置
JPWO2018235163A1 (ja) キャリブレーション装置、キャリブレーション用チャート、チャートパターン生成装置、およびキャリブレーション方法
KR102386444B1 (ko) 이미지 심도 결정 방법 및 생체 식별 방법, 회로, 디바이스, 및 매체
CN111627001B (zh) 图像检测方法及装置
CN111279354A (zh) 图像处理方法、设备及计算机可读存储介质
CN112150518B (zh) 一种基于注意力机制的图像立体匹配方法及双目设备
US20230237683A1 (en) Model generation method and apparatus based on multi-view panoramic image
CN112560592A (zh) 图像处理方法及装置、终端控制方法及装置
US20200226392A1 (en) Computer vision-based thin object detection
US11017557B2 (en) Detection method and device thereof
WO2023005457A1 (zh) 位姿计算方法和装置、电子设备、可读存储介质
CN112802081A (zh) 一种深度检测方法、装置、电子设备及存储介质
US11189053B2 (en) Information processing apparatus, method of controlling information processing apparatus, and non-transitory computer-readable storage medium
CN114616586A (zh) 图像标注方法、装置、电子设备及计算机可读存储介质
JP2013242625A (ja) 画像処理装置、画像処理方法
JP6931267B2 (ja) 原画像を目標画像に基づいて変形した表示画像を生成するプログラム、装置及び方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21946757

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21946757

Country of ref document: EP

Kind code of ref document: A1