WO2022033076A1 - Target detection method, apparatus, device, storage medium and program product - Google Patents


Info

Publication number
WO2022033076A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
dimensional
pixel
target
Prior art date
Application number
PCT/CN2021/090359
Other languages
English (en)
French (fr)
Inventor
马新柱
刘诗男
曾星宇
欧阳万里
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Priority to KR1020217042833A (published as KR20220024193A)
Publication of WO2022033076A1

Classifications

    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/08 - Learning methods
    • G06N 3/045 - Combinations of networks
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06T 2207/10016 - Video; Image sequence
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G06T 2207/30252 - Vehicle exterior; Vicinity of vehicle

Definitions

  • The present disclosure is based on, and claims priority to, the Chinese patent application with application number 202010792241.X, filed on August 8, 2020 and entitled "A target detection method, device, equipment and storage medium", the entire content of which is hereby incorporated into the present disclosure by reference.
  • the present disclosure relates to the technical field of computer vision, and in particular, to a target detection method, apparatus, device, storage medium and program product.
  • Target detection refers to the use of computer technology to detect and identify targets of interest in images or videos, such as common pedestrian detection, obstacle detection, etc.
  • Target detection technology has been widely used in various fields, such as robotics, autonomous driving, and behavior recognition.
  • the embodiments of the present disclosure provide at least one target detection solution.
  • an embodiment of the present disclosure provides a target detection method, including: acquiring an image collected by an image acquisition component and internal parameters of the image acquisition component; determining, based on the collected image and the internal parameters, the three-dimensional coordinate information of each pixel in the collected image in a world coordinate system; generating, according to the collected image and the three-dimensional coordinate information of each pixel in the world coordinate system, a three-dimensional information image corresponding to the collected image, where the ordering of pixels in the three-dimensional information image is the same as the ordering of pixels in the collected image; and determining, based on the three-dimensional information image, three-dimensional detection information, in the world coordinate system, of a target object contained in the collected image.
  • an embodiment of the present disclosure provides a target detection device, including: an acquisition module configured to acquire an image collected by an image acquisition component and internal parameters of the image acquisition component; a determination module configured to determine, based on the collected image and the internal parameters, the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system; a generation module configured to generate, according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system, a three-dimensional information image corresponding to the collected image, where the ordering of the pixels in the three-dimensional information image is the same as the ordering of the pixels in the collected image; and a detection module configured to determine, based on the three-dimensional information image, the three-dimensional detection information, in the world coordinate system, of the target object contained in the collected image.
  • embodiments of the present disclosure provide an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and when the machine-readable instructions are executed by the processor, the steps of the target detection method according to the first aspect are performed.
  • an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the target detection method described in the first aspect are performed.
  • an embodiment of the present disclosure provides a computer program product, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device performs the steps of the target detection method described in the first aspect.
  • In the embodiments of the present disclosure, a three-dimensional information image that keeps the same image structure as the acquired image while adding the three-dimensional coordinate information of each pixel in the world coordinate system can be obtained based on the acquired image, and three-dimensional target detection for the target object can be completed based on this three-dimensional information image.
  • Compared with a radar device, the image acquisition component has the advantages of high portability and low cost. In addition, the complete target object in the field of view can be captured, including target objects of small volume, so three-dimensional target detection for target objects in the short-range area can be accurately completed.
  • FIG. 1A shows a schematic diagram of the detection result of a target object in three-dimensional space
  • FIG. 1B shows a schematic diagram of the detection result of the target object on the two-dimensional image
  • FIG. 1C shows a flowchart of a target detection method provided by an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of a method for determining three-dimensional coordinate information of a pixel in a world coordinate system provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of a scene for determining three-dimensional coordinate information of a pixel in a world coordinate system provided by an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of a first method for generating a three-dimensional information image provided by an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of a second method for generating a three-dimensional information image provided by an embodiment of the present disclosure
  • FIG. 6 shows a flowchart of a method for determining three-dimensional detection information of a target object provided by an embodiment of the present disclosure
  • FIG. 7 shows a flowchart of a method for determining three-dimensional detection information of a target object provided by an embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of a neural network for determining three-dimensional detection information of a target object provided by an embodiment of the present disclosure
  • FIG. 9A shows a schematic diagram of a training method of a neural network provided by an embodiment of the present disclosure
  • FIG. 9B shows a schematic diagram of a training method of a neural network provided by an embodiment of the present disclosure.
  • FIG. 10 shows a flowchart of a control method for a target vehicle provided by an embodiment of the present disclosure
  • FIG. 11A shows a logic flow diagram of a target detection method provided by an embodiment of the present disclosure
  • FIG. 11B shows a schematic diagram of an image to be detected provided by an embodiment of the present disclosure
  • FIG. 11C shows a schematic diagram of a depth image provided by an embodiment of the present disclosure.
  • FIG. 11D shows a schematic diagram of a three-dimensional information image provided by an embodiment of the present disclosure
  • FIG. 12 shows a schematic structural diagram of a target detection apparatus provided by an embodiment of the present disclosure
  • FIG. 13 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • Object detection refers to the use of computer technology to detect and identify objects of interest in images or videos, such as common pedestrian detection and obstacle detection.
  • Target detection includes two-dimensional target detection and three-dimensional target detection: a two-dimensional detection result marks the two-dimensional detection frame of the target object contained in an image, while a three-dimensional detection result marks the three-dimensional detection frame of the target object contained in the image.
  • Compared with two-dimensional target detection, three-dimensional target detection is more complex and of greater practical significance.
  • 3D object detection is an important task. This task needs to detect the coordinates, shape and orientation of the target in three-dimensional space. Due to the lack of depth information in image data, image-based 3D detection systems generally need to perform depth estimation on the target image to obtain the depth information of each pixel in the image, and then use the RGB image and the estimated depth map as the input of the system to calculate 3D information of the object in the image. As shown in FIGS. 1A and 1B , the detection results of the target object (car) in the three-dimensional space and the detection results on the two-dimensional image are respectively shown. The rectangular frame 11 is the detection result, and the rectangular frame 12 is the manual labeling result.
  • The image-based three-dimensional detection methods in the related art mainly have the following shortcomings. On the one hand, image data lacks corresponding depth information, so the three-dimensional information (position, shape, orientation) of the target cannot be effectively estimated; moreover, the image plane and three-dimensional space belong to different coordinate systems, and directly using image data to compute results in three-dimensional space produces large errors, resulting in serious performance degradation. On the other hand, camera parameters can be used to map depth data into three-dimensional space, but this approach converts the pixel map into a three-dimensional point cloud, which leads to additional problems: the whole system then contains data in different forms (image data and point cloud data), so it must contain different modules to process these two kinds of data separately, and the data cannot be processed uniformly.
  • When three-dimensional target detection is performed based on point cloud images collected by a radar device, a radar device must be installed on the object performing the detection; for example, a radar device is installed on a robot that performs three-dimensional target detection. This approach is more expensive and less portable.
  • In addition, because a radar device has radar blind spots and low resolution, a close-range target inside the radar blind spot, or a target object of small volume, may fail to generate valid point cloud data. Therefore, when a radar device collects point cloud images for target detection, there are problems such as high cost, poor portability, and low accuracy when detecting objects in close-range areas or objects of small volume.
  • Based on this, an embodiment of the present disclosure provides a target detection method. After the image collected by the image acquisition component is obtained, the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system can be determined using the collected image and the internal parameters of the image acquisition component; then, according to the collected image and this three-dimensional coordinate information, a three-dimensional information image whose pixel ordering is consistent with that of the collected image is obtained. Because the order of the pixels remains unchanged, the three-dimensional information image retains the same image structure as the collected image, and on this basis the three-dimensional detection information of the target object contained in the collected image in the world coordinate system can be effectively determined.
  • When target detection is performed according to the embodiments of the present disclosure, after the image is collected by the image acquisition component, a three-dimensional information image that keeps the same image structure while adding the three-dimensional coordinate information of each pixel in the world coordinate system can be obtained from the collected image, and three-dimensional target detection for the target object can be completed based on this three-dimensional information image.
  • Compared with a radar device, the image acquisition component has the advantages of high portability and low cost; and compared with point cloud data collected by a radar device, the image acquisition component can also capture the complete target object within the field of view in the short-range area, including target objects of small volume, so it can accurately complete three-dimensional target detection for target objects in the close range.
  • the execution body of the target detection method provided by the embodiments of the present disclosure is generally a computer device with a certain computing capability, and the computer device includes, for example, a terminal device or a server or other processing device.
  • the object detection method may be implemented by the processor calling computer-readable instructions stored in the memory.
  • the target detection method includes the following steps S101 to S104, wherein:
  • Step S101 acquiring an image captured by an image capturing component and internal parameters of the image capturing component.
  • The image acquisition component may include a visible light (Red, Green, Blue, RGB) camera or another camera component capable of acquiring RGB images, and the corresponding acquired image may be an RGB image.
  • the internal parameters of the image acquisition component may include some or all of the parameters in the camera internal parameter matrix for converting the image coordinate system to the camera coordinate system, which is not limited in this embodiment of the present disclosure.
  • Step S102 based on the collected image and internal parameters, determine the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system.
  • In implementation, an image coordinate system can be established based on the collected image, and the pixel coordinate value of each pixel in the image coordinate system can be determined. Based on the conversion relationship between the image coordinate system and the camera coordinate system, the coordinate values of each pixel of the acquired image along the X-axis and Y-axis in the camera coordinate system can be determined; further, the coordinate values of each pixel along the X-axis and Y-axis in the world coordinate system can be determined. In a possible implementation, when the camera coordinate system coincides with the world coordinate system, the coordinate value of each pixel in the camera coordinate system can be directly used as its coordinate value in the world coordinate system. The coordinate value of each pixel along the Z-axis direction in the world coordinate system can be determined according to the depth information of the pixel in the camera coordinate system.
  • In implementation, the depth image corresponding to the collected image can be determined according to the collected image and a pre-trained neural network for determining depth images, so as to obtain the depth information of each pixel of the collected image in the camera coordinate system. In this way, combining the pixel coordinate value of each pixel in the image coordinate system with the depth information of the pixel in the camera coordinate system, the three-dimensional coordinate information of the pixel in the world coordinate system can be determined; the implementation process will be elaborated later.
  • Step S103 according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system, generate a three-dimensional information image corresponding to the collected image;
  • the order of the pixels in the three-dimensional information image is the same as the order of the pixels in the collected image.
  • A plurality of pixels contained in the collected image form an image structure according to information such as texture, tone, and ordering, and this image structure reflects the structure information of the target object to be detected contained in the collected image.
  • As long as the ordering of the pixels does not change, the image structure of the collected image does not change, that is, the shape of the target object contained in the image does not change. Therefore, when the order of the pixels in the three-dimensional information image is the same as the order of the pixels in the collected image, the three-dimensional information image still retains the same image structure as the collected image, and on this basis the three-dimensional detection information of the target object contained in the collected image can be effectively determined.
  • In a possible implementation, when generating the three-dimensional information image corresponding to the collected image according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system, the method may include:
  • generating the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel and the index information of the pixel in the collected image, where the channel information of each pixel in the three-dimensional information image at least includes the three-dimensional coordinate information of the pixel in the world coordinate system.
  • The index information of each pixel in the collected image represents the position of the pixel in the collected image; for example, if the collected image contains m*n pixels, the index information (i, j) can represent that the pixel is located at row i and column j of the collected image.
  • In this way, the three-dimensional coordinate information corresponding to each pixel can be combined with the index information of the pixel in the collected image to reconstruct a three-dimensional information image in image form. The constructed three-dimensional information image and the collected image have the same image structure, that is, the shape of the contained target object remains unchanged, so three-dimensional object detection can be performed on the target object contained in the three-dimensional information image.
  • In the embodiments of the present disclosure, the three-dimensional information image corresponding to the collected image is generated according to the index information of each pixel in the collected image, so the three-dimensional information image still retains the same image structure as the collected image. In addition, the three-dimensional information image adds, for each pixel, the three-dimensional coordinate information of the pixel in the world coordinate system, so the three-dimensional detection information of the target object in the world coordinate system can be detected based on the three-dimensional information image.
  • Step S104 based on the three-dimensional information image, determine three-dimensional detection information of the target object contained in the collected image in the world coordinate system.
  • The target object takes different forms in different application scenarios; for example, the target objects may include vehicles, pedestrians, railings, and other objects awaiting three-dimensional target detection.
  • three-dimensional object detection can be performed on the target object based on the three-dimensional information image. Because the three-dimensional information image contains the same image structure as the acquired image, the three-dimensional detection information of the target object contained in the acquired image in the world coordinate system can be detected through the three-dimensional information image.
  • the three-dimensional detection information of each target object in the world coordinate system may include the position coordinates of the center point of the target object in the world coordinate system, and the length, width and height of the target object in the world coordinate system, And the orientation angle of the target object in the world coordinate system.
  • the orientation angle can be represented by the angle between the preset positive direction of the target object and the preset direction.
  • For example, for a vehicle, the angle between the front of the vehicle and the preset direction can be used to represent the orientation angle of the vehicle.
  • The three-dimensional detection information of the target object may be represented by the position information of the three-dimensional (3D) detection frame corresponding to the target object. The length, width and height of the target object in the world coordinate system can be represented by the length, width and height of the 3D detection frame; the center point of the target object can be represented by the center point of the 3D detection frame; and the orientation angle of the target object can be represented by the orientation angle of the 3D detection frame.
  • The 3D detection frame corresponding to the target object can be represented by the circumscribed cuboid of the target object; an illustrative data structure is sketched below.
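  • As a non-limiting illustration, the three-dimensional detection information described above (center point, length, width, height, and orientation angle in the world coordinate system) could be held in a small data structure; the Python field names below are hypothetical and chosen only for readability, not taken from the present disclosure.

      from dataclasses import dataclass

      @dataclass
      class Detection3D:
          # Hypothetical container for the 3D detection information described
          # above; field names are illustrative, not from the disclosure.
          cx: float      # center point X in the world coordinate system
          cy: float      # center point Y in the world coordinate system
          cz: float      # center point Z in the world coordinate system
          length: float  # length of the circumscribed cuboid (3D detection frame)
          width: float   # width of the 3D detection frame
          height: float  # height of the 3D detection frame
          yaw: float     # orientation angle, e.g. angle between the vehicle front and a preset direction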
  • In the embodiments of the present disclosure, a three-dimensional information image that keeps the same image structure as the acquired image while adding the three-dimensional coordinate information of each pixel in the world coordinate system can be obtained based on the acquired image, and three-dimensional target detection for the target object can be completed based on this three-dimensional information image. Compared with a radar device, the image acquisition component has the advantages of high portability and low cost; and compared with point cloud data collected by a radar device, the image acquisition component can also capture the complete target object within the field of view in the short-range area, including target objects of small volume, so three-dimensional target detection for target objects in the close range can be accurately completed.
  • For the above step S102, when determining the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system based on the collected image and the internal parameters, as shown in FIG. 2, the following steps S1021 to S1022 may be included:
  • Step S1021 based on the collected image, generate a depth image corresponding to the collected image, where the depth image includes depth information of each pixel in the collected image;
  • Step S1022 Determine the three-dimensional coordinate information of each pixel in the world coordinate system based on the two-dimensional coordinate information of the pixel in the image coordinate system, the depth information of the pixel, and the internal parameters.
  • In implementation, the depth image corresponding to the collected image can be determined according to a pre-trained neural network for determining depth images, so as to obtain the depth information of each pixel in the collected image, for example, the depth information in the camera coordinate system.
  • The neural network used to determine the depth image can be obtained by training on a large number of pre-collected sample images annotated with the depth information of set pixels in the camera coordinate system; the embodiments of the present disclosure do not limit the training process of this neural network.
  • the three-dimensional coordinate information of each pixel in the camera coordinate system can be determined first, and then the three-dimensional coordinate information of the pixel in the world coordinate system can be determined.
  • the three-dimensional coordinate information of each pixel in the world coordinate system may include the coordinate value along the X-axis direction, the coordinate value along the Y-axis direction, and the coordinate value along the Z-axis direction under the world coordinate system.
  • In implementation, the embodiments of the present disclosure can make the camera coordinate system coincide with the world coordinate system, that is, the coordinate origin of the camera coordinate system coincides with the coordinate origin of the world coordinate system, and the X-axis, Y-axis, and Z-axis of the camera coordinate system respectively coincide with the X-axis, Y-axis, and Z-axis of the world coordinate system.
  • As shown in FIG. 3, the pixel point P is the pixel of the i-th row and j-th column in the collected image. The three-dimensional coordinate information of the pixel point P in the world coordinate system can be determined according to the following formula (1):

        X(i,j) = (u(i,j) - Cx) * d(i,j) / f
        Y(i,j) = (v(i,j) - Cy) * d(i,j) / f        (1)
        Z(i,j) = d(i,j)

  • Here, X(i,j), Y(i,j) and Z(i,j) represent the coordinate values of the pixel point P of the collected image along the X-axis, Y-axis and Z-axis directions in the world coordinate system; u(i,j) and v(i,j) represent the coordinate values of the pixel point P along the u-axis and v-axis directions in the pixel coordinate system; d(i,j) represents the depth value of the pixel point P; (Cx, Cy) represents the coordinates of the point C at which the optical axis of the image acquisition component intersects the acquired image (the principal point); and f represents the focal length of the image acquisition component.
  • That is, the camera parameter information used includes the coordinates (Cx, Cy) of the intersection of the optical axis of the image acquisition component with the acquired image, and the focal length f of the image acquisition component.
  • the optical center of the image acquisition component set on the target vehicle can be directly used as the origin, so that the world coordinate system and the camera coordinate system corresponding to the image acquisition component are coincident, so that the above formula can be directly used to determine each The three-dimensional coordinate information of a pixel in the world coordinate system.
  • In the embodiments of the present disclosure, the depth information corresponding to each pixel of the collected image can be quickly predicted from the collected image; then, based on the two-dimensional coordinate information of each pixel in the image coordinate system and the corresponding depth information, combined with the internal parameters of the image acquisition component, the three-dimensional coordinate information of each pixel of the acquired image in the world coordinate system can be quickly obtained, as sketched below.
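  • The following minimal Python sketch applies formula (1) to a whole depth map, assuming (as described above) that the camera coordinate system coincides with the world coordinate system; the function name and the use of a single focal length f are assumptions for illustration.

      import numpy as np

      def pixels_to_world(depth, f, cx, cy):
          """Back-project a depth map of shape (H, W) to per-pixel 3D
          coordinates with formula (1), keeping the image layout."""
          h, w = depth.shape
          # u is the column (u-axis) index, v the row (v-axis) index
          u, v = np.meshgrid(np.arange(w), np.arange(h))
          x = (u - cx) * depth / f   # X(i,j) = (u(i,j) - Cx) * d(i,j) / f
          y = (v - cy) * depth / f   # Y(i,j) = (v(i,j) - Cy) * d(i,j) / f
          z = depth                  # Z(i,j) = d(i,j)
          # Stacking along the last axis yields an H x W x 3 array: the pixel
          # ordering is unchanged, so the result keeps the image structure and
          # can serve directly as the three-channel 3D information image.
          return np.stack([x, y, z], axis=-1)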
  • a three-dimensional information image corresponding to the collected image can be generated based on the three-dimensional coordinate information of each pixel in the world coordinate system.
  • For the above step S103, when generating the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel and the index information of the pixel in the collected image, as shown in FIG. 4, the following steps S1031 to S1032 may be included:
  • Step S1031 taking the three-dimensional coordinate information corresponding to each pixel as the multi-channel information corresponding to the pixel in the three-dimensional information image;
  • Step S1032 Generate a three-dimensional information image based on the multi-channel information corresponding to the pixel in the three-dimensional information image and the index information of the pixel in the collected image.
  • Taking a collected RGB image as an example, each pixel in the RGB image contains three channels of information: the channel value on the R channel, the channel value on the G channel, and the channel value on the B channel.
  • the channel value of each pixel on the R channel, the channel value on the G channel, and the channel value on the B channel can represent the color information of the pixel in the RGB image.
  • Similarly, a three-dimensional information image is also composed of multiple pixels. When generating the three-dimensional information image, the three-dimensional coordinate information corresponding to each pixel can be assigned, according to the index information of the pixel in the collected image, to the corresponding pixel position in the three-dimensional information image. In this way, the multi-channel information of each pixel in the three-dimensional information image includes the coordinate value of the pixel along the X-axis channel, the coordinate value along the Y-axis channel, and the coordinate value along the Z-axis channel in the world coordinate system.
  • Compared with the corresponding collected image, the three-dimensional information image generated in this way contains the same number of pixels and an unchanged pixel ordering, so it has the same image structure. The structure information of the target object contained in the collected image can therefore be recognized, which facilitates three-dimensional object detection on the target object based on the three-dimensional information image.
  • In another possible implementation of the above step S103, when generating the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel and the index information of the pixel in the collected image, as shown in FIG. 5, the following steps S1033 to S1034 may be included:
  • Step S1033 taking the three-dimensional coordinate information corresponding to each pixel and the channel information of the pixel in the collected image as the multi-channel information corresponding to the pixel in the three-dimensional information image;
  • Step S1034 Generate a three-dimensional information image based on the multi-channel information corresponding to the pixel in the three-dimensional information image and the index information of the pixel in the collected image.
  • In a possible implementation, taking a collected RGB image as an example, according to the index information of each pixel in the collected image, three channels composed of the pixel's three-dimensional coordinate information can be added to the pixel on top of its original channels, generating a three-dimensional information image corresponding to the collected image. Each pixel of the three-dimensional information image obtained in this way contains six channels of information: the channel value on the R channel, the channel value on the G channel, the channel value on the B channel, the coordinate value along the X-axis channel in the world coordinate system, the coordinate value along the Y-axis channel in the world coordinate system, and the coordinate value along the Z-axis channel in the world coordinate system.
  • Compared with the corresponding collected image, the three-dimensional information image generated in this way also contains the same number of pixels and the same pixel ordering, so it has an image structure consistent with the collected image. In addition, this three-dimensional information image retains the information of the collected image, such as its color information, which facilitates accurate three-dimensional object detection on the target object contained in the collected image based on the three-dimensional information image. A sketch of this six-channel construction is given below.
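  • Under the same assumptions, the second generation method can be sketched as a simple channel concatenation: the R, G, B channels of the collected image are kept and the three coordinate channels are appended by pixel index, so the pixel ordering stays the same. The function name is illustrative.

      import numpy as np

      def make_six_channel_image(rgb, xyz):
          """rgb: H x W x 3 collected image; xyz: H x W x 3 result of
          pixels_to_world(). Returns an H x W x 6 three-dimensional
          information image (R, G, B, X, Y, Z) with unchanged pixel order."""
          assert rgb.shape[:2] == xyz.shape[:2]
          return np.concatenate([rgb.astype(np.float32), xyz], axis=-1)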
  • For the above step S104, when determining, based on the three-dimensional information image, the three-dimensional detection information of the target object contained in the collected image in the world coordinate system, as shown in FIG. 6, the following steps S1041 to S1044 may be included:
  • Step S1041 crop the three-dimensional information image based on the two-dimensional detection information of the target object contained in the collected image to obtain at least one three-dimensional information image block, and each three-dimensional information image block contains at least one target object.
  • a pre-trained neural network for 2D target detection can be used to perform target detection on the collected image, so as to obtain the 2D detection information of the target object included in the collected image.
  • the two-dimensional detection information of the target object may be the location area of the two-dimensional detection frame of the target object in the collected image.
  • In this way, a three-dimensional information image block of the same size as the two-dimensional detection frame can be obtained by cropping the three-dimensional information image, so that areas that do not contain the target object are filtered out. Target detection can then be performed directly on the three-dimensional information image blocks, which narrows the detection range and improves detection efficiency.
  • Step S1042 Perform feature extraction on each three-dimensional information image block to obtain multiple feature images corresponding to the three-dimensional information image block, and the multiple feature images include depth feature images representing depth information of the target object.
  • multiple feature images corresponding to each three-dimensional information image block can be extracted based on the feature extraction network in the pre-trained neural network.
  • the size of the three-dimensional information image blocks with different sizes can be adjusted so that the sizes of the three-dimensional information image blocks input to the feature extraction network are consistent.
  • the feature extraction network can contain multiple convolution kernels, and each convolution kernel is used to extract a feature image corresponding to the three-dimensional information image block.
  • The plurality of feature images may include a depth feature image used to characterize the depth information of the target object, a feature image used to characterize the length information of the target object, a feature image used to characterize the width information of the target object, and a feature image used to characterize the center point position information of the target object.
  • Step S1043 Classify at least one 3D information image block based on the depth feature image corresponding to each 3D information image block, and determine a 3D object detection network corresponding to each type of 3D information image block.
  • Considering that the depth information in the world coordinate system of the target object contained in each three-dimensional information image block may differ, the multiple three-dimensional information image blocks can be classified according to the depth information of their target objects, based on the depth feature image corresponding to each block, and a three-dimensional target detection network corresponding to each class of three-dimensional information image blocks can be determined.
  • The pre-trained neural network may include multiple three-dimensional object detection networks, each of which predicts the three-dimensional detection information of the target objects contained in one class of three-dimensional information image blocks. For example, the pre-trained neural network may contain three target detection networks: the first target detection network is used to detect three-dimensional information image blocks with depth information greater than 0 and less than or equal to L1; the second target detection network is used to detect blocks with depth information greater than L1 and less than or equal to L2; and the third target detection network is used to detect blocks with depth information greater than L2.
  • In this way, each three-dimensional object detection network detects three-dimensional information image blocks within the same depth range. On the one hand, the differences among the three-dimensional detection information of target objects within the same depth range are small, which improves the detection accuracy of each network during three-dimensional object detection; on the other hand, when there are many three-dimensional information image blocks, detection can be performed simultaneously through the multiple three-dimensional target detection networks, which improves the detection speed.
  • After the classification, the three-dimensional object detection network corresponding to each three-dimensional information image block can be determined, as sketched below.
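  • A sketch of this routing step, assuming three depth-specific detection heads and hypothetical threshold values L1 and L2; summarizing the block's depth by the mean of its depth feature image is one simple choice rather than a method fixed by the disclosure.

      import numpy as np

      L1, L2 = 20.0, 40.0  # hypothetical depth thresholds (e.g. in meters)

      def select_detection_network(depth_feature, networks):
          """Route a 3D information image block to the detection network
          responsible for its depth range; `networks` holds three heads."""
          d = float(np.mean(depth_feature))  # summary depth of the block
          if d <= L1:
              return networks[0]  # near range: 0 < d <= L1
          if d <= L2:
              return networks[1]  # middle range: L1 < d <= L2
          return networks[2]      # far range: d > L2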
  • Step S1044 for each three-dimensional information image block, according to the three-dimensional object detection network corresponding to the block and the plurality of feature images corresponding to the block, determine the three-dimensional detection information, in the world coordinate system, of the target object in the block.
  • When performing three-dimensional target detection on a corresponding three-dimensional information image block based on a three-dimensional target detection network, the multiple feature images corresponding to the block need to be considered, such as the above-mentioned depth feature image used to represent the depth information of the target object, the feature image used to characterize the length information of the target object, the feature image used to characterize the width information of the target object, and the feature image used to characterize the center point position information of the target object. Based on these feature images, each three-dimensional target detection network predicts the three-dimensional detection information of the target object contained in its corresponding three-dimensional information image block.
  • In the embodiments of the present disclosure, the three-dimensional information image can be cropped based on the two-dimensional detection information corresponding to the target objects contained in the collected image to obtain a plurality of three-dimensional information image blocks. This filters out detection areas that do not contain any target object, and three-dimensional target detection is then performed on the three-dimensional information image blocks. In addition, multiple three-dimensional target detection networks can be pre-built for simultaneous detection, which improves both detection accuracy and detection speed.
  • For the above step S1044, determining, for each three-dimensional information image block, the three-dimensional detection information of the target object in the block in the world coordinate system according to the corresponding three-dimensional object detection network and the plurality of feature images corresponding to the block may include the following steps S10441 to S10443:
  • Step S10441 for each three-dimensional information image block, according to the set pooling size and pooling step size, perform maximum pooling processing on each feature image corresponding to the block to obtain the pooling value corresponding to each pooled feature image.
  • Each feature image contains one attribute feature of the target object contained in the three-dimensional information image block; for example, the feature images may contain the texture attribute feature, color attribute feature, depth attribute feature, length attribute feature, width attribute feature, center point position attribute feature, etc. of the target object.
  • For each feature image corresponding to the three-dimensional information image block, maximum pooling can be performed to obtain the pooling value corresponding to the pooled feature image. Taking one feature image as an example: if the feature image contains 4*4 feature values, performing maximum pooling with a pooling size of 2*2 and a step size of 2 yields 2*2 pooling values; if maximum pooling is performed with a pooling size equal to the size of the feature image, 1*1 pooling value is obtained.
  • Considering that a three-dimensional information image block may also contain regions other than the target object, a binary mask image corresponding to the block may be determined, in which the value is 1 in the region representing the target object and 0 in regions not belonging to the target object. Before the maximum pooling processing, each feature image corresponding to the block can first be filtered based on the binary mask image: the feature values belonging to the target object are retained, and the feature values of non-target regions are set to 0. On the one hand, this improves the speed of the pooling process; on the other hand, it removes interference feature values from non-target regions, so that more accurate pooling values are obtained and the accuracy of later three-dimensional target detection is improved. A sketch of this masked pooling is given below.
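  • A minimal sketch of the masked maximum pooling and the feature-vector assembly of steps S10441 and S10442, assuming one global pooling value (1*1) per feature image; the function names are illustrative.

      import numpy as np

      def masked_global_max_pool(feature, mask):
          """Zero the feature values outside the target region (mask == 0),
          then max-pool the whole feature image to a single pooling value."""
          return float((feature * mask).max())

      def block_to_feature_vector(features, mask):
          """Pool every feature image of a 3D information image block and
          concatenate the pooling values into its target detection feature
          vector (e.g. 10 feature images -> a vector of 10 values)."""
          return np.array([masked_global_max_pool(f, mask) for f in features])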
  • Step S10442 composing the pooling values corresponding to the feature images of the three-dimensional information image block into a target detection feature vector corresponding to the three-dimensional information image block.
  • In implementation, a target detection feature vector corresponding to a three-dimensional information image block can be formed from the pooling values of the multiple feature images corresponding to the block. The target detection feature vector represents the comprehensive feature information of the target object contained in the block, which may include the above-mentioned texture attribute feature, color attribute feature, depth attribute feature, length attribute feature, width attribute feature, center point position attribute feature, etc.
  • For example, if each three-dimensional information image block corresponds to 10 feature images and each feature image yields a 1*1 pooling value, the target detection feature vector of the block contains 10 feature values; if each feature image yields 2*2 pooling values, the target detection feature vector of the block contains 10*4 feature values.
  • Step S10443 Based on the target detection feature vector corresponding to the 3D information image block and the 3D target detection network corresponding to the 3D information image block, determine the 3D detection information of the target object in the 3D information image block in the world coordinate system.
  • the target detection feature vector corresponding to the three-dimensional information image block is input into the three-dimensional target detection network corresponding to the three-dimensional information image block, and the three-dimensional detection information of the target object contained in the three-dimensional information image block in the world coordinate system can be determined.
  • As shown in FIG. 8, after feature extraction is performed on a three-dimensional information image block 81, a plurality of feature images 83 corresponding to the block can be obtained, followed by the pooling values 85 from which the target detection feature vector of the block is generated. A type prediction process can then be performed on the pooling values: based on the pooling value representing the depth information of the target object, the three-dimensional object detection network 87 corresponding to the block is determined, and the target detection feature vector of the block is input into the corresponding three-dimensional target detection network to complete the three-dimensional target detection.
  • The three-dimensional detection information mentioned above is detected by a pre-trained neural network, which is obtained by training on sample images containing labeled three-dimensional detection information of target sample objects.
  • In implementation, a large number of sample images can be collected in advance, the target sample objects in each sample image can be labeled, and the labeled three-dimensional detection information corresponding to the target sample objects contained in each sample image can be determined. The labeled three-dimensional detection information can be determined based on preset three-dimensional coordinate information of the target sample object in the world coordinate system.
  • In a possible implementation, the neural network is obtained by training through the following steps S901 to S905:
  • Step S901 acquiring the sample image collected by the image collecting component and the internal parameters of the image collecting component.
  • This process is similar to the above-described process of acquiring the collected image and the internal parameters of the image acquisition component; for technical details not disclosed here, please refer to that process description.
  • Step S902 based on the collected sample image and internal parameters, determine the three-dimensional coordinate information of each sample pixel in the collected sample image in the world coordinate system.
  • This process is similar to the above-described way of determining the three-dimensional coordinate information of each pixel of the collected image in the world coordinate system; for technical details not disclosed here, please refer to that process description.
  • Step S903 according to the collected sample image and the three-dimensional coordinate information of each sample pixel in the collected sample image in the world coordinate system, generate a three-dimensional information sample image corresponding to the collected sample image; the ordering of the sample pixels in the three-dimensional information sample image is the same as the ordering of the sample pixels in the collected sample image.
  • This process is similar to the above-described method of generating a three-dimensional information image; for technical details not disclosed here, please refer to that process description.
  • Step S904 based on the three-dimensional information sample image and the neural network to be trained, predict the three-dimensional detection information of the target sample object contained in the sample image in the world coordinate system.
  • In a possible implementation, the neural network to be trained includes multiple three-dimensional object detection networks. For the above step S904, predicting, based on the three-dimensional information sample image and the neural network to be trained, the three-dimensional detection information of the target sample object contained in the sample image in the world coordinate system may include the following steps S9041 to S9044:
  • Step S9041 based on the two-dimensional detection information of the target sample object contained in the sample image, cropping the three-dimensional information sample image to obtain at least one three-dimensional information sample image block, where each three-dimensional information sample image block contains at least one target sample object;
  • Step S9042 performing feature extraction on at least one three-dimensional information sample image block to obtain multiple feature sample images corresponding to each three-dimensional information sample image block, and the multiple feature sample images include depth feature sample images representing depth information of the target sample object;
  • Step S9043 classifying the at least one three-dimensional information sample image block based on the depth feature sample image corresponding to each block, and determining a three-dimensional target detection network corresponding to each class of three-dimensional information sample image blocks;
  • Step S9044 for each three-dimensional information sample image block, predicting the three-dimensional detection information, in the world coordinate system, of the target sample object contained in the block according to the corresponding three-dimensional target detection network in the neural network and the plurality of feature sample images corresponding to the block.
  • This process is similar to the above-described way of predicting the three-dimensional detection information of the target object in each three-dimensional information image block in the world coordinate system. Through training, multiple three-dimensional target detection networks can be obtained to perform three-dimensional target detection on three-dimensional information image blocks with different depth information, thereby improving detection accuracy and speed in application.
  • Step S905 based on the predicted 3D detection information and the labeled 3D detection information, adjust the network parameter values in the neural network to be trained to obtain a neural network for determining the 3D detection information.
  • In implementation, the three-dimensional detection information of the target sample object contained in each sample image can be predicted, the loss value of the loss function of the neural network to be trained can be computed based on the predicted and labeled three-dimensional detection information, and the network parameter values can then be adjusted based on the loss value to obtain the neural network for determining three-dimensional detection information.
  • The loss value computed from the predicted and labeled three-dimensional detection information may include a loss value for the size of the target sample object, a loss value for the center point of the target sample object, a loss value for the orientation angle of the target sample object, and so on. The adjustment of the network parameter values is completed, and the trained neural network is obtained, once multiple rounds of training make the loss value less than a set loss threshold, or once the number of training rounds reaches a set count. An illustrative combination of these loss terms is sketched below.
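  • The combination of these loss terms might look like the following sketch, written here with PyTorch as an assumption; the tensor layout [length, width, height, cx, cy, cz, yaw] and the use of smooth L1 losses are illustrative choices, not fixed by the disclosure.

      import torch
      import torch.nn.functional as F

      def detection_loss(pred, target):
          """Sum of the loss terms named above: size (length/width/height),
          center point, and orientation angle of the target sample object."""
          size_loss = F.smooth_l1_loss(pred[:, 0:3], target[:, 0:3])
          center_loss = F.smooth_l1_loss(pred[:, 3:6], target[:, 3:6])
          yaw_loss = F.smooth_l1_loss(pred[:, 6], target[:, 6])
          return size_loss + center_loss + yaw_loss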
  • the target detection method provided by the embodiment of the present disclosure may be applied to the field of automatic driving, wherein the image acquisition component may be located on the target vehicle.
  • the target detection method provided by the embodiment of the present disclosure further includes the following steps S1001 to S1002:
  • Step S1001 based on the three-dimensional detection information of each target object, determine the distance information between the target object and the target vehicle;
  • Step S1002 control the target vehicle to travel based on the three-dimensional detection information, distance information of each target object, and current pose data of the target vehicle.
  • The three-dimensional detection information corresponding to each target object can include the size, orientation angle, and center point position coordinates of the target object in the world coordinate system, on the basis of which the pose data of the target object in the world coordinate system can be represented. In addition, the distance information between each target object and the target vehicle can be obtained based on the position coordinates of the target object's center point. Based on the three-dimensional detection information of each target object, its distance information to the target vehicle, and the current pose data of the target vehicle, the target vehicle can be controlled to avoid the target object as an obstacle.
  • In implementation, a world coordinate system can be established with the optical center of the image acquisition component as the origin, so that the distance between the center point of the target object and the origin in the world coordinate system can represent the distance information between the target object and the target vehicle.
  • In a possible implementation, the distance between the target object and the target vehicle may first be used to determine whether the target vehicle has entered the dangerous area corresponding to the target object; for example, when the distance is less than a preset safety distance, it can be determined that the target vehicle has entered the dangerous area. Further, based on the three-dimensional pose data corresponding to the target object and the current pose data of the target vehicle, it can be determined whether a collision will occur if the vehicle keeps to its current driving route. When it is determined that no collision will occur, the vehicle can continue along the original route; when it is determined that a collision will occur, the driving route can be adjusted, or the vehicle can slow down to avoid the obstacle.
  • after the three-dimensional detection information of the target objects contained in the collected image has been detected, the distance information between each target object and the target vehicle can be obtained from it, bearing in mind that the three-dimensional detection information of each target object can represent the pose data of that target object in the world coordinate system. Controlling the travel of the target vehicle based on the three-dimensional detection information of the target objects, their distance information to the target vehicle, and the current pose data of the target vehicle can therefore improve the driving safety of the target vehicle.
  • the embodiments of the present disclosure provide an image-data coordinate-system conversion method for an image-based three-dimensional detection system, which can maintain the image structure while converting the coordinate system and thereby further improve the accuracy of the detection system.
  • in implementation, the depth image of the image to be detected is computed first, and the internal parameters of the camera that captured the image are obtained; the three-dimensional spatial position of each pixel is then calculated from the depth image and the camera's internal parameters and organized into image data form; finally, the three-dimensional information of the target is obtained using image-oriented deep learning techniques.
  • FIG. 11A is a logical flowchart of a target detection method provided by an embodiment of the present disclosure. As shown in FIG. 11A , taking the image acquisition component as a camera as an example, the method includes at least the following steps:
  • Step S1101, acquiring an image to be detected captured by a camera;
  • here, as shown in FIG. 11B, the image to be detected is a two-dimensional image of the target object; it lacks corresponding depth information, so the three-dimensional information (position, shape, orientation) of the target object cannot be effectively estimated from it alone.
  • Step S1102, acquiring the depth image of the image to be detected;
  • here, the depth image of the image to be detected is shown in FIG. 11C; the depth values over the target object (the car) differ from the depth values of the other parts of the scene.
  • the depth information missing from image data can be compensated for by image depth estimation methods; using depth estimation to obtain the depth image of the image to be detected can effectively supplement the depth information that the two-dimensional image lacks.
  • it is worth noting that depth estimation algorithms in the related art can generally meet this requirement, and the embodiments of the present disclosure do not limit which depth estimation algorithm is adopted; a toy stand-in is sketched below.
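Any off-the-shelf monocular depth estimator fits here. Purely as a toy stand-in (the architecture below is not from the disclosure), a depth network only needs to map an RGB image to a positive per-pixel depth map:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDepthNet(nn.Module):
    """Toy stand-in for a monocular depth estimator.

    Maps an RGB image (B, 3, H, W) to a positive per-pixel depth map
    (B, 1, H, W). The three-layer architecture is illustrative only;
    any off-the-shelf depth estimation network can fill this role.
    """

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        return F.softplus(self.body(rgb))  # softplus keeps depths positive
```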
  • Step S1103, acquiring the camera parameters used when the image to be detected was captured;
  • here, the camera parameters are the internal parameters of the camera, which may include the focal length and the principal point.
  • Step S1104, determining the three-dimensional coordinate information of each pixel in the image to be detected;
  • here, the position of each pixel of the image to be detected in the three-dimensional coordinate system is computed: for each pixel, its index information in the image coordinate system is obtained (an index value (i, j) indicates that the pixel is located at row i, column j of the image to be detected), together with the depth value d at that index in the depth image; with these and the camera internal parameters obtained in the previous step, formula (1) is used to calculate the coordinates of the pixel in three-dimensional space, thereby obtaining the three-dimensional coordinate information of all pixels in the image to be detected (a reconstruction of formula (1) is given below).
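Formula (1) itself appears in the source only as an embedded image. Judging from the variable definitions that accompany it in the description (pixel coordinates (u, v), per-pixel depth d, principal point (Cx, Cy), focal length f, with the world and camera coordinate systems coincident), it is presumably the standard pinhole back-projection, which would read:

```latex
\begin{aligned}
Z_{(i,j)} &= d_{(i,j)},\\
X_{(i,j)} &= \frac{\left(u_{(i,j)} - C_x\right) Z_{(i,j)}}{f},\\
Y_{(i,j)} &= \frac{\left(v_{(i,j)} - C_y\right) Z_{(i,j)}}{f}.
\end{aligned}
```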
  • Step S1105, generating a three-dimensional information image based on the three-dimensional coordinate information of each pixel;
  • here, the three-dimensional coordinate information of each pixel of the image to be detected is organized into image form as the three-dimensional information image, as shown in FIG. 11D.
  • in implementation, according to each pixel's index in the original image, the calculated three-dimensional coordinates can be treated as different channels and put back into the image, for example replacing the original RGB channels.
  • organizing the coordinate-transformed pixel information in image form avoids introducing point cloud data, so that the image is the only data representation in the entire system, keeping the system simple and efficient (a sketch of the back-projection and channel packing follows).
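A minimal sketch of steps S1104 and S1105 together, assuming the back-projection above and NumPy arrays; the function name and the separate fx/fy intrinsics are illustrative (the disclosure describes a single focal length f):

```python
import numpy as np

def back_project_to_coord_image(depth: np.ndarray, fx: float, fy: float,
                                cx: float, cy: float) -> np.ndarray:
    """Turn an (H, W) depth map into an (H, W, 3) image of XYZ coordinates.

    Each pixel keeps its original row/column index, so the output has the
    same image structure as the input; only the channel contents change.
    Separate fx/fy are allowed for generality; the disclosure describes a
    single focal length f.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column, v: row index
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).astype(np.float32)

# The XYZ channels can replace the RGB channels outright, or be
# concatenated with them to keep the original color information:
#   coord_img = back_project_to_coord_image(depth, fx, fy, cx, cy)
#   six_channel = np.concatenate([rgb.astype(np.float32), coord_img], axis=-1)
```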
  • Step S1106, using a neural network to detect the three-dimensional information image and obtain the detection result for the target object.
  • here, three-dimensional target detection is performed with deep learning techniques oriented to image data, for example estimating the pose of three-dimensional objects. It suffices to estimate the three-dimensional information of the target with image-oriented deep learning techniques; the embodiments of the present disclosure do not limit which neural network is used (an illustrative head is sketched below).
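As an illustrative example of such an image-oriented network (not the disclosure's architecture), a small convolutional head can map a 3-channel coordinate-image crop to a 7-dimensional box encoding of size, center, and orientation angle:

```python
import torch
import torch.nn as nn

class Coord3DHead(nn.Module):
    """Illustrative image-oriented head over a 3-channel coordinate image.

    Reduces a coordinate-image crop (B, 3, H, W) to a 7-dimensional box
    encoding: 3 values for size, 3 for the center point, 1 for the
    orientation angle. Any image-oriented backbone could replace the
    small convolutional stack used here.
    """

    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveMaxPool2d(1),  # global max pooling over the crop
        )
        self.box = nn.Linear(64, 7)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.box(self.features(x).flatten(1))
```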
  • the embodiments of the present disclosure use a depth estimation method to obtain the depth image of the image to be detected, which can effectively supplement the depth information missing from the two-dimensional image.
  • the embodiments of the present disclosure introduce a coordinate-system conversion: through the internal parameters of the camera and the estimated depth image, a one-to-one mapping from the image coordinate system to the three-dimensional world coordinate system is established, eliminating the ambiguity between the image coordinate system and the three-dimensional world coordinate system, which can greatly improve the detection performance of the system.
  • at the same time, during the coordinate-system conversion, the generated three-dimensional coordinate points are organized into an image representation according to the coordinate indices of the original image, so the image structure is maintained.
  • organizing the coordinate-transformed pixel information in image form avoids introducing point cloud data, so that the image is the only data representation in the entire system, keeping the system simple and efficient.
  • compared with the related art, the embodiments of the present disclosure have the following beneficial effects. First, high accuracy: compared with methods that do not use a coordinate-system conversion (or that use one but do not organize the converted data into an image representation), this system achieves higher detection performance. Second, a simple model training/testing process: after other existing methods convert the image coordinate system to the three-dimensional coordinate system, they treat the pixels as point cloud data and must train the subsequent steps separately with neural networks of different structures.
  • this system uses data in image form from beginning to end, which avoids conversions between data forms and makes the system's overall training/testing process simpler. Third, support for end-to-end training: previous methods need to train the model in stages.
  • in the first stage a neural network oriented to two-dimensional images is trained, and in the second stage a neural network oriented to three-dimensional point clouds is trained.
  • because the two stages cannot interact, the optimal solution cannot be obtained.
  • this system can integrate the two parts and uniformly use neural network training oriented to two-dimensional images, thereby supporting end-to-end training.
  • the target detection method provided by the embodiments of the present disclosure can be applied to an automatic/assisted driving system based on image data.
  • in other implementations, the target detection method provided by the embodiments of the present disclosure may be applied to an AR (Augmented Reality) system and/or a VR (Virtual Reality) system of a mobile terminal (such as a mobile phone), to achieve three-dimensional target detection in the AR system and/or the VR system.
  • those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the actual execution order of the steps should be determined by their functions and possible internal logic.
  • based on the same technical concept, the embodiments of the present disclosure also provide a target detection apparatus corresponding to the target detection method; since the principle by which the apparatus solves the problem is similar to that of the target detection method described above, reference may be made to the implementation of the method for the implementation of the apparatus.
  • the target detection apparatus 1200 includes:
  • an acquisition module 1201, configured to acquire an image acquired by an image acquisition component, and internal parameters of the image acquisition component;
  • the determining module 1202 is configured to determine, based on the collected image and internal parameters, the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system;
  • the generating module 1203 is configured to generate a three-dimensional information image corresponding to the collected image according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system; the ordering of the pixel points in the three-dimensional information image is the same as the ordering of the pixel points in the collected image;
  • the detection module 1204 is configured to determine, based on the three-dimensional information image, the three-dimensional detection information of the target object contained in the collected image in the world coordinate system.
  • in a possible implementation, the target detection apparatus 1200 further includes a control module 1205, and the image acquisition component is located on the target vehicle. After the three-dimensional detection information of the target objects contained in the acquired image is determined, the control module 1205 is configured to:
  • determine, based on the three-dimensional detection information of each target object, the distance information between each target object and the target vehicle; and control the target vehicle to travel based on the three-dimensional pose data of each target object, the distance information, and the current pose data of the target vehicle.
  • the determining module 1202 is configured to:
  • generate, based on the collected image, a depth image corresponding to the collected image, the depth image containing the depth information corresponding to each pixel in the collected image;
  • determine the three-dimensional coordinate information of each pixel in the world coordinate system based on the two-dimensional coordinate information of each pixel in the collected image in the image coordinate system, the depth information of each pixel, and the internal parameters.
  • the generating module 1203 is configured to:
  • generate the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel and the index information of each pixel in the collected image; the channel information of each pixel in the three-dimensional information image contains at least the three-dimensional coordinate information of that pixel in the world coordinate system.
  • the generating module 1203 is configured to:
  • take the three-dimensional coordinate information corresponding to each pixel as the multi-channel information corresponding to that pixel in the three-dimensional information image; and generate the three-dimensional information image based on the multi-channel information corresponding to each pixel in the three-dimensional information image and the index information of each pixel in the collected image.
  • the generating module 1203 is configured to:
  • take the three-dimensional coordinate information corresponding to each pixel, together with the information of that pixel in the collected image, as the multi-channel information corresponding to that pixel in the three-dimensional information image; and generate the three-dimensional information image based on the multi-channel information corresponding to each pixel in the three-dimensional information image and the index information of each pixel in the collected image.
  • the detection module 1204 is configured to:
  • crop the three-dimensional information image based on the two-dimensional detection information of the target objects contained in the collected image, to obtain at least one three-dimensional information image block, each containing at least one target object;
  • perform feature extraction on each three-dimensional information image block to obtain multiple feature images corresponding to each block, the multiple feature images including a depth feature image representing the depth information of the target object;
  • classify the at least one three-dimensional information image block based on the depth feature image corresponding to each block, and determine the three-dimensional target detection network corresponding to each category of block;
  • for each three-dimensional information image block, determine the three-dimensional detection information, in the world coordinate system, of the target object in the block according to the three-dimensional target detection network corresponding to the block and the multiple feature images corresponding to the block (a depth-based routing sketch is given below).
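A sketch of the depth-based routing this implies, assuming PyTorch tensors; the two range thresholds are illustrative, since the disclosure only requires that blocks in the same depth range share a detection network:

```python
import torch

def route_by_depth(block_depth: torch.Tensor, bins=(20.0, 40.0)) -> int:
    """Pick a detection-network index from a block's representative depth.

    `block_depth` holds the depth channel of one image block; its median
    stands in for the block's depth. The two thresholds (in meters) are
    illustrative; the disclosure only requires that blocks in the same
    depth range share a detection network.
    """
    d = block_depth.median().item()
    for idx, upper in enumerate(bins):
        if d <= upper:
            return idx
    return len(bins)

# Usage with three depth-specialised heads:
#   heads = [near_head, mid_head, far_head]
#   head = heads[route_by_depth(depth_feature_of_block)]
```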
  • the detection module 1204 is configured to:
  • for each three-dimensional information image block, perform maximum pooling on each feature image corresponding to the block according to the set pooling size and pooling stride, to obtain the pooled value corresponding to each feature image after pooling;
  • assemble the pooled values corresponding to the feature images of each three-dimensional information image block into the target detection feature vector corresponding to the block;
  • determine the three-dimensional detection information, in the world coordinate system, of the target object in each three-dimensional information image block based on the target detection feature vector corresponding to the block and the three-dimensional target detection network corresponding to the block (a pooling sketch is given below).
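A sketch of the pooling step in its simplest form, where the pooling size equals the feature-map size (global max pooling) and a binary foreground mask, e.g. derived from the depth channel, suppresses background activations first; the names and shapes are illustrative:

```python
import torch

def masked_global_max(features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Max-pool each feature map over foreground pixels only.

    `features` is (C, H, W) and assumed non-negative (e.g. post-ReLU);
    `mask` is (H, W) with 1 on the target object and 0 on background,
    e.g. obtained by thresholding the block's depth channel. Background
    activations are zeroed before pooling, so the pooled vector (C,)
    describes the object rather than the whole scene.
    """
    masked = features * mask.unsqueeze(0)
    return masked.flatten(1).max(dim=1).values
```

Stacking these C pooled values gives the block's target detection feature vector; a pooling size smaller than the feature map would simply yield several values per feature image instead of one.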
  • the target detection apparatus 1200 further includes a training module 1206, and the training module 1206 is configured to:
  • train a neural network configured to detect three-dimensional detection information, the neural network being trained with sample images containing labeled three-dimensional detection information of target sample objects.
  • an embodiment of the present disclosure further provides an electronic device 1300 .
  • referring to FIG. 13, a schematic diagram of the electronic device provided by the embodiments of the present disclosure, the electronic device includes:
  • a processor 131, a memory 132, and a bus 133. When the electronic device 1300 runs, the processor 131 and the memory 132 communicate through the bus 133, causing the processor 131 to execute the following instructions: acquire the image collected by the image acquisition component and the internal parameters of the image acquisition component; determine, based on the acquired image and the internal parameters, the three-dimensional coordinate information of each pixel in the acquired image in the world coordinate system; generate, according to the acquired image and the three-dimensional coordinate information of each pixel in the acquired image in the world coordinate system, a three-dimensional information image corresponding to the acquired image, the ordering of the pixels in the three-dimensional information image being the same as the ordering of the pixels in the acquired image; and determine, based on the three-dimensional information image, the three-dimensional detection information, in the world coordinate system, of the target object contained in the acquired image.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the target detection method described in the above method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the computer program product of the target detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the target detection method described in the above method embodiments, for which reference may be made to the above method embodiments.
  • Embodiments of the present disclosure also provide a computer program, which implements any one of the methods in the foregoing embodiments when the computer program is executed by a processor.
  • the computer program product can be implemented in hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and the like.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • based on such an understanding, the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
  • in the embodiments of the present disclosure, a three-dimensional information image that keeps the same image structure as the acquired image and adds the three-dimensional coordinate information of each pixel in the world coordinate system can be obtained based on the acquired image; based on this three-dimensional information image, three-dimensional target detection for the target object can be completed.
  • compared with a radar apparatus, the image acquisition component has the advantages of high portability and low cost.
  • moreover, compared with the point cloud data collected by a radar apparatus, the image acquisition component can also capture, in the short-range area, the complete target objects within the field of view, including target objects of small volume, so three-dimensional target detection for target objects in the short-range area can be completed accurately.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present disclosure provide a target detection method, apparatus, device, storage medium, and program product, wherein the target detection method includes: acquiring an image collected by an image acquisition component and the internal parameters of the image acquisition component; determining, based on the collected image and the internal parameters, the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system; generating, according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system, a three-dimensional information image corresponding to the collected image, the ordering of the pixels in the three-dimensional information image being the same as the ordering of the pixels in the collected image; and determining, based on the three-dimensional information image, the three-dimensional detection information, in the world coordinate system, of the target object contained in the collected image.


Claims (20)

  1. A target detection method, comprising:
    acquiring an image collected by an image acquisition component, and internal parameters of the image acquisition component;
    determining, based on the collected image and the internal parameters, three-dimensional coordinate information of each pixel in the collected image in a world coordinate system;
    generating, according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system, a three-dimensional information image corresponding to the collected image, an ordering of pixels in the three-dimensional information image being the same as an ordering of pixels in the collected image;
    determining, based on the three-dimensional information image, three-dimensional detection information, in the world coordinate system, of a target object contained in the collected image.
  2. The target detection method according to claim 1, wherein the image acquisition component is located on a target vehicle, and after the three-dimensional detection information of the target object contained in the collected image is determined, the target detection method further comprises:
    determining, based on the three-dimensional detection information of each target object, distance information between each target object and the target vehicle;
    controlling the target vehicle to travel based on the three-dimensional detection information of each target object, the distance information, and current pose data of the target vehicle.
  3. The target detection method according to claim 1 or 2, wherein the determining, based on the collected image and the internal parameters, of the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system comprises:
    generating, based on the collected image, a depth image corresponding to the collected image, the depth image containing depth information of each pixel in the collected image;
    determining the three-dimensional coordinate information of each pixel in the world coordinate system based on two-dimensional coordinate information of each pixel in the collected image in an image coordinate system, the depth information of each pixel, and the internal parameters.
  4. The target detection method according to any one of claims 1 to 3, wherein the generating, according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system, of the three-dimensional information image corresponding to the collected image comprises:
    generating the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel in the collected image and index information of each pixel in the collected image, channel information of each pixel in the three-dimensional information image containing at least the three-dimensional coordinate information of that pixel in the world coordinate system.
  5. The target detection method according to claim 4, wherein the generating of the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel in the collected image and the index information of each pixel in the collected image comprises:
    taking the three-dimensional coordinate information corresponding to each pixel in the collected image as multi-channel information corresponding to that pixel in the three-dimensional information image;
    generating the three-dimensional information image based on the multi-channel information corresponding to each pixel in the three-dimensional information image and the index information of each pixel in the collected image.
  6. The target detection method according to claim 4, wherein the generating of the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel in the collected image and the index information of each pixel in the collected image comprises:
    taking the three-dimensional coordinate information corresponding to each pixel in the collected image, together with the information of that pixel in the collected image, as multi-channel information corresponding to that pixel in the three-dimensional information image;
    generating the three-dimensional information image based on the multi-channel information corresponding to each pixel in the three-dimensional information image and the index information of each pixel in the collected image.
  7. The target detection method according to any one of claims 1 to 6, wherein the determining, based on the three-dimensional information image, of the three-dimensional detection information, in the world coordinate system, of the target object contained in the collected image comprises:
    cropping the three-dimensional information image based on two-dimensional detection information of the target object contained in the collected image to obtain at least one three-dimensional information image block, wherein each three-dimensional information image block contains at least one target object;
    performing feature extraction on each three-dimensional information image block to obtain a plurality of feature images corresponding to each three-dimensional information image block, the plurality of feature images containing a depth feature image representing depth information of each target object;
    classifying the at least one three-dimensional information image block based on the depth feature image corresponding to each three-dimensional information image block, and determining a three-dimensional target detection network corresponding to each category of three-dimensional information image block;
    for each three-dimensional information image block, determining the three-dimensional detection information, in the world coordinate system, of the target object in the block according to the three-dimensional target detection network corresponding to the block and the plurality of feature images corresponding to the block.
  8. The target detection method according to claim 7, wherein the determining, for each three-dimensional information image block, of the three-dimensional detection information, in the world coordinate system, of the target object in the block according to the three-dimensional target detection network corresponding to the block and the plurality of feature images corresponding to the block comprises:
    for each three-dimensional information image block, performing maximum pooling on each feature image corresponding to the block according to a set pooling size and pooling stride, to obtain a pooled value corresponding to each feature image after pooling;
    assembling the pooled values corresponding to the feature images of each three-dimensional information image block into a target detection feature vector corresponding to the block;
    determining the three-dimensional detection information, in the world coordinate system, of the target object in each three-dimensional information image block based on the target detection feature vector corresponding to the block and the three-dimensional target detection network corresponding to the block.
  9. The target detection method according to any one of claims 1 to 8, wherein the three-dimensional detection information is detected by a neural network, the neural network being trained with sample images containing labeled three-dimensional detection information of a target sample object.
  10. A target detection apparatus, comprising:
    an acquisition module configured to acquire an image collected by an image acquisition component, and internal parameters of the image acquisition component;
    a determining module configured to determine, based on the collected image and the internal parameters, three-dimensional coordinate information of each pixel in the collected image in a world coordinate system;
    a generating module configured to generate, according to the collected image and the three-dimensional coordinate information of each pixel in the collected image in the world coordinate system, a three-dimensional information image corresponding to the collected image, an ordering of pixels in the three-dimensional information image being the same as an ordering of pixels in the collected image;
    a detection module configured to determine, based on the three-dimensional information image, three-dimensional detection information, in the world coordinate system, of a target object contained in the collected image.
  11. The target detection apparatus according to claim 10, wherein the target detection apparatus further comprises a control module, the image acquisition component is located on a target vehicle, and after the three-dimensional detection information of the target object contained in the collected image is determined, the control module is configured to:
    determine, based on the three-dimensional detection information of each target object, distance information between each target object and the target vehicle;
    control the target vehicle to travel based on the three-dimensional detection information of each target object, the distance information, and current pose data of the target vehicle.
  12. The target detection apparatus according to claim 10 or 11, wherein the determining module is configured to:
    generate, based on the collected image, a depth image corresponding to the collected image, the depth image containing depth information of each pixel in the collected image;
    determine the three-dimensional coordinate information of each pixel in the world coordinate system based on two-dimensional coordinate information of each pixel in the collected image in an image coordinate system, the depth information of each pixel, and the internal parameters.
  13. The target detection apparatus according to any one of claims 10 to 12, wherein the generating module is configured to:
    generate the three-dimensional information image according to the three-dimensional coordinate information corresponding to each pixel in the collected image and index information of each pixel in the collected image, channel information of each pixel in the three-dimensional information image containing at least the three-dimensional coordinate information of that pixel in the world coordinate system.
  14. The target detection apparatus according to claim 13, wherein the generating module is configured to:
    take the three-dimensional coordinate information corresponding to each pixel in the collected image as multi-channel information corresponding to that pixel in the three-dimensional information image;
    generate the three-dimensional information image based on the multi-channel information corresponding to each pixel in the three-dimensional information image and the index information of each pixel in the collected image.
  15. The target detection apparatus according to claim 13, wherein the generating module is configured to:
    take the three-dimensional coordinate information corresponding to each pixel in the collected image, together with the information of that pixel in the collected image, as multi-channel information corresponding to that pixel in the three-dimensional information image;
    generate the three-dimensional information image based on the multi-channel information corresponding to each pixel in the three-dimensional information image and the index information of each pixel in the collected image.
  16. The target detection apparatus according to any one of claims 10 to 15, wherein the detection module is configured to:
    crop the three-dimensional information image based on two-dimensional detection information of the target object contained in the collected image to obtain at least one three-dimensional information image block, wherein each three-dimensional information image block contains at least one target object;
    perform feature extraction on each three-dimensional information image block to obtain a plurality of feature images corresponding to each three-dimensional information image block, the plurality of feature images containing a depth feature image representing depth information of each target object;
    classify the at least one three-dimensional information image block based on the depth feature image corresponding to each three-dimensional information image block, and determine a three-dimensional target detection network corresponding to each category of three-dimensional information image block;
    for each three-dimensional information image block, determine the three-dimensional detection information, in the world coordinate system, of the target object in the block according to the three-dimensional target detection network corresponding to the block and the plurality of feature images corresponding to the block.
  17. The target detection apparatus according to claim 16, wherein the detection module is configured to:
    for each three-dimensional information image block, perform maximum pooling on each feature image corresponding to the block according to a set pooling size and pooling stride, to obtain a pooled value corresponding to each feature image after pooling;
    assemble the pooled values corresponding to the feature images of each three-dimensional information image block into a target detection feature vector corresponding to the block;
    determine the three-dimensional detection information, in the world coordinate system, of the target object in each three-dimensional information image block based on the target detection feature vector corresponding to the block and the three-dimensional target detection network corresponding to the block.
  18. An electronic device, comprising a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, wherein when the electronic device runs, the processor and the memory communicate through the bus, and when the machine-readable instructions are executed by the processor, the steps of the target detection method according to any one of claims 1 to 9 are executed.
  19. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is run by a processor, the steps of the target detection method according to any one of claims 1 to 9 are executed.
  20. A computer program product, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the steps of the target detection method according to any one of claims 1 to 9.
PCT/CN2021/090359 2020-08-08 2021-04-27 目标检测方法、装置、设备、存储介质及程序产品 WO2022033076A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020217042833A KR20220024193A (ko) 2020-08-08 2021-04-27 타깃 검출 방법, 장치, 기기, 저장 매체 및 프로그램 제품

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010792241.X 2020-08-08
CN202010792241.XA CN111931643A (zh) 2020-08-08 2020-08-08 一种目标检测方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022033076A1 true WO2022033076A1 (zh) 2022-02-17

Family

ID=73308121

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090359 WO2022033076A1 (zh) 2020-08-08 2021-04-27 目标检测方法、装置、设备、存储介质及程序产品

Country Status (3)

Country Link
KR (1) KR20220024193A (zh)
CN (1) CN111931643A (zh)
WO (1) WO2022033076A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114655207A (zh) * 2022-05-13 2022-06-24 中汽创智科技有限公司 一种数据处理方法、装置、设备及存储介质
CN115100423A (zh) * 2022-06-17 2022-09-23 四川省寰宇众恒科技有限公司 一种基于视图采集数据实现实时定位系统及方法
CN115115687A (zh) * 2022-06-24 2022-09-27 合众新能源汽车有限公司 车道线测量方法及装置
CN117308967A (zh) * 2023-11-30 2023-12-29 中船(北京)智能装备科技有限公司 一种目标对象位置信息的确定方法、装置及设备

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931643A (zh) * 2020-08-08 2020-11-13 商汤集团有限公司 一种目标检测方法、装置、电子设备及存储介质
CN112926395A (zh) * 2021-01-27 2021-06-08 上海商汤临港智能科技有限公司 目标检测方法、装置、计算机设备及存储介质
CN112907757A (zh) * 2021-04-08 2021-06-04 深圳市慧鲤科技有限公司 一种导航提示方法、装置、电子设备及存储介质
KR102591835B1 (ko) * 2021-08-13 2023-10-24 한국전자통신연구원 딥러닝 기반 의상 속성 분류 장치 및 방법
CN115035492B (zh) * 2022-06-21 2024-01-23 苏州浪潮智能科技有限公司 车辆识别方法、装置、设备和存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120219183A1 (en) * 2011-02-24 2012-08-30 Daishi Mori 3D Object Detecting Apparatus and 3D Object Detecting Method
CN106875444A (zh) * 2017-01-19 2017-06-20 浙江大华技术股份有限公司 一种目标物定位方法及装置
CN111274943A (zh) * 2020-01-19 2020-06-12 深圳市商汤科技有限公司 一种检测方法、装置、电子设备及存储介质
CN111382613A (zh) * 2018-12-28 2020-07-07 中国移动通信集团辽宁有限公司 图像处理方法、装置、设备和介质
CN111931643A (zh) * 2020-08-08 2020-11-13 商汤集团有限公司 一种目标检测方法、装置、电子设备及存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10304191B1 (en) * 2016-10-11 2019-05-28 Zoox, Inc. Three dimensional bounding box estimation from two dimensional images
CN110826357B (zh) * 2018-08-07 2022-07-26 北京市商汤科技开发有限公司 对象三维检测及智能驾驶控制的方法、装置、介质及设备
CN109671102B (zh) * 2018-12-03 2021-02-05 华中科技大学 一种基于深度特征融合卷积神经网络的综合式目标跟踪方法
CN109784194B (zh) * 2018-12-20 2021-11-23 北京图森智途科技有限公司 目标检测网络构建方法和训练方法、目标检测方法
CN109961522B (zh) * 2019-04-02 2023-05-05 阿波罗智联(北京)科技有限公司 图像投射方法、装置、设备和存储介质
CN110427797B (zh) * 2019-05-28 2023-09-15 东南大学 一种基于几何条件限制的三维车辆检测方法
CN110689008A (zh) * 2019-09-17 2020-01-14 大连理工大学 一种面向单目图像的基于三维重建的三维物体检测方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120219183A1 (en) * 2011-02-24 2012-08-30 Daishi Mori 3D Object Detecting Apparatus and 3D Object Detecting Method
CN106875444A (zh) * 2017-01-19 2017-06-20 浙江大华技术股份有限公司 一种目标物定位方法及装置
CN111382613A (zh) * 2018-12-28 2020-07-07 中国移动通信集团辽宁有限公司 图像处理方法、装置、设备和介质
CN111274943A (zh) * 2020-01-19 2020-06-12 深圳市商汤科技有限公司 一种检测方法、装置、电子设备及存储介质
CN111931643A (zh) * 2020-08-08 2020-11-13 商汤集团有限公司 一种目标检测方法、装置、电子设备及存储介质

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114655207A (zh) * 2022-05-13 2022-06-24 中汽创智科技有限公司 一种数据处理方法、装置、设备及存储介质
CN115100423A (zh) * 2022-06-17 2022-09-23 四川省寰宇众恒科技有限公司 一种基于视图采集数据实现实时定位系统及方法
CN115100423B (zh) * 2022-06-17 2023-10-10 四川省寰宇众恒科技有限公司 一种基于视图采集数据实现实时定位系统及方法
CN115115687A (zh) * 2022-06-24 2022-09-27 合众新能源汽车有限公司 车道线测量方法及装置
CN117308967A (zh) * 2023-11-30 2023-12-29 中船(北京)智能装备科技有限公司 一种目标对象位置信息的确定方法、装置及设备
CN117308967B (zh) * 2023-11-30 2024-02-02 中船(北京)智能装备科技有限公司 一种目标对象位置信息的确定方法、装置及设备

Also Published As

Publication number Publication date
KR20220024193A (ko) 2022-03-03
CN111931643A (zh) 2020-11-13

Similar Documents

Publication Publication Date Title
WO2022033076A1 (zh) 目标检测方法、装置、设备、存储介质及程序产品
CN107742311B (zh) 一种视觉定位的方法及装置
CN111328396B (zh) 用于图像中的对象的姿态估计和模型检索
CN112287860B (zh) 物体识别模型的训练方法及装置、物体识别方法及系统
CN109903331B (zh) 一种基于rgb-d相机的卷积神经网络目标检测方法
CN109145928B (zh) 一种基于图像的车头朝向识别方法及装置
CN111080693A (zh) 一种基于YOLOv3的机器人自主分类抓取方法
CN111340797A (zh) 一种激光雷达与双目相机数据融合检测方法及系统
CN112528878A (zh) 检测车道线的方法、装置、终端设备及可读存储介质
CN111553949B (zh) 基于单帧rgb-d图像深度学习对不规则工件的定位抓取方法
CN106971185B (zh) 一种基于全卷积网络的车牌定位方法及装置
CN110751097B (zh) 一种半监督的三维点云手势关键点检测方法
CN112287859A (zh) 物体识别方法、装置和系统,计算机可读存储介质
CN111144349A (zh) 一种室内视觉重定位方法及系统
CN111382658B (zh) 一种基于图像灰度梯度一致性的自然环境下道路交通标志检测方法
CN112395962A (zh) 数据增广方法及装置、物体识别方法及系统
CN114219855A (zh) 点云法向量的估计方法、装置、计算机设备和存储介质
CN115272691A (zh) 一种钢筋绑扎状态检测模型的训练方法、识别方法及设备
JP7336653B2 (ja) ディープラーニングを利用した屋内位置測位方法
CN113658274B (zh) 用于灵长类动物种群行为分析的个体间距自动计算方法
CN116051736A (zh) 一种三维重建方法、装置、边缘设备和存储介质
CN116052120A (zh) 基于图像增强和多传感器融合的挖掘机夜间物体检测方法
CN115240150A (zh) 基于单目相机的车道偏离预警方法、系统、设备及介质
CN117576494A (zh) 特征地图生成方法、装置、存储介质和计算机设备
CN113052118A (zh) 基于高速快球摄像机实现场景变换视频分析检测的方法、系统、装置、处理器及存储介质

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021565778

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21855132

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.07.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21855132

Country of ref document: EP

Kind code of ref document: A1