WO2021143935A1 - Detection method and apparatus, electronic device, and storage medium - Google Patents
Detection method and apparatus, electronic device, and storage medium
- Publication number
- WO2021143935A1 (PCT/CN2021/072750)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- detected
- information
- dimensional
- dimensional image
- structured polygon
- Prior art date
Classifications
- G06V20/64 Three-dimensional objects
- G06F18/23 Clustering techniques
- G06T15/20 Perspective computation
- G06T7/543 Depth or shape recovery from line drawings
- G06V10/247 Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids
- G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/422 Global feature extraction for representing the structure of the pattern or shape of an object therefor
- G06V20/10 Terrestrial scenes
- G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/647 Three-dimensional objects by matching two-dimensional images to three-dimensional objects
- G06T2207/20081 Training; Learning
- G06T2207/20084 Artificial neural networks [ANN]
- G06T2210/12 Bounding box
- G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
- G06V2201/07 Target detection
Definitions
- the present disclosure relates to the field of image processing technology, and in particular to a detection method and apparatus, an electronic device, and a storage medium.
- in the field of computer vision, three-dimensional (3D) target detection is one of the most fundamental tasks. 3D target detection can be applied to scenarios such as autonomous driving and robot task execution.
- the present disclosure provides at least one detection method, device, electronic equipment, and storage medium.
- in a first aspect, the present disclosure provides a detection method, including: acquiring a two-dimensional image; constructing, based on the acquired two-dimensional image, a structured polygon corresponding to at least one object to be detected in the two-dimensional image, where the structured polygon corresponding to each object to be detected represents the projection of the three-dimensional bounding box corresponding to the object to be detected onto the two-dimensional image; for each object to be detected, calculating the depth information of the vertices of the structured polygon based on the height information of the object to be detected and the height information of the vertical sides of the structured polygon corresponding to the object to be detected; and determining the three-dimensional space information of the object to be detected based on the depth information of the vertices of the structured polygon and the two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image, where the three-dimensional space information of the object to be detected is related to the three-dimensional bounding box corresponding to the object to be detected.
- since the constructed structured polygon is the projection of the three-dimensional bounding box corresponding to the object to be detected onto the two-dimensional image, the constructed structured polygon can better characterize the three-dimensional characteristics of the object to be detected.
- depth information predicted based on structured polygons is more accurate than depth information predicted directly from two-dimensional image features. As a result, the accuracy of the obtained three-dimensional spatial information of the object to be detected is relatively high, which improves the accuracy of the 3D detection result.
- in another aspect, the present disclosure provides a detection device. The detection device includes: an image acquisition module for acquiring a two-dimensional image; a structured polygon building module for constructing, based on the acquired two-dimensional image, a structured polygon corresponding to at least one object to be detected in the two-dimensional image, where the structured polygon corresponding to each object to be detected represents the projection of the three-dimensional bounding box corresponding to the object to be detected onto the two-dimensional image; a depth information determination module for calculating, for each object to be detected, the depth information of the vertices of the structured polygon based on the height information of the object to be detected and the height information of the vertical sides of the structured polygon; and a three-dimensional spatial information determination module for determining the three-dimensional space information of the object to be detected based on the depth information of the vertices of the structured polygon and the two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image, where the three-dimensional space information of the object to be detected is related to the three-dimensional bounding box corresponding to the object to be detected.
- the present disclosure provides an electronic device, including: a processor; a memory storing machine-readable instructions executable by the processor; and a bus. The processor and the memory communicate through the bus; when the machine-readable instructions are executed by the processor, the steps of the detection method according to the first aspect or any one of its embodiments are performed.
- the present disclosure provides a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps of the detection method described in the first aspect or any one of its embodiments are performed.
- FIG. 1 shows a schematic flowchart of a detection method provided by an embodiment of the present disclosure
- FIG. 2a shows a schematic structural diagram of a structured polygon corresponding to an object to be detected in a detection method provided by an embodiment of the present disclosure
- FIG. 2b shows a schematic diagram of the structure of a three-dimensional bounding box corresponding to an object to be detected in a detection method provided by an embodiment of the present disclosure, and the projection of the three-dimensional bounding box on the two-dimensional image is the structured polygon in FIG. 2a;
- FIG. 3 shows a schematic flowchart of a method for constructing a structured polygon corresponding to an object to be detected in a detection method provided by an embodiment of the present disclosure
- FIG. 4 shows a schematic flowchart of a method for determining attribute information of a structured polygon corresponding to an object to be detected in a detection method provided by an embodiment of the present disclosure
- FIG. 5 shows a schematic flowchart of a method for feature extraction of a target image corresponding to an object to be detected in a detection method provided by an embodiment of the present disclosure
- FIG. 6 shows a schematic structural diagram of a feature extraction model in a detection method provided by an embodiment of the present disclosure
- FIG. 7 shows a structural diagram of the correspondence between the structured polygon corresponding to the object to be detected, determined based on the two-dimensional image, and the three-dimensional bounding box corresponding to the object to be detected in a detection method provided by an embodiment of the present disclosure;
- FIG. 8 shows a top view of an image to be detected in a detection method provided by an embodiment of the present disclosure
- FIG. 9 shows a schematic flowchart of a method for obtaining adjusted three-dimensional space information of an object to be detected in a detection method provided by an embodiment of the present disclosure
- FIG. 10 shows a schematic structural diagram of an image detection model in a detection method provided by an embodiment of the present disclosure
- FIG. 11 shows a schematic structural diagram of a detection device provided by an embodiment of the present disclosure
- FIG. 12 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- two-dimensional images are generally captured by a camera device, and the target objects in front of the vehicle or robot are identified based on the two-dimensional images, for example obstacles ahead, so that the vehicle or robot can take evasive action. Since only the size of a target object in the image plane can be identified from a two-dimensional image, the three-dimensional spatial information of the target object in the real world cannot be accurately grasped; as a result, performing tasks such as automatic driving and robot transportation based on such recognition results may lead to dangerous situations, such as collisions with obstacles.
- embodiments of the present disclosure provide a detection method, which obtains structured polygons and depth information corresponding to the object to be detected based on a two-dimensional image to achieve 3D target detection.
- a structured polygon is constructed for each object to be detected involved in the acquired two-dimensional image. Since the constructed structured polygon is the projection of the three-dimensional bounding box corresponding to the object to be detected in the two-dimensional image, the constructed structured polygon can better characterize the three-dimensional characteristics of the object to be detected.
- the depth information of the vertices of the structured polygon is calculated based on the height information of the object to be detected and the height information of the vertical sides of the structured polygon corresponding to the object to be detected. Depth information predicted from structured polygons in this way is more accurate than depth information predicted directly from two-dimensional image features.
- consequently, the accuracy of the obtained three-dimensional information is relatively high, which in turn improves the accuracy of the 3D target detection result.
- the detection method provided by the embodiments of the present disclosure can be applied to a server or a smart terminal device with a central processing unit.
- the server may be a local server or a cloud server, etc.
- the smart terminal device may be a smart phone, a tablet computer, a personal digital assistant (PDA), etc., which is not limited in the present disclosure.
- the detection method provided by the present disclosure can be applied to any scene where the object to be detected needs to be sensed.
- the detection method can be applied in an automatic driving scene, or in a scene where a robot performs a task.
- taking the automatic driving scene as an example, the camera device installed on the vehicle acquires a two-dimensional image of the scene during driving and sends the acquired two-dimensional image to the server for 3D target detection, or sends the acquired two-dimensional image to the smart terminal device.
- the server or smart terminal device processes the two-dimensional image based on the detection method provided by the embodiment of the present disclosure, and determines the three-dimensional space information of each object to be detected in the two-dimensional image.
- referring to FIG. 1, which is a schematic flowchart of a detection method provided by an embodiment of the present disclosure, the detection method is described by taking its application to a server as an example.
- the detection method includes the following steps S101-S104.
- a two-dimensional image is acquired.
- the two-dimensional image relates to at least one object to be detected.
- a structured polygon corresponding to at least one object to be detected in the two-dimensional image is constructed based on the acquired two-dimensional image.
- a structured polygon corresponding to an object to be detected represents the projection of a three-dimensional bounding box corresponding to the object to be detected on the two-dimensional image.
- the depth information of the vertices in the structured polygon is calculated based on the height information of the object to be detected and the height information of the vertical sides in the structured polygon corresponding to the object to be detected.
- based on the depth information of the vertices of the structured polygon and the two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image, the three-dimensional space information of the object to be detected is determined; the three-dimensional space information of the object to be detected is related to the three-dimensional bounding box corresponding to the object to be detected.
- the server or the smart terminal device can obtain the two-dimensional image captured by the camera device in real time, or obtain the two-dimensional image within the preset shooting period from the storage module storing the two-dimensional image.
- the two-dimensional image may be a red-green-blue (RGB) image obtained by a camera device.
- two-dimensional images corresponding to the current position of the vehicle or robot can be acquired in real time during vehicle driving or robot transportation, and the acquired two-dimensional images can be processed.
- the structured polygon 24 corresponding to the object to be detected is a projection of a three-dimensional bounding box 25 of a rectangular parallelepiped structure on a two-dimensional image.
- the object to be detected may be any object that needs to be detected during the driving of the vehicle.
- the object to be detected may be a vehicle, an animal, a pedestrian, etc.
- constructing a structured polygon corresponding to at least one object to be detected in the two-dimensional image includes the following steps S301-S302.
- the attribute information of the structured polygon corresponding to each object to be detected is determined.
- the attribute information includes at least one of the following: vertex information, surface information, and contour line information.
- a structured polygon corresponding to each object to be detected is constructed.
- the attribute information includes vertex information
- in this case, multiple vertex information of the structured polygon corresponding to each object to be detected can be determined based on the two-dimensional image, and the structured polygon corresponding to each object to be detected can be constructed from the obtained multiple vertex information.
- the multiple vertex information may be the coordinate information of the eight vertices of the structured polygon 24, namely the coordinate information of each of the vertices P1, P2, P3, P4, P5, P6, P7, and P8.
- the multiple vertex information may also be the coordinate information of some vertices in the structured polygon 24, and a structured polygon can be uniquely determined based on the coordinate information of this portion of the vertices.
- for example, the coordinate information of some vertices may be the coordinate information of each of the vertices P3, P4, P5, P6, P7, and P8; or the coordinate information of some vertices may be the coordinate information of each of the vertices P3, P6, P7, and P8.
- Which partial vertices are used to uniquely determine a structured polygon can be determined according to actual conditions, and the embodiment of the present disclosure does not specifically limit this.
- when the attribute information includes surface information, the plane information of multiple surfaces of the structured polygon corresponding to each object to be detected can be determined based on the two-dimensional image, and the structured polygon corresponding to each object to be detected can be constructed from the obtained multiple plane information.
- the multiple plane information may be the shapes and positions of the six surfaces of the structured polygon 24.
- the multiple plane information may also be the shape and position of a part of the surface of the structured polygon 24, and a structured polygon can be uniquely determined based on the shape and position of this part of the surface.
- part of the surface may be the first plane 21, the second plane 22, and the third plane 23, or the part of the surface may also be the first plane 21 and the second plane 22.
- which partial planes are specifically used to uniquely determine a structured polygon can be determined according to actual conditions, and the embodiment of the present disclosure does not specifically limit this.
- when the attribute information includes contour line information, multiple contour line information of the structured polygon corresponding to each object to be detected may be determined based on the two-dimensional image, and the structured polygon corresponding to each object to be detected can be constructed from the obtained multiple contour line information.
- multiple pieces of contour line information may be the positions and lengths of 12 contour lines of the structured polygon 24.
- multiple pieces of contour line information may also be the position and length of a part of the contour line in the structured polygon 24, and a structured polygon can be uniquely determined based on the position and length of this part of the contour line.
- the partial contour lines may be the contour line formed by vertex P7 and vertex P8 (the first contour line), the contour line formed by vertex P7 and vertex P3 (the second contour line), and the contour line formed by vertex P7 and vertex P6 (the third contour line); or the partial contour lines may be those three contour lines together with the contour line formed by vertex P4 and vertex P8 (the fourth contour line).
- Which contour lines are specifically used to uniquely determine a structured polygon can be determined according to actual conditions, and the embodiment of the present disclosure does not specifically limit this.
- vertex information (a structured polygon generally includes multiple vertices), plane information (a structured polygon generally includes multiple surfaces), and contour line information (a structured polygon generally includes multiple contour lines) are the basic information constituting a structured polygon; based on this basic information, a structured polygon can be uniquely constructed, and the shape of the object to be detected can be represented more accurately.
- determining the attribute information of the structured polygon corresponding to each object to be detected includes the following steps S401-S403.
- S401 Perform object detection on the two-dimensional image to obtain at least one object area in the two-dimensional image. Among them, each object area contains an object to be detected.
- S402 Based on the object area corresponding to each object to be detected and second preset size information, intercept a target image corresponding to each object to be detected from the two-dimensional image. The size indicated by the second preset size information is greater than or equal to the size of the object area of each object to be detected.
- S403 Perform feature extraction on the target image corresponding to each object to be detected, to obtain attribute information of the structured polygon corresponding to each object to be detected.
- object detection can be performed on the two-dimensional image through a trained first neural network model to obtain the first detection frame corresponding to each object to be detected in the two-dimensional image; the area within the first detection frame is the object area, and each object area contains an object to be detected.
- in order to make the size of the target image corresponding to each object to be detected consistent, a second preset size can be set; in this way, when the target image corresponding to each object to be detected is intercepted from the two-dimensional image, the size of each target image is the same as the second preset size.
- the second preset size information may be determined based on historical experience. For example, based on the sizes of object areas observed historically, the largest size among the sizes corresponding to multiple object areas may be selected as the second preset size. In this way, the second preset size can be set to be greater than or equal to the size of each object area, which makes the input of the model used for feature extraction of the target images consistent and ensures that the features of the object to be detected contained in each object area are complete. In other words, this avoids the case where the second preset size is smaller than the size of some object area and the features of the object to be detected in that object area are omitted.
- otherwise, if the target image ImgA corresponding to an object A to be detected were intercepted based on a second preset size smaller than the object area of A, the features of object A contained in the target image ImgA would be incomplete, which in turn would make the obtained attribute information of the structured polygon corresponding to object A inaccurate.
- specifically, the center point of each object area may be used as the center point of the target image, the second preset size may be used as the size of the target image, and the target image corresponding to each object to be detected may be intercepted from the two-dimensional image accordingly.
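- a minimal sketch of this cropping step is shown below; the function name, the zero-padding at image borders, and the argument layout are illustrative assumptions rather than the patent's exact procedure:

```python
import numpy as np

def crop_target_image(image: np.ndarray, box: tuple, preset_hw: tuple) -> np.ndarray:
    """Crop a fixed-size target image centered on an object area.

    image: H x W x C two-dimensional image.
    box: (x1, y1, x2, y2) object area from the 2D detector.
    preset_hw: (crop_h, crop_w), the second preset size, which should be
               greater than or equal to the size of every object area.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # center point of the object area
    crop_h, crop_w = preset_hw
    top = int(round(cy - crop_h / 2.0))
    left = int(round(cx - crop_w / 2.0))
    # zero-pad so crops near the image border still have the preset size
    padded = np.pad(image, ((crop_h, crop_h), (crop_w, crop_w), (0, 0)), mode="constant")
    return padded[top + crop_h:top + 2 * crop_h, left + crop_w:left + 2 * crop_w]
```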
- the feature extraction of the target image corresponding to each object to be detected can be performed through the trained structure detection model to obtain the attribute information of the structured polygon corresponding to each object to be detected.
- the structure detection model can be obtained based on basic deep learning model training.
- when the structure detection model includes a vertex determination model, the vertex determination model is obtained by training the basic deep learning model, and the target image corresponding to each object to be detected is input into the trained vertex determination model to obtain the information of all vertices or some of the vertices corresponding to the object to be detected.
- when the structure detection model includes a plane determination model, the plane determination model is obtained by training the basic deep learning model, and the target image corresponding to each object to be detected is input into the trained plane determination model to obtain the information of all planes or some of the planes corresponding to the object to be detected; the plane information includes at least one of plane position, plane shape, and plane size.
- when the structure detection model includes a contour line determination model, the contour line determination model is obtained by training the basic deep learning model, and the target image corresponding to each object to be detected is input into the trained contour line determination model to obtain the information of all contour lines or some of the contour lines corresponding to the object to be detected; the contour line information includes the position and length of the contour line.
- in the above, the target image corresponding to each object to be detected is first intercepted from the two-dimensional image, and feature extraction is then performed on the target image corresponding to each object to be detected to obtain the attribute information of the structured polygon corresponding to each object to be detected.
- processing the target images corresponding to the objects to be detected into a uniform size can simplify the model used for feature extraction of the target images and improve processing efficiency.
- exemplarily, referring to FIG. 5, when the attribute information includes vertex information, feature extraction may be performed on the target image corresponding to each object to be detected according to the following steps S501 to S503 to obtain the attribute information of the structured polygon corresponding to each object to be detected.
- S501 Extract feature data of the target image corresponding to the object to be detected based on the convolutional neural network.
- S502 Process the feature data based on at least one stacked hourglass network to obtain a heat map set corresponding to the object to be detected.
- the heat map set includes a plurality of heat maps, and each heat map includes one of the vertices of the structured polygon corresponding to the object to be detected.
- S503 Determine the attribute information of the structured polygon corresponding to the object to be detected based on the heat map set of the object to be detected.
- the target image corresponding to each object to be detected can be processed through the trained feature extraction model to determine the attribute information of the structured polygon corresponding to each object to be detected.
- the feature extraction model may include a convolutional neural network and at least one stacked hourglass network, and the number of the stacked at least one hourglass network can be determined according to actual needs.
- as shown in FIG. 6, the feature extraction model includes a target image 601, a convolutional neural network 602, and two stacked hourglass networks 603.
- for each object to be detected, the target image 601 corresponding to the object to be detected is input into the convolutional neural network 602 for feature extraction to determine the feature data corresponding to the target image 601; the feature data corresponding to the target image 601 is then input into the two stacked hourglass networks 603 for processing to obtain the heat map set corresponding to the object to be detected. In this way, the attribute information of the structured polygon corresponding to the object to be detected can be determined based on the heat map set corresponding to the object to be detected.
- a heat map set includes a plurality of heat maps, and each feature point in each heat map corresponds to a probability value, and the probability value is the probability that the feature point is a vertex.
- the feature point with the largest probability value can be selected from the heat map as one of the multiple vertices of the structured polygon corresponding to the heat map set to which the heat map belongs.
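- as a minimal sketch of this read-out (the function name and array shapes are illustrative assumptions):

```python
import numpy as np

def vertices_from_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """Pick one vertex per heat map as the feature point with the largest
    probability value.

    heatmaps: array of shape (num_vertices, H, W); each map scores one
              vertex of the structured polygon (e.g. 8 maps for P1..P8).
    Returns an array of (u, v) pixel coordinates, one row per vertex.
    """
    num_maps, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(num_maps, -1).argmax(axis=1)
    vs, us = np.unravel_index(flat_idx, (h, w))  # rows (v) and columns (u)
    return np.stack([us, vs], axis=1)
```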
- each heat map corresponds to a different vertex position, and the number of multiple heat maps included in a heat map set can be set according to actual needs.
- the heat map set can be set to include eight heat maps.
- for example, the first heat map may include the vertex P1 of the structured polygon in FIG. 2a, the second heat map may include the vertex P2, and so on, up to the eighth heat map, which may include the vertex P8.
- in some cases, the attribute information contains the coordinate information of only some of the vertices of the structured polygon, for example the vertices P3, P4, P5, P6, P7, and P8. In that case, the heat map set can be set to include six heat maps: the first heat map may include the vertex P3 of the structured polygon in FIG. 2a, the second heat map may include the vertex P4, and so on, up to the sixth heat map, which may include the vertex P8.
- in another embodiment, determining the attribute information of the structured polygon corresponding to the object to be detected includes: performing feature extraction on the two-dimensional image to obtain information of multiple target elements in the two-dimensional image, where the target elements include at least one of vertices, surfaces, and contour lines; clustering the target elements based on the information of the multiple target elements to obtain at least one clustered target element set; and, for each target element set, forming a structured polygon from the target elements in the target element set and using the information of those target elements as the attribute information of the structured polygon.
- that is, feature extraction can also be performed directly on the two-dimensional image to determine the attribute information of the structured polygon corresponding to each object to be detected in the two-dimensional image. For example, when the target element is a vertex and the two-dimensional image includes two objects to be detected, namely a first object to be detected and a second object to be detected, feature extraction is performed on the two-dimensional image to obtain the information of the vertices included in the two-dimensional image.
- based on the information of the vertices, the vertices are clustered (that is, the object to be detected to which each vertex belongs is determined, and vertices belonging to the same object to be detected are clustered together) to obtain the clustered target element sets.
- the first object to be detected corresponds to the first set of target elements
- the second object to be detected corresponds to the second set of target elements.
- the structured polygon corresponding to the first object to be detected can be formed according to the target elements in the first target element set, and the information of the target element in the first target element set can be used as the attribute information of the structured polygon corresponding to the first object to be detected .
- the structured polygon corresponding to the second object to be detected can be formed according to the target elements in the second target element set, and the information of the target element in the second target element set can be used as the attribute information of the structured polygon corresponding to the second object to be detected .
- in the above, the target element set of each category is obtained by clustering the target elements in the two-dimensional image, so that the elements in one target element set all belong to the same object to be detected; then, based on each target element set, the structured polygon of the object to be detected corresponding to that target element set can be obtained.
- the height information of the object to be detected and the height information of at least one vertical side of the structured polygon corresponding to the object to be detected can be used to calculate the depth information of the vertices of the structured polygon.
- in some embodiments, calculating the depth information of the vertices of the structured polygon based on the height information of the object to be detected and the height information of the vertical sides of the structured polygon corresponding to the object to be detected includes: for each object to be detected, determining the ratio between the height of the object to be detected and the height of each vertical side of the structured polygon; and determining the product of the ratio corresponding to each vertical side and the focal length of the imaging device that captured the two-dimensional image as the depth information of the vertices corresponding to that vertical side.
- referring to FIG. 7, a structured polygon 701 corresponding to the object to be detected, a three-dimensional bounding box 702 of the object to be detected in three-dimensional space, and a camera 703 are shown in the figure. It can be seen from FIG. 7 that the height H of the object to be detected, the height h_j of at least one vertical side of the structured polygon corresponding to the object to be detected, and the depth information Z_j of the vertices corresponding to that vertical side satisfy:

  Z_j = f · H / h_j (1)

- f is the focal length of the camera, and j ∈ {1, 2, 3, 4} is the index of any one of the four vertical sides of the structured polygon (that is, h_1 corresponds to the height of the first vertical side, h_2 to the height of the second vertical side, and so on).
- the value of f can be determined according to the imaging device. For example, if j is 4, then by determining the value of h_4 and the height H of the corresponding object to be detected, the depth information of any point on the vertical side corresponding to h_4 can be obtained; that is, the depth information of the vertices at both ends of the fourth vertical side can be obtained. Further, the depth information of each vertex of the structured polygon can be obtained.
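- a minimal sketch of this depth computation, following formula (1) above (names are illustrative):

```python
def vertex_depths(focal_length: float, object_height: float,
                  vertical_side_heights: list) -> list:
    """Depth of the vertices on each vertical side, from formula (1):
    Z_j = f * H / h_j, where H is the object height and h_j is the
    height of the j-th vertical side of the structured polygon in the
    image. Both endpoints of a vertical side share the same depth Z_j."""
    f, H = focal_length, object_height
    return [f * H / h_j for h_j in vertical_side_heights]
```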
- the value of h_j can be determined from the constructed structured polygon; or, when the attribute information is contour line information, the value of h_j can be determined based on the obtained contour line information; or, a height information detection model can be set, and the value of h_j in the structured polygon can be determined based on the height information detection model.
- the height information detection model can be obtained based on neural network model training.
- in some embodiments, determining the height of the object to be detected includes: determining the height of each object to be detected in the two-dimensional image based on the two-dimensional image and a pre-trained neural network for height detection; or collecting in advance the true height values of the object to be detected in multiple different postures, and using the average of the collected true height values as the height of the object to be detected; or obtaining a regression variable of the object to be detected based on the two-dimensional image and a pre-trained neural network for object detection, and determining the height of the object to be detected based on the regression variable and the pre-obtained average height of multiple objects to be detected in different postures. The regression variable is used to characterize the degree of deviation between the height of the object to be detected and the average height.
- the true height values of multiple vehicles of different models may be collected in advance, the collected true height values are averaged, and the obtained average value is used as the height of the object to be detected.
- exemplarily, the two-dimensional image may also be input into a trained neural network for height detection to obtain the height of each object to be detected involved in the two-dimensional image.
- the two-dimensional image can also be input into a trained neural network for object detection to obtain the regression variable of each object to be detected, and the height of each object to be detected can then be determined based on the regression variable and the pre-obtained average height of multiple objects to be detected in different postures.
- the regression variable t_H, the average height A_H, and the height H satisfy the relationship given by formula (2).
- the height H corresponding to each object to be detected can be obtained by the above formula (2).
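- formula (2) is not reproduced in this text; a common parameterization for such a deviation regressor, used below purely as an illustrative assumption, is H = A_H · e^{t_H}:

```python
import math

def decode_height(t_h: float, avg_height: float) -> float:
    """Recover the object height H from the regressed deviation t_H.

    Assumes the common exponential parameterization H = A_H * exp(t_H);
    this is an illustrative stand-in, not necessarily the patent's
    exact formula (2)."""
    return avg_height * math.exp(t_h)
```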
- the depth information of the vertices of the structured polygon obtained by calculation and the two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image can be used to determine the three-dimensional coordinate information of the three-dimensional bounding box corresponding to the object to be detected; based on the three-dimensional coordinate information of the three-dimensional bounding box, the three-dimensional space information of the object to be detected is determined.
- based on the imaging principle, each point on the object to be detected has a unique projection point on the two-dimensional image, so each point on the object to be detected and the corresponding feature point on the two-dimensional image satisfy the following relationship:

  Z_i · [u_i, v_i, 1]^T = K · [X_i, Y_i, Z_i]^T (3)

- where K is the intrinsic parameter matrix of the imaging device, i denotes any point on the object to be detected, [X_i, Y_i, Z_i] is the three-dimensional coordinate information of point i on the object to be detected, (u_i, v_i) is the two-dimensional coordinate information of the projection point of point i on the two-dimensional image, and Z_i is the corresponding depth information obtained by the solution.
- the three-dimensional coordinate information is coordinate information in the established world coordinate system
- the two-dimensional coordinate information is coordinate information in the established imaging plane coordinate system; the origins of the world coordinate system and the imaging plane coordinate system are the same.
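- as a sketch of how formula (3) can be inverted to recover a three-dimensional point from a pixel and its depth (the intrinsic parameter values in the example are assumptions):

```python
import numpy as np

def backproject(u: float, v: float, depth: float, K: np.ndarray) -> np.ndarray:
    """Invert formula (3): given a pixel (u, v), its depth Z_i and the
    intrinsic matrix K, recover [X_i, Y_i, Z_i] = Z_i * K^{-1} [u, v, 1]^T."""
    return depth * np.linalg.inv(K) @ np.array([u, v, 1.0])

# example with illustrative intrinsics (fx, fy, cx, cy are assumptions)
K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
point_3d = backproject(800.0, 400.0, 15.0, K)  # -> [X, Y, Z] of the spatial point
```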
- the three-dimensional space information of the object to be detected is related to the three-dimensional bounding box corresponding to the object to be detected.
- the three-dimensional space information of the object to be detected can be determined according to the three-dimensional bounding box corresponding to the object to be detected.
- the three-dimensional space information may include at least one of spatial position information, orientation information, and size information.
- the spatial position information may be the coordinate information of the center point of the three-dimensional bounding box corresponding to the object to be detected, for example the coordinate information of the intersection point between the line segment P1P7 (the line connecting vertex P1 and vertex P7) and the line segment P2P8 (the line connecting vertex P2 and vertex P8); it may also be the coordinate information of the center point of any surface of the three-dimensional bounding box corresponding to the object to be detected, for example the coordinate information of the center point of the plane formed by the vertices P2, P3, P6, and P7 in FIG. 2b, that is, the coordinate information of the intersection point between the line segment P2P7 and the line segment P3P6.
- the orientation information may be the angle value between the target plane set on the three-dimensional bounding box and the preset reference plane.
- Shown in FIG. 8 is a top view of an image to be detected.
- FIG. 8 includes a target plane 81 set on the three-dimensional bounding box corresponding to the object to be detected and a preset reference plane 82 (the reference plane may be the plane where the imaging device is located). It can be seen that the orientation information of the object to be detected 83 may be the included angle θ1, the orientation information of the object to be detected 84 may be the included angle θ2, and the orientation information of the object to be detected 85 may be the included angle θ3.
- the size information may be any one or more of the length, width, and height of the three-dimensional bounding box corresponding to the object to be detected.
- exemplarily, the length of the three-dimensional bounding box may be the value of the line segment P3P7, the width may be the value of the line segment P3P2, and the height may be the value of the line segment P3P4.
- the average value of the four long sides may also be calculated, and the obtained average length is determined as the length of the three-dimensional bounding box.
- similarly, the width and height of the three-dimensional bounding box corresponding to the object to be detected can be obtained.
- alternatively, the length of the three-dimensional bounding box can be determined from a selected subset of the long sides, the width from a selected subset of the wide sides, and the height from a selected subset of the vertical sides, so as to determine the size information of the three-dimensional bounding box.
- the selected long sides may be long sides that are not occluded, the selected wide sides may be wide sides that are not occluded, and the selected vertical sides may be vertical sides that are not occluded.
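- a small sketch of this size computation (the function and the choice of unoccluded edges are illustrative assumptions, not the patent's exact procedure):

```python
import numpy as np

def box_dimensions(vertices: dict, long_edges: list, wide_edges: list,
                   vert_edges: list) -> tuple:
    """Average the lengths of selected (e.g. unoccluded) edges to obtain
    the length, width and height of the three-dimensional bounding box.

    vertices: mapping such as {"P1": np.array([x, y, z]), ...}.
    *_edges: lists of vertex-name pairs, e.g. [("P3", "P7")]."""
    def mean_edge_length(edges):
        return float(np.mean([np.linalg.norm(vertices[a] - vertices[b])
                              for a, b in edges]))
    return (mean_edge_length(long_edges),
            mean_edge_length(wide_edges),
            mean_edge_length(vert_edges))
```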
- in some embodiments, the method further includes: generating a bird's-eye view corresponding to the two-dimensional image based on the two-dimensional image and the depth map corresponding to the two-dimensional image; and, for each object to be detected, adjusting the three-dimensional space information of the object to be detected based on the bird's-eye view to obtain adjusted three-dimensional space information of the object to be detected.
- the corresponding depth map can be determined based on the two-dimensional image.
- exemplarily, the two-dimensional image can be input into a trained deep ordinal regression network (DORN) to obtain the depth map corresponding to the two-dimensional image.
- exemplarily, the depth map corresponding to the two-dimensional image may also be determined based on a binocular ranging method.
- the depth map corresponding to the two-dimensional image can also be determined based on the depth camera.
- the method for determining the depth map corresponding to the two-dimensional image can be determined according to the actual situation, as long as the obtained depth map is consistent with the size of the two-dimensional image.
- a bird's-eye view corresponding to the two-dimensional image is generated, and the bird's-eye view includes the depth value.
- the adjusted three-dimensional space information can be more consistent with the corresponding object to be detected.
- in some embodiments, generating a bird's-eye view corresponding to the two-dimensional image based on the two-dimensional image and the depth map corresponding to the two-dimensional image includes: obtaining point cloud data corresponding to the two-dimensional image based on the two-dimensional image and its depth map, where the point cloud data includes the three-dimensional coordinate values of multiple spatial points in the real space corresponding to the two-dimensional image; and generating the bird's-eye view corresponding to the two-dimensional image based on the three-dimensional coordinate values of each spatial point in the point cloud data.
- that is, for each feature point i on the two-dimensional image, based on the two-dimensional coordinate information (u_i, v_i) of the feature point and the corresponding depth value Z_i in the depth map, the three-dimensional coordinate value (X_i, Y_i, Z_i) of the spatial point in real space corresponding to feature point i can be obtained through the above formula (3); in this way, the three-dimensional coordinate values of the spatial points in real space corresponding to the two-dimensional image are obtained. Further, based on the three-dimensional coordinate value of each spatial point in the point cloud data, the bird's-eye view corresponding to the two-dimensional image is generated.
- in some embodiments, generating the bird's-eye view corresponding to the two-dimensional image includes: for each spatial point, determining the horizontal axis coordinate value of the spatial point as the horizontal axis coordinate value of the feature point corresponding to the spatial point in the bird's-eye view, determining the height coordinate value of the spatial point as the pixel channel value of the feature point corresponding to the spatial point in the bird's-eye view, and determining the longitudinal axis coordinate value of the spatial point as the vertical axis coordinate value of the feature point corresponding to the spatial point in the bird's-eye view.
- exemplarily, for a spatial point A with three-dimensional coordinate value (X_A, Y_A, Z_A): the horizontal axis coordinate value X_A is determined as the horizontal axis coordinate of the feature point corresponding to spatial point A in the bird's-eye view; the height coordinate value Y_A is determined as the pixel channel value of the feature point corresponding to spatial point A in the bird's-eye view; and the longitudinal axis coordinate value Z_A is determined as the vertical axis coordinate value of the feature point corresponding to spatial point A in the bird's-eye view.
- it should be noted that one feature point in the bird's-eye view may correspond to multiple spatial points, namely spatial points at the same horizontal position but with different height values.
- that is, the X_A and Z_A of these multiple spatial points are the same, but their Y_A values differ.
- in this case, the largest value can be selected from the height coordinate values Y_A corresponding to the multiple spatial points as the pixel channel value of the corresponding feature point.
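- a minimal sketch of this rasterization, assuming coordinates with X to the right, Y as height, Z as depth, and illustrative grid parameters:

```python
import numpy as np

def make_birds_eye_view(points: np.ndarray, grid_res: float = 0.1,
                        x_range=(-40.0, 40.0), z_range=(0.0, 80.0)) -> np.ndarray:
    """Rasterize point cloud data (N rows of (X, Y, Z)) into a bird's-eye view.

    X maps to the horizontal axis and Z to the vertical axis of the view;
    the pixel channel stores the largest height Y among the spatial points
    falling into a cell, so points sharing (X, Z) but differing in Y
    collapse to one feature point."""
    w = int((x_range[1] - x_range[0]) / grid_res)
    h = int((z_range[1] - z_range[0]) / grid_res)
    bev = np.full((h, w), -np.inf, dtype=np.float32)
    cols = ((points[:, 0] - x_range[0]) / grid_res).astype(int)
    rows = ((points[:, 2] - z_range[0]) / grid_res).astype(int)
    valid = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    for r, c, y in zip(rows[valid], cols[valid], points[valid, 1]):
        bev[r, c] = max(bev[r, c], y)   # keep the maximum height per cell
    bev[np.isinf(bev)] = 0.0            # empty cells get a zero channel value
    return bev
```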
- in some embodiments, referring to FIG. 9, adjusting the three-dimensional space information of the object to be detected based on the bird's-eye view to obtain adjusted three-dimensional space information of the object to be detected includes the following steps S901-S903.
- S901 Extract the first feature data corresponding to the bird's-eye view.
- S902 Based on the three-dimensional space information of each object to be detected and the first preset size information, select the second feature data corresponding to each object to be detected from the first feature data corresponding to the bird's-eye view.
- S903 Based on the second feature data corresponding to each object to be detected, determine the adjusted three-dimensional space information of the object to be detected.
- the first feature data corresponding to the bird's-eye view may be extracted based on the convolutional neural network.
- the three-dimensional bounding box corresponding to each object to be detected may be determined based on the three-dimensional space information of each object to be detected. With the center point of the three-dimensional bounding box as the center and the first preset size as the size, a selection frame corresponding to each object to be detected is determined. Based on the determined selection frame, the second feature data corresponding to each object to be detected is selected from the first feature data corresponding to the bird's-eye view.
- exemplarily, taking the center point of the three-dimensional bounding box as the center, a selection frame with a length of 6 m and a width of 4 m is determined; based on the determined selection frame, the second feature data corresponding to each object to be detected is selected from the first feature data corresponding to the bird's-eye view.
- the second feature data corresponding to each object to be detected may also be input to at least one convolution layer for convolution processing to obtain intermediate feature data corresponding to the second feature data.
- the obtained intermediate feature data is input to the first fully connected layer for processing, and the residual value of the three-dimensional spatial information of the object to be detected is obtained. Based on the residual value of the three-dimensional space information, the adjusted three-dimensional space information of the object to be detected is determined.
- the obtained intermediate feature data can also be input to the second fully connected layer for processing, and the adjusted three-dimensional space information of the object to be detected can be directly obtained.
- in the above, the second feature data corresponding to each object to be detected is selected from the first feature data corresponding to the bird's-eye view, and the adjusted three-dimensional space information of each object to be detected is determined based on the second feature data. In this way, the amount of data processed by the model used to determine the adjusted three-dimensional space information of the object to be detected is small, and the processing efficiency can be improved.
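- the following is a minimal PyTorch-style sketch of such a refinement head, in which convolution layers condense the second feature data into intermediate features and a fully connected layer regresses residuals of the three-dimensional space information; all layer sizes and names are illustrative assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn

class RefinementHead(nn.Module):
    """Refine 3D box parameters from bird's-eye-view features cropped
    around an initial box (the second feature data)."""
    def __init__(self, in_channels: int = 64, num_params: int = 7):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, num_params)  # residuals of (x, y, z, l, w, h, yaw)

    def forward(self, bev_crop: torch.Tensor, init_params: torch.Tensor) -> torch.Tensor:
        feats = self.convs(bev_crop).flatten(1)  # intermediate feature data
        residual = self.fc(feats)                # residual of the 3D space information
        return init_params + residual            # adjusted 3D space information
```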
- an image detection model can be set, and the acquired two-dimensional image can be input into a trained image detection model for processing, so as to obtain adjusted three-dimensional space information of each object to be detected included in the two-dimensional image.
- the image detection model includes a first convolution layer 1001, a second convolution layer 1002, a third convolution layer 1003, a fourth convolution layer 1004, a first detection model 1005, a second detection model 1006, and an optimization model 1007.
- the first detection model 1005 includes two stacked hourglass networks 10051
- the second detection model 1006 includes at least one first fully connected layer 10061
- the optimization model 1007 includes a deep ordinal regression network 10071, a fifth convolution layer 10072, a sixth convolution layer 10073, a seventh convolution layer 10074, and a second fully connected layer 10075.
- the acquired two-dimensional image 1008 is input into the interception model for processing, and a target image 1009 corresponding to at least one object to be detected included in the two-dimensional image is obtained.
- the interception model is used to detect the two-dimensional image to obtain a rectangular detection frame corresponding to at least one object to be detected included in the two-dimensional image. Then, based on the rectangular detection frame corresponding to each object to be detected and the corresponding second preset size information, a target image corresponding to each object to be detected is selected from the two-dimensional image.
- each target image 1009 is input to the first convolution layer 1001 for convolution processing to obtain the first convolution feature data corresponding to each target image. Then, the first convolution feature data corresponding to each target image is input into the first detection model 1005.
- the two stacked hourglass networks 10051 in the first detection model 1005 process the first convolution feature data corresponding to each target image to obtain the structured polygon corresponding to each target image. The obtained structured polygon corresponding to each target image is then input into the second detection model 1006.
- in addition, the first convolution feature data corresponding to each target image is sequentially input into the second convolution layer 1002, the third convolution layer 1003, and the fourth convolution layer 1004 for convolution processing to obtain the second convolution feature data corresponding to each target image.
- the second convolution feature data is input into the second detection model 1006, and at least one first fully connected layer 10061 in the second detection model 1006 processes the second convolution feature data to obtain the height information of each object to be detected.
- based on the structured polygon corresponding to each target image and the height information of each object to be detected, the depth information of the vertices of each object to be detected is determined; the three-dimensional space information of each object to be detected is then obtained, and the obtained three-dimensional space information is input into the optimization model 1007.
- at the same time, the two-dimensional image is input into the optimization model 1007, and the deep ordinal regression network 10071 in the optimization model 1007 processes the two-dimensional image to obtain the depth map corresponding to the two-dimensional image.
- based on the two-dimensional image and the depth map, the bird's-eye view corresponding to the two-dimensional image is obtained and input into the fifth convolution layer 10072 for convolution processing to obtain the first feature data corresponding to the bird's-eye view.
- the second feature data corresponding to each object to be detected is selected from the first feature data corresponding to the bird's-eye view.
- the second feature data is sequentially input into the sixth convolution layer 10073 and the seventh convolution layer 10074 for convolution processing to obtain the third convolution feature data.
- the third convolution feature data is input to the second fully connected layer 10075 for processing, and the adjusted three-dimensional space information of each object to be detected is obtained.
- in the embodiments of the present disclosure, since the constructed structured polygon is the projection of the three-dimensional bounding box corresponding to the object to be detected onto the two-dimensional image, the constructed structured polygon can better characterize the three-dimensional features of the object to be detected. This makes the depth information predicted based on the structured polygon more accurate than depth information predicted directly from the features of the two-dimensional image, which in turn makes the obtained three-dimensional spatial information of the object to be detected more accurate and improves the accuracy of the 3D detection results.
- the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
- the embodiment of the present disclosure also provides a detection device.
- the schematic diagram of the architecture of the detection device provided by the embodiment of the present disclosure includes an image acquisition module 1101, a structured polygon construction module 1102, a depth information determination module 1103, and a three-dimensional space information determination module 1104. Specifically: the image acquisition module 1101 is configured to acquire a two-dimensional image; the structured polygon construction module 1102 is configured to construct, based on the acquired two-dimensional image, structured polygons respectively corresponding to at least one object to be detected in the two-dimensional image, wherein the structured polygon corresponding to each object to be detected represents the projection of the three-dimensional bounding box corresponding to the object to be detected on the two-dimensional image; the depth information determination module 1103 is configured to, for each object to be detected, calculate the depth information of the vertices in the structured polygon based on the height information of the object to be detected and the height information of the vertical edges in the structured polygon corresponding to the object to be detected; and the three-dimensional space information determination module 1104 is configured to determine the three-dimensional space information of the object to be detected based on the depth information of the vertices in the structured polygon and the two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image, the three-dimensional space information of the object to be detected being related to the three-dimensional bounding box corresponding to the object to be detected.
- the detection device further includes: a bird's-eye view determination module 1105, configured to generate a bird's-eye view corresponding to the two-dimensional image based on the two-dimensional image and the depth map corresponding to the two-dimensional image; and an adjustment module 1106, configured to adjust, for each object to be detected, the three-dimensional space information of the object to be detected based on the bird's-eye view to obtain the adjusted three-dimensional space information of the object to be detected.
- the bird's-eye view determination module is configured to: obtain point cloud data corresponding to the two-dimensional image based on the two-dimensional image and the depth map corresponding to the two-dimensional image, wherein the point cloud data includes the three-dimensional coordinate values of multiple spatial points in the real space corresponding to the two-dimensional image; and generate the bird's-eye view corresponding to the two-dimensional image based on the three-dimensional coordinate values of each of the spatial points in the point cloud data.
- the bird's-eye view determination module is configured to, for each spatial point: determine the horizontal-axis coordinate value of the spatial point as the horizontal-axis coordinate value of the feature point corresponding to the spatial point in the bird's-eye view; determine the longitudinal-axis coordinate value of the spatial point as the pixel channel value of the feature point corresponding to the spatial point in the bird's-eye view; and determine the vertical-axis coordinate value of the spatial point as the longitudinal-axis coordinate value of the feature point corresponding to the spatial point in the bird's-eye view.
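- A minimal sketch of this depth-map-to-bird's-eye-view construction follows. The pinhole intrinsics (fx, fy, cx, cy), the BEV resolution and value ranges, and the identification of the horizontal/longitudinal/vertical axes with the camera's x (lateral), y (height), and z (forward) axes are all assumptions:

```python
# Hedged sketch: back-project a depth map to a point cloud, then rasterize a
# bird's-eye view where the height coordinate becomes the pixel channel value.
import numpy as np

def depth_map_to_bev(depth, fx, fy, cx, cy,
                     bev_shape=(256, 256), x_range=(-40.0, 40.0), z_range=(0.0, 80.0)):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth                          # forward distance from the camera
    x = (u - cx) * z / fx              # lateral coordinate -> BEV column
    y = (v - cy) * z / fy              # height coordinate -> pixel channel value
    bh, bw = bev_shape
    bev = np.zeros(bev_shape, dtype=np.float32)
    ix = ((x - x_range[0]) / (x_range[1] - x_range[0]) * (bw - 1)).astype(int)
    iz = ((z - z_range[0]) / (z_range[1] - z_range[0]) * (bh - 1)).astype(int)
    valid = (ix >= 0) & (ix < bw) & (iz >= 0) & (iz < bh) & (z > 0)
    bev[iz[valid], ix[valid]] = y[valid]
    return bev
```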
- the adjustment module is configured to: extract the first feature data corresponding to the bird's-eye view; select, based on the three-dimensional space information of each object to be detected and the first preset size information, the second feature data corresponding to each object to be detected from the first feature data corresponding to the bird's-eye view; and determine, based on the second feature data corresponding to each object to be detected, the adjusted three-dimensional space information of the object to be detected.
- the structured polygon construction module is configured to: determine, based on the two-dimensional image, the attribute information of the structured polygon corresponding to each object to be detected, wherein the attribute information includes at least one of the following: vertex information, surface information, and contour line information; and construct, based on the attribute information of the structured polygon corresponding to each object to be detected, the structured polygon corresponding to that object to be detected.
- the structured polygon construction module is configured to: perform object detection on the two-dimensional image to obtain at least one object area in the two-dimensional image, wherein each object area contains one object to be detected; crop, from the two-dimensional image, a target image corresponding to each object to be detected based on the object area corresponding to the object to be detected and the second preset size information, wherein the second preset size information indicates a size greater than or equal to the size of the object area of each object to be detected; and perform feature extraction on the target image corresponding to each object to be detected to obtain the attribute information of the structured polygon corresponding to each object to be detected.
- the structured polygon construction module is configured to: extract the feature data of the target image based on a convolutional neural network; process the feature data based on at least one stacked hourglass network to obtain the heat map set of the object to be detected corresponding to the target image, wherein the heat map set includes a plurality of heat maps and each heat map contains one vertex among the plurality of vertices of the structured polygon corresponding to the object to be detected; and determine, based on the heat map set corresponding to the object to be detected, the attribute information of the structured polygon corresponding to the object to be detected.
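- The vertices can be read back out of the heat map set by taking the arg-max location of each heat map, a common convention for heat-map keypoint decoding that is assumed here, as is the output stride:

```python
# Hedged sketch: decode one structured-polygon vertex per heat map.
import numpy as np

def heatmaps_to_vertices(heatmaps: np.ndarray, stride: float = 4.0):
    """heatmaps: (num_vertices, H, W) -> (num_vertices, 2) pixel coordinates."""
    vertices = []
    for hm in heatmaps:
        row, col = np.unravel_index(np.argmax(hm), hm.shape)
        vertices.append((col * stride, row * stride))   # (x, y) in the target image
    return np.asarray(vertices, dtype=np.float32)
```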
- the structured polygon construction module is configured to: perform feature extraction on the two-dimensional image to obtain information of multiple target elements in the two-dimensional image, the target elements including at least one of vertices, surfaces, and contour lines; cluster the target elements based on the information of the multiple target elements to obtain at least one clustered target element set; and, for each target element set, form a structured polygon from the target elements in the target element set and use the information of the target elements in that set as the attribute information of the structured polygon.
- the depth information determination module is configured to: for each object to be detected, determine the ratio between the height of the object to be detected and the height of each vertical edge in the structured polygon; and determine the product of the ratio corresponding to each vertical edge and the focal length of the imaging device that captured the two-dimensional image as the depth information of the vertices corresponding to that vertical edge.
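- The stated rule is the similar-triangles relation z = (H / h) * f, where H is the object's real height, h the pixel height of a vertical edge, and f the focal length in pixels; a minimal sketch:

```python
# Depth of each vertical edge of the structured polygon from similar triangles.
def vertex_depths(object_height_m, edge_heights_px, focal_length_px):
    """One depth per vertical edge; both endpoints of an edge share its depth."""
    return [(object_height_m / h) * focal_length_px for h in edge_heights_px]

# Example: a 1.5 m tall car whose near vertical edge spans 120 px at f = 960 px
# lies at roughly (1.5 / 120) * 960 = 12.0 m.
```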
- the depth information determination module is configured to: determine the height of each object to be detected in the two-dimensional image based on the two-dimensional image and a pre-trained neural network for height detection; or collect in advance the real height values of objects to be detected in a plurality of different postures and use the average of the collected real height values as the height of the object to be detected; or obtain a regression variable of the object to be detected based on the two-dimensional image and a pre-trained neural network for object detection, and determine the height of the object to be detected based on the regression variable and a pre-obtained average height of objects to be detected in a plurality of different postures, wherein the regression variable is used to characterize the degree of deviation between the height of the object to be detected and the average height.
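- A hedged sketch of the third height strategy; the exact functional form linking the regression variable to the height is not spelled out here, so a multiplicative deviation from the average height is assumed purely for illustration:

```python
# Assumed parameterization: the regression variable scales the class-average
# height; 0 means "about average", 0.1 means roughly 10% taller.
def object_height(avg_height_m: float, regression_variable: float) -> float:
    return avg_height_m * (1.0 + regression_variable)
```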
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments; for specific implementation, reference may be made to the descriptions of the above method embodiments, which will not be repeated here.
- the embodiment of the present disclosure also provides an electronic device.
- FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure; the electronic device includes a processor 1201, a memory 1202, and a bus 1203.
- the memory 1202 is used to store execution instructions and includes an internal memory 12021 and an external memory 12022.
- the internal memory 12021 is used to temporarily store operation data in the processor 1201 and data exchanged with the external memory 12022, such as a hard disk.
- the processor 1201 exchanges data with the external memory 12022 through the memory 12021.
- when the electronic device runs, the processor 1201 and the memory 1202 communicate through the bus 1203, so that the processor 1201 executes the following instructions: acquire a two-dimensional image; construct, based on the acquired two-dimensional image, structured polygons respectively corresponding to at least one object to be detected in the two-dimensional image, wherein the structured polygon corresponding to each object to be detected represents the projection of the three-dimensional bounding box corresponding to the object to be detected on the two-dimensional image; for each object to be detected, calculate the depth information of the vertices in the structured polygon based on the height information of the object to be detected and the height information of the vertical edges in the structured polygon corresponding to the object to be detected; and determine the three-dimensional space information of the object to be detected based on the depth information of the vertices in the structured polygon and the two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image.
- the embodiments of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon; when run by a processor, the computer program executes the steps of the detection method described in the foregoing method embodiments.
- the computer program product of the detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the detection method described in the above method embodiments. For details, please refer to the above method embodiments, which will not be repeated here.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile computer-readable storage medium executable by a processor.
- based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
- the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Claims (14)
- 1. A detection method, comprising: acquiring a two-dimensional image; constructing, based on the acquired two-dimensional image, structured polygons respectively corresponding to at least one object to be detected in the two-dimensional image, wherein the structured polygon corresponding to each object to be detected represents the projection of the three-dimensional bounding box corresponding to the object to be detected on the two-dimensional image; for each object to be detected, calculating depth information of vertices in the structured polygon based on height information of the object to be detected and height information of vertical edges in the structured polygon corresponding to the object to be detected; and determining three-dimensional space information of the object to be detected based on the depth information of the vertices in the structured polygon and two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image, the three-dimensional space information of the object to be detected being related to the three-dimensional bounding box corresponding to the object to be detected.
- 2. The detection method according to claim 1, wherein, after the three-dimensional space information of the object to be detected is determined, the detection method further comprises: generating a bird's-eye view corresponding to the two-dimensional image based on the two-dimensional image and a depth map corresponding to the two-dimensional image; and adjusting the three-dimensional space information of each object to be detected based on the bird's-eye view to obtain adjusted three-dimensional space information of the object to be detected.
- 3. The detection method according to claim 2, wherein generating the bird's-eye view corresponding to the two-dimensional image based on the two-dimensional image and the depth map corresponding to the two-dimensional image comprises: obtaining point cloud data corresponding to the two-dimensional image based on the two-dimensional image and the depth map corresponding to the two-dimensional image, wherein the point cloud data comprises three-dimensional coordinate values of a plurality of spatial points in the real space corresponding to the two-dimensional image; and generating the bird's-eye view corresponding to the two-dimensional image based on the three-dimensional coordinate values of each of the spatial points in the point cloud data.
- 4. The detection method according to claim 3, wherein generating the bird's-eye view corresponding to the two-dimensional image based on the three-dimensional coordinate values of each of the spatial points in the point cloud data comprises: for each spatial point: determining the horizontal-axis coordinate value of the spatial point as the horizontal-axis coordinate value of the feature point corresponding to the spatial point in the bird's-eye view; determining the longitudinal-axis coordinate value of the spatial point as the pixel channel value of the feature point corresponding to the spatial point in the bird's-eye view; and determining the vertical-axis coordinate value of the spatial point as the longitudinal-axis coordinate value of the feature point corresponding to the spatial point in the bird's-eye view.
- 5. The detection method according to claim 2, wherein adjusting the three-dimensional space information of the object to be detected based on the bird's-eye view to obtain the adjusted three-dimensional space information of the object to be detected comprises: extracting first feature data corresponding to the bird's-eye view; selecting, from the first feature data corresponding to the bird's-eye view, second feature data corresponding to the object to be detected based on the three-dimensional space information of the object to be detected and first preset size information; and determining the adjusted three-dimensional space information of the object to be detected based on the second feature data corresponding to the object to be detected.
- 6. The detection method according to claim 1, wherein constructing, based on the acquired two-dimensional image, the structured polygons respectively corresponding to the at least one object to be detected in the two-dimensional image comprises: determining, based on the two-dimensional image, attribute information of the structured polygon corresponding to each object to be detected, wherein the attribute information comprises at least one of the following: vertex information, surface information, and contour line information; and constructing, based on the attribute information of the structured polygon corresponding to each object to be detected, the structured polygon corresponding to that object to be detected.
- 7. The detection method according to claim 6, wherein determining, based on the two-dimensional image, the attribute information of the structured polygon corresponding to each object to be detected comprises: performing object detection on the two-dimensional image to obtain at least one object area in the two-dimensional image, wherein each object area contains one object to be detected; cropping, from the two-dimensional image, a target image corresponding to each object to be detected based on the object area corresponding to the object to be detected and second preset size information, the second preset size information indicating a size greater than or equal to the size of the object area of each object to be detected; and performing feature extraction on the target image corresponding to each object to be detected to obtain the attribute information of the structured polygon corresponding to each object to be detected.
- 8. The detection method according to claim 7, wherein, in a case where the attribute information comprises vertex information, feature extraction is performed on the target image corresponding to the object to be detected to obtain the attribute information of the structured polygon corresponding to the object to be detected according to the following steps: extracting feature data of the target image based on a convolutional neural network; processing the feature data based on at least one stacked hourglass network to obtain a heat map set of the object to be detected corresponding to the target image, wherein the heat map set comprises a plurality of heat maps, and each heat map contains one vertex among the plurality of vertices of the structured polygon corresponding to the object to be detected; and determining, based on the heat map set of the object to be detected, the attribute information of the structured polygon corresponding to the object to be detected.
- 9. The detection method according to claim 6, wherein determining, based on the two-dimensional image, the attribute information of the structured polygon corresponding to the object to be detected comprises: performing feature extraction on the two-dimensional image to obtain information of a plurality of target elements in the two-dimensional image, the target elements comprising at least one of vertices, surfaces, and contour lines; clustering the target elements based on the information of the plurality of target elements to obtain at least one clustered target element set; and, for each target element set, forming a structured polygon from the target elements in the target element set and using the information of the target elements in the target element set as the attribute information of the structured polygon.
- 10. The detection method according to claim 1, wherein calculating the depth information of the vertices in the structured polygon based on the height information of the object to be detected and the height information of the vertical edges in the structured polygon corresponding to the object to be detected comprises: determining the ratio between the height of the object to be detected and the height of each vertical edge in the structured polygon; and determining the product of the ratio corresponding to each vertical edge and the focal length of the imaging device that captured the two-dimensional image as the depth information of the vertices corresponding to that vertical edge.
- 11. The detection method according to claim 1, wherein the height of the object to be detected is determined in one of the following ways: determining the height of the object to be detected based on the two-dimensional image and a pre-trained neural network for height detection; or collecting in advance the real height values of objects to be detected in a plurality of different postures and using the average of the collected real height values as the height of the object to be detected; or obtaining a regression variable of the object to be detected based on the two-dimensional image and a pre-trained neural network for object detection, and determining the height of the object to be detected based on the regression variable and a pre-obtained average height of objects to be detected in a plurality of different postures, wherein the regression variable is used to characterize the degree of deviation between the height of the object to be detected and the average height.
- 12. A detection apparatus, comprising: an image acquisition module configured to acquire a two-dimensional image; a structured polygon construction module configured to construct, based on the acquired two-dimensional image, structured polygons respectively corresponding to at least one object to be detected in the two-dimensional image, wherein the structured polygon corresponding to each object to be detected represents the projection of the three-dimensional bounding box corresponding to the object to be detected on the two-dimensional image; a depth information determination module configured to, for each object to be detected, calculate depth information of vertices in the structured polygon based on height information of the object to be detected and height information of vertical edges in the structured polygon corresponding to the object to be detected; and a three-dimensional space information determination module configured to determine three-dimensional space information of the object to be detected based on the depth information of the vertices in the structured polygon and two-dimensional coordinate information of the vertices of the structured polygon in the two-dimensional image, the three-dimensional space information of the object to be detected being related to the three-dimensional bounding box corresponding to the object to be detected.
- 13. An electronic device, comprising: a processor; a memory storing machine-readable instructions executable by the processor; and a bus, wherein, when the electronic device runs, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the detection method according to any one of claims 1 to 11.
- 14. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when run by a processor, performs the steps of the detection method according to any one of claims 1 to 11.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2022500618A JP2022531625A (ja) | 2020-01-19 | 2021-01-19 | Detection method, apparatus, electronic device and storage medium |
SG11202108275VA SG11202108275VA (en) | 2020-01-19 | 2021-01-19 | Detection methods, detection apparatuses, electronic devices and storage media |
KR1020217042317A KR20220013565A (ko) | 2020-01-19 | 2021-01-19 | Detection method, device, electronic apparatus and storage medium |
US17/388,912 US20210358153A1 (en) | 2020-01-19 | 2021-07-29 | Detection methods, detection apparatuses, electronic devices and storage media |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010060288.7A CN111274943B (zh) | 2020-01-19 | 2020-01-19 | Detection method and apparatus, electronic device, and storage medium |
CN202010060288.7 | 2020-01-19 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/388,912 Continuation US20210358153A1 (en) | 2020-01-19 | 2021-07-29 | Detection methods, detection apparatuses, electronic devices and storage media |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021143935A1 true WO2021143935A1 (zh) | 2021-07-22 |
Family
ID=71002197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/072750 WO2021143935A1 (zh) | 2020-01-19 | 2021-01-19 | 一种检测方法、装置、电子设备及存储介质 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20210358153A1 (zh) |
JP (1) | JP2022531625A (zh) |
KR (1) | KR20220013565A (zh) |
CN (1) | CN111274943B (zh) |
SG (1) | SG11202108275VA (zh) |
WO (1) | WO2021143935A1 (zh) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111274943B (zh) * | 2020-01-19 | 2023-06-23 | 深圳市商汤科技有限公司 | Detection method and apparatus, electronic device, and storage medium |
CN111882531B (zh) * | 2020-07-15 | 2021-08-17 | 中国科学技术大学 | Automatic analysis method for hip joint ultrasound images |
CN111931643A (zh) * | 2020-08-08 | 2020-11-13 | 商汤集团有限公司 | Target detection method and apparatus, electronic device, and storage medium |
JP7481468B2 (ja) * | 2020-09-02 | 2024-05-10 | ファナック株式会社 | Robot system and control method |
CN112132829A (zh) * | 2020-10-23 | 2020-12-25 | 北京百度网讯科技有限公司 | Vehicle information detection method and apparatus, electronic device, and storage medium |
CN112926395A (zh) * | 2021-01-27 | 2021-06-08 | 上海商汤临港智能科技有限公司 | Target detection method and apparatus, computer device, and storage medium |
CN113240734B (zh) * | 2021-06-01 | 2024-05-17 | 深圳市捷顺科技实业股份有限公司 | Bird's-eye-view-based method, apparatus, device, and medium for judging vehicle straddling of parking spaces |
CN114842287B (zh) * | 2022-03-25 | 2022-12-06 | 中国科学院自动化研究所 | Method and apparatus for training a depth-guided deformer monocular 3D object detection model |
CN114387346A (zh) * | 2022-03-25 | 2022-04-22 | 阿里巴巴达摩院(杭州)科技有限公司 | Image recognition and prediction model processing method, 3D modeling method, and apparatus |
CN117611752B (zh) * | 2024-01-22 | 2024-04-02 | 卓世未来(成都)科技有限公司 | 3D model generation method and system for a digital human |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106952303A (zh) * | 2017-03-09 | 2017-07-14 | 北京旷视科技有限公司 | Vehicle distance detection method, apparatus, and system |
CN107992827A (zh) * | 2017-12-03 | 2018-05-04 | 湖南工程学院 | Method and apparatus for multi-moving-target tracking based on three-dimensional models |
US20200013186A1 (en) * | 2016-06-14 | 2020-01-09 | Disney Enterprises, Inc. | Apparatus, Systems and Methods For Shadow Assisted Object Recognition and Tracking |
CN111274943A (zh) * | 2020-01-19 | 2020-06-12 | 深圳市商汤科技有限公司 | Detection method and apparatus, electronic device, and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6473571B2 (ja) * | 2014-03-24 | 2019-02-20 | アルパイン株式会社 | TTC measurement device and TTC measurement program |
JP6965803B2 (ja) * | 2018-03-20 | 2021-11-10 | 株式会社Jvcケンウッド | Recognition device, recognition method, and recognition program |
CN109146769A (zh) * | 2018-07-24 | 2019-01-04 | 北京市商汤科技开发有限公司 | Image processing method and apparatus, image processing device, and storage medium |
CN110070606B (zh) * | 2019-04-01 | 2023-01-03 | 浙江大华技术股份有限公司 | Space mapping method, target detection method, detection apparatus, and storage medium |
CN110472534A (zh) * | 2019-07-31 | 2019-11-19 | 厦门理工学院 | RGB-D-data-based 3D object detection method, apparatus, device, and storage medium |
CN110689008A (zh) * | 2019-09-17 | 2020-01-14 | 大连理工大学 | Three-dimensional-reconstruction-based 3D object detection method for monocular images |
Application timeline:
- 2020-01-19: CN CN202010060288.7A, published as CN111274943B (active)
- 2021-01-19: WO PCT/CN2021/072750, published as WO2021143935A1 (application filing)
- 2021-01-19: KR KR1020217042317A, published as KR20220013565A (application discontinued)
- 2021-01-19: JP JP2022500618A, published as JP2022531625A (pending)
- 2021-01-19: SG SG11202108275VA, published as SG11202108275VA (status unknown)
- 2021-07-29: US US17/388,912, published as US20210358153A1 (abandoned)
Also Published As
Publication number | Publication date |
---|---|
CN111274943B (zh) | 2023-06-23 |
SG11202108275VA (en) | 2021-08-30 |
KR20220013565A (ko) | 2022-02-04 |
JP2022531625A (ja) | 2022-07-07 |
US20210358153A1 (en) | 2021-11-18 |
CN111274943A (zh) | 2020-06-12 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21741421; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 20217042317; Country of ref document: KR; Kind code of ref document: A |
| ENP | Entry into the national phase | Ref document number: 2022500618; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.11.2022) |
| WWE | Wipo information: entry into national phase | Ref document number: 521430009; Country of ref document: SA |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21741421; Country of ref document: EP; Kind code of ref document: A1 |