WO2021164469A1 - Target object detection method, apparatus, device and storage medium - Google Patents
Target object detection method, apparatus, device and storage medium
- Publication number
- WO2021164469A1 (PCT/CN2021/071295)
- Authority
- WIPO (PCT)
- Prior art keywords
- point cloud
- dimensional
- sampling
- target object
- feature vector
- Prior art date
Classifications
- G06T7/11—Region-based segmentation
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/70—Determining position or orientation of objects or cameras
- G06V10/40—Extraction of image or video features
- G06T2207/10012—Stereo images
- G06T2207/10028—Range image; Depth image; 3D point clouds
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present disclosure relates to the field of computer vision technology, and in particular to a detection method, device, equipment and storage medium for a target object.
- 3D target detection is an important issue in the field of computer vision and intelligent scene understanding. It can be applied in many important fields, such as unmanned driving, robots, augmented reality, etc. It has important research significance and application value.
- the three-dimensional point cloud can be matched against a target model to determine whether the three-dimensional point cloud contains the target object. If the three-dimensional point cloud contains multiple different target objects, it may need to be matched against multiple different target models separately, which takes a long time and also reduces the accuracy of detection.
- the present disclosure proposes a target object detection scheme.
- a method for detecting a target object including:
- performing feature extraction on the three-dimensional point cloud of the target scene to obtain the feature extraction result includes: sampling the three-dimensional point cloud to obtain a first sampling point; constructing, in the three-dimensional point cloud, a sampling area centered on the first sampling point; performing feature extraction on the sampling area to obtain the feature vector of the sampling area; and determining, according to the feature vector of the sampling area, the feature vectors of the three-dimensional points included in the three-dimensional point cloud as the feature extraction result.
- the step of performing category prediction and position prediction of the target object on the three-dimensional point cloud according to the feature extraction result, and determining at least one candidate area of the target object in the target scene, includes: performing category prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain a category prediction result, wherein the category prediction result is used to indicate the category of the target object to which the three-dimensional points included in the three-dimensional point cloud belong; performing position prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain a position prediction result, wherein the position prediction result is used to indicate the position of the three-dimensional points where the target object is located in the three-dimensional point cloud; and determining, according to the category prediction result and the position prediction result, at least one candidate area in the scene that includes the target object.
- performing category prediction on the three-dimensional point cloud according to the feature extraction result to obtain the category prediction result includes: processing the feature extraction result through a category prediction convolutional network to obtain the category of the target object to which the three-dimensional points included in the three-dimensional point cloud belong.
- performing position prediction on the three-dimensional point cloud according to the feature extraction result to obtain the position prediction result includes: processing the feature extraction result through a position prediction convolutional network to obtain the residual amount between the three-dimensional points included in the three-dimensional point cloud and at least one preset detection frame, wherein the number of preset detection frames is not less than one; and obtaining, according to the residual amount, at least one detection frame matched by the three-dimensional points as the position prediction result.
- the position prediction convolutional network is trained with training data, and the training data includes a three-dimensional point cloud sample, a first position of the sample object in the three-dimensional point cloud sample, and a first feature vector corresponding to the category of the sample object; the training includes: passing the three-dimensional point cloud sample through the initial position prediction convolutional network to obtain a first position prediction result; obtaining a first error loss according to the error between the first position prediction result and the first position; obtaining a second error loss according to the distance between the feature vectors of the three-dimensional points included in the three-dimensional point cloud sample and the first feature vector; and training the initial position prediction convolutional network according to the first error loss and/or the second error loss.
- determining, according to the category prediction result and the position prediction result, at least one candidate area in the scene that includes the target object includes: acquiring at least one detection frame included in the position prediction result; obtaining a prediction score of the at least one detection frame according to the category prediction results of the three-dimensional points included in the detection frame; and using a detection frame whose prediction score is greater than a score threshold as a candidate area of the target object.
- before the target object is detected to obtain the detection result, the method further includes: determining a three-dimensional sub-point cloud composed of the three-dimensional points included in the at least one candidate area; acquiring the coordinates of the three-dimensional points included in the three-dimensional sub-point cloud as the spatial coordinates of the three-dimensional sub-point cloud; acquiring the feature vectors of the three-dimensional points included in the three-dimensional sub-point cloud as the feature vectors of the three-dimensional sub-point cloud; and obtaining the feature matrix of the three-dimensional sub-point cloud according to the spatial coordinates of the three-dimensional sub-point cloud and the feature vectors of the three-dimensional sub-point cloud.
- detecting the target object in at least one of the candidate areas to obtain the detection result includes: sampling the three-dimensional sub-point cloud included in a first candidate area to obtain second sampling points included in the first candidate area, wherein the first candidate area is any one of the at least one candidate area; acquiring, according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area, attention feature vectors of the second sampling points included in the first candidate area; fusing, through a fusion convolutional network, the attention feature vectors of the second sampling points included in the first candidate area to obtain a feature fusion result of the first candidate area; and using the feature fusion result of the first candidate area as the detection result of the first candidate area.
- obtaining the attention feature vector of the second sampling point included in the first candidate area includes: performing feature extraction on the second sampling point according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area to obtain an initial feature vector of the second sampling point; performing average pooling on the initial feature vector of the second sampling point to obtain a global feature vector of the first candidate area; splicing the initial feature vector of the second sampling point with the global feature vector to obtain an extended feature vector of the second sampling point; obtaining an attention coefficient of the second sampling point according to the extended feature vector of the second sampling point; and multiplying the attention coefficient of the second sampling point by the initial feature vector of the second sampling point to obtain the attention feature vector of the second sampling point.
- a detection device for a target object including:
- the feature extraction module is used to perform feature extraction on the three-dimensional point cloud of the target scene to obtain the feature extraction result;
- the candidate region determination module is configured to perform category prediction and position prediction of the target object on the three-dimensional point cloud according to the feature extraction result, and determine at least one candidate area of the target object in the target scene;
- the detection module is used to detect the target object in at least one of the candidate areas to obtain a detection result.
- the feature extraction module is configured to: sample the three-dimensional point cloud to obtain a first sampling point; construct, in the three-dimensional point cloud, a sampling area centered on the first sampling point; perform feature extraction on the sampling area to obtain the feature vector of the sampling area; and determine, according to the feature vector of the sampling area, the feature vectors of the three-dimensional points included in the three-dimensional point cloud as the feature extraction result.
- the candidate region determining module is configured to: perform category prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain a category prediction result, wherein the category prediction result is used to indicate the category of the target object to which the three-dimensional points included in the three-dimensional point cloud belong; perform position prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain a position prediction result, wherein the position prediction result is used to indicate the position of the three-dimensional points where the target object is located in the three-dimensional point cloud; and determine, according to the category prediction result and the position prediction result, at least one candidate area in the scene that includes the target object.
- the candidate region determining module is further configured to: process the feature extraction result through a category prediction convolutional network to obtain the category of the target object to which the three-dimensional points included in the three-dimensional point cloud belong .
- the candidate region determining module is further configured to: process the feature extraction result through a position prediction convolutional network to obtain the residual amount between the three-dimensional points included in the three-dimensional point cloud and at least one preset detection frame, wherein the number of preset detection frames is not less than one; and obtain, according to the residual amount, at least one detection frame matched by the three-dimensional points as the position prediction result.
- the position prediction convolutional network is trained with training data, and the training data includes a three-dimensional point cloud sample, a first position of the sample object in the three-dimensional point cloud sample, and a first feature vector corresponding to the category of the sample object; the training includes: passing the three-dimensional point cloud sample through the initial position prediction convolutional network to obtain a first position prediction result; obtaining a first error loss according to the error between the first position prediction result and the first position; obtaining a second error loss according to the distance between the feature vectors of the three-dimensional points included in the three-dimensional point cloud sample and the first feature vector; and training the initial position prediction convolutional network according to the first error loss and/or the second error loss.
- the candidate region determining module is further configured to: obtain at least one detection frame included in the position prediction result; obtain a prediction score of the at least one detection frame according to the category prediction results of the three-dimensional points included in the detection frame; and use a detection frame whose prediction score is greater than a score threshold as a candidate area of the target object.
- the candidate area determination module is further configured to: determine a three-dimensional sub-point cloud composed of the three-dimensional points included in the at least one candidate area; obtain the coordinates of the three-dimensional points included in the three-dimensional sub-point cloud as the spatial coordinates of the three-dimensional sub-point cloud; obtain the feature vectors of the three-dimensional points included in the three-dimensional sub-point cloud as the feature vectors of the three-dimensional sub-point cloud; and obtain the feature matrix of the three-dimensional sub-point cloud according to the spatial coordinates of the three-dimensional sub-point cloud and the feature vectors of the three-dimensional sub-point cloud.
- the detection module is configured to: sample the three-dimensional sub-point cloud included in the first candidate area to obtain the second sampling points included in the first candidate area, wherein the first candidate area is any one of the at least one candidate area; acquire, according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area, the attention feature vectors of the second sampling points included in the first candidate area; fuse, through the fusion convolutional network, the attention feature vectors of the second sampling points included in the first candidate area to obtain the feature fusion result of the first candidate area; and use the feature fusion result of the first candidate area as the detection result of the first candidate area.
- the detection module is further configured to: perform feature extraction on the second sampling points according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area to obtain initial feature vectors of the second sampling points; perform average pooling on the initial feature vectors of the second sampling points to obtain a global feature vector of the first candidate area; splice the initial feature vectors of the second sampling points with the global feature vector to obtain extended feature vectors of the second sampling points; obtain attention coefficients of the second sampling points according to the extended feature vectors of the second sampling points; and multiply the attention coefficients of the second sampling points by the initial feature vectors of the second sampling points to obtain the attention feature vectors of the second sampling points.
- an electronic device including:
- a processor; and a memory for storing processor-executable instructions;
- wherein the processor is configured to execute the above-mentioned target object detection method.
- a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the foregoing method for detecting a target object is realized.
- a computer program including computer-readable code, and when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above-mentioned method for detecting a target object.
- the feature extraction result is obtained by performing feature extraction on the three-dimensional point cloud of the target scene; then, according to the feature extraction result, category prediction and position prediction of the target object are performed on the three-dimensional point cloud to determine at least one candidate area of the target object, and the target object is detected in the at least one candidate area to obtain a detection result.
- On the one hand, the accuracy of the detection result can be improved; on the other hand, when the scene contains multiple or multiple kinds of different target objects, these target objects can be detected through the same detection method instead of the model matching method, which improves the convenience and efficiency of target detection and can further improve its accuracy.
- Fig. 1 shows a flowchart of a method for detecting a target object according to an embodiment of the present disclosure.
- Fig. 2 shows a block diagram of a device for detecting a target object according to an embodiment of the present disclosure.
- Fig. 3 shows a schematic diagram of an application example according to the present disclosure.
- Fig. 4 shows a schematic diagram of an application example according to the present disclosure.
- Fig. 5 shows a schematic diagram of an application example according to the present disclosure.
- Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- Fig. 1 shows a flowchart of a method for detecting a target object according to an embodiment of the present disclosure.
- the method may be applied to a terminal device, a server, or other processing devices.
- terminal devices can be User Equipment (UE), mobile devices, user terminals, terminals, cellular phones, cordless phones, personal digital assistants (PDAs), handheld devices, computing devices, vehicle-mounted devices, portable or wearable devices, etc.
- the target object detection method can be applied to chip devices such as artificial intelligence processors.
- the detection method of the target object may also be implemented in a manner in which the processor invokes computer-readable instructions stored in the memory.
- the detection method of the target object may include:
- Step S11: Perform feature extraction on the three-dimensional point cloud of the target scene to obtain a feature extraction result.
- Step S12: Perform category prediction and position prediction of the target object on the three-dimensional point cloud according to the feature extraction result, and determine at least one candidate area of the target object in the target scene.
- Step S13: Detect the target object in at least one candidate area to obtain a detection result.
- the three-dimensional point cloud may include a point set composed of a plurality of three-dimensional points, and the number of three-dimensional points constituting the three-dimensional point cloud is not limited and can be flexibly determined according to actual conditions.
- a three-dimensional point can be a point in space defined by three-dimensional coordinates. The definition of the specific three-dimensional coordinates is not limited; in one example, the three-dimensional coordinates can be coordinates composed of the three dimensions x, y, and z.
- the target scene may be a scene with a requirement for detecting a target object.
- the target object may be any object with a requirement for detection, which is flexibly determined according to the actual situation of the detection.
- in indoor object detection, the target object can be an indoor object, such as a sofa, table, or chair.
- in pedestrian detection, the target object can be a pedestrian.
- in face recognition, the target object can be a human face.
- in license plate recognition, the target object can be a motor vehicle license plate, etc.; the scene can be any scene with target detection requirements, which can be flexibly determined according to the target object and the actual needs of detection.
- the scene can be an indoor space, such as a bedroom, home, or classroom.
- when the target object is a pedestrian, the scene can be a road scene containing pedestrians. In one example, when the target object is a human face, the scene can be a scene where multiple people are present, such as a classroom, a square, or a movie theater. In one example, when the target object is a motor vehicle license plate, the scene can be a motor vehicle lane, etc., which is not limited in the embodiments of the present disclosure.
- the number of candidate regions in the scene determined according to the result of feature extraction can be flexibly determined according to the actual situation of the target object contained in the scene.
- the obtained result of the target object may also be determined according to actual conditions; that is, the at least one candidate area may include one or one kind of target object, or may include multiple or multiple kinds of target objects.
- a candidate area may contain multiple target objects, that is, multiple target objects may correspond to one candidate area; one target object may also correspond to multiple candidate areas, that is, in the three-dimensional point cloud the same target object may be located in a number of different candidate areas.
- the feature extraction process and the target object detection process mentioned in the above disclosed embodiments can both be implemented through a trained neural network. Which neural network is used and how the corresponding feature extraction and target object detection are implemented can be flexibly determined according to actual conditions, which will be described in detail in the subsequent disclosed embodiments and will not be expanded here.
- At least one candidate area containing the target object can be determined from the target scene through category prediction combined with position prediction, which enables the candidate area to be determined based on both the location and the category of the target object, with higher accuracy.
- The target object can then be detected in the at least one candidate area to obtain the detection result, which can, on the one hand, improve the accuracy of the detection result.
- the method of obtaining the three-dimensional point cloud of the target scene in the above disclosed embodiment is not limited. Any method that can obtain the three-dimensional point cloud of the scene where the target object is located and determine the coordinates of these three-dimensional point clouds can be used as the method of obtaining the three-dimensional point cloud. It is not limited by the following disclosed embodiments.
- the way to obtain the three-dimensional point cloud may be: scanning the scene that requires target detection through a terminal device, such as the user equipment, mobile device, or user terminal mentioned in the above-mentioned disclosed embodiments, so as to obtain the three-dimensional point cloud included in the scene where the target object is located, and establishing a corresponding coordinate system in the scene so as to obtain the coordinates of the three-dimensional points in the established coordinate system.
- step S11 may be used to perform feature extraction on these three-dimensional point clouds to obtain a feature extraction result.
- the specific feature extraction method is not limited in the embodiment of the present disclosure.
- step S11 may include:
- Step S111: Sample the three-dimensional point cloud to obtain the first sampling point.
- Step S112: Construct a sampling area centered on the first sampling point in the three-dimensional point cloud.
- Step S113: Perform feature extraction on the sampling area to obtain a feature vector of the sampling area.
- Step S114: Determine, according to the feature vector of the sampling area, the feature vectors of the three-dimensional points included in the three-dimensional point cloud as the feature extraction result.
- the three-dimensional point cloud can be divided into multiple sampling areas, and the feature extraction result of the entire three-dimensional point cloud can then be obtained according to the feature extraction result of at least one sampling area.
- how to divide the sampling area and the number of divided sampling areas can be flexibly determined according to the actual situation.
- the way to divide the three-dimensional point cloud into multiple sampling areas can be to first select first sampling points from the three-dimensional point cloud, and then obtain the sampling areas based on these first sampling points.
- the method for selecting the first sampling point is not limited.
- a sampling layer that uses the Farthest Point Sampling (FPS) algorithm can be used to select the first sampling points in the three-dimensional point cloud.
- the process of determining the first sampling points through the FPS algorithm can be: randomly select a point from the three-dimensional point cloud as a random sampling point, then select the point farthest from the selected random sampling point as the starting point, and iterate continuously, each time selecting the point whose sum of distances to all selected first sampling points is the largest, until the number of selected first sampling points reaches the threshold, at which point the selection of first sampling points ends.
- the threshold of the number of first sampling points can be set according to actual conditions, and is not limited in the embodiment of the present disclosure.
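- As an illustration only, the following Python sketch follows the sum-of-distances selection rule described above (classical FPS instead maximizes the minimum distance to the already-selected set); the function name and the use of NumPy are assumptions, not part of the disclosure.

```python
import numpy as np

def select_first_sampling_points(points: np.ndarray, num_samples: int) -> np.ndarray:
    """Pick indices of first sampling points from an (N, 3) point cloud.

    Follows the description above: start from a random point, then repeatedly
    pick the point whose summed distance to all already-selected points is
    largest, until `num_samples` points have been selected.
    """
    n = points.shape[0]
    selected = [int(np.random.randint(n))]   # random initial sampling point
    summed_dist = np.zeros(n)                # running sum of distances to selected points
    while len(selected) < num_samples:
        diff = points - points[selected[-1]]
        summed_dist += np.sqrt(np.einsum("ij,ij->i", diff, diff))
        summed_dist[selected] = -np.inf      # never re-select an already chosen point
        selected.append(int(np.argmax(summed_dist)))
    return np.asarray(selected)
```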
- step S112 may be used to construct at least one sampling area centered on the first sampling point in the three-dimensional point cloud.
- step S112 may be implemented by a grouping layer.
- the process of constructing a sampling area through the grouping layer may be: taking the first sampling point as the center, selecting adjacent points around the first sampling point to construct a local area, and using these local areas as the sampling areas.
- the neighboring point may be a three-dimensional point in the three-dimensional point cloud whose distance from the first sampling point is within a distance threshold.
- the specific distance threshold setting can also be flexibly selected according to actual conditions, which is not limited in the embodiments of the present disclosure.
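- A minimal sketch of this grouping step is given below, assuming a Euclidean distance threshold (`radius`) and an optional cap on the number of neighbors per area (`max_points`); both parameter names and values are hypothetical.

```python
import numpy as np

def build_sampling_areas(points, center_idx, radius=0.2, max_points=32):
    """For each first sampling point, gather the neighboring points within
    `radius` to form a local sampling area (a simple ball-query grouping).

    points:     (N, 3) full point cloud
    center_idx: (S,) indices of the first sampling points
    returns:    list of S index arrays, each holding the points of one area
    """
    areas = []
    for c in center_idx:
        diff = points - points[c]
        dist = np.sqrt(np.einsum("ij,ij->i", diff, diff))
        neighbors = np.flatnonzero(dist <= radius)[:max_points]
        areas.append(neighbors)
    return areas
```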
- step S113 can be used to obtain the feature vector of at least one sampling region.
- the implementation of step S113 is not limited, that is, the method of extracting features of the sampling region is not limited.
- the feature vector can be obtained through the point cloud feature extraction layer (Pointnet Layer).
- the implementation mode of the point cloud feature extraction layer can be flexibly determined according to the actual situation.
- a multi-layer perceptron (MLP, Multi-Layer Perceptron) can be used as the implementation of the point cloud feature extraction layer to extract the feature vector of the sampling area.
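- One possible PyTorch sketch of such a point cloud feature extraction layer is shown below: a shared MLP applied to every point of a sampling area, followed by max pooling to produce one feature vector per area. The layer widths and the pooling choice are illustrative assumptions, not details fixed by the disclosure.

```python
import torch
import torch.nn as nn

class SamplingAreaFeature(nn.Module):
    """Shared MLP over the points of each sampling area, then pooling,
    producing one feature vector per sampling area."""

    def __init__(self, in_dim=3, hidden=(64, 128), out_dim=256):
        super().__init__()
        layers, d = [], in_dim
        for h in (*hidden, out_dim):
            layers += [nn.Linear(d, h), nn.ReLU(inplace=True)]
            d = h
        self.mlp = nn.Sequential(*layers)

    def forward(self, area_points):           # (S, K, in_dim): S areas, K points each
        point_feats = self.mlp(area_points)   # (S, K, out_dim) per-point features
        return point_feats.max(dim=1).values  # (S, out_dim): one vector per area
```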
- after the feature vectors of at least one sampling area are obtained, the feature vectors of the three-dimensional points in the three-dimensional point cloud can be respectively obtained through step S114 based on these feature vectors.
- the implementation manner of step S114 is not limited, and in a possible implementation manner, it may be implemented through an upsampling layer (Upsampling Layer).
- the method of obtaining the feature vectors of the three-dimensional points in the three-dimensional point cloud by using the up-sampling layer may be: in at least one sampling area, according to the spatial positions of the three-dimensional points contained in the sampling area, realize upsampling through interpolation to obtain an interpolation calculation result, and combine the interpolation calculation result with the feature vector of the sampling area to obtain the feature vectors of the three-dimensional points in the sampling area. Since the sampling areas are divided areas in the three-dimensional point cloud, after the feature vectors of the three-dimensional points contained in at least one sampling area are obtained, the feature vectors of the three-dimensional points contained in the three-dimensional point cloud can be obtained.
- the specific implementation manner of interpolation calculation is not limited. In one example, bilinear interpolation may be used to implement interpolation calculation.
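- A rough sketch of such an upsampling step follows. The disclosure leaves the interpolation scheme open (bilinear interpolation is given as one example); the version below uses inverse-distance weighting over the k nearest sampling-area centers, which is an assumption made purely for illustration.

```python
import numpy as np

def upsample_point_features(points, centers, center_feats, k=3, eps=1e-8):
    """Interpolate per-point feature vectors from sampling-area features.

    points:       (N, 3) all 3D points
    centers:      (S, 3) first sampling points (area centers)
    center_feats: (S, C) feature vector of each sampling area
    returns:      (N, C) interpolated feature vector for every 3D point
    """
    # Pairwise distances between points and area centers.
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    idx = np.argsort(dist, axis=1)[:, :k]                  # k nearest centers per point
    near = np.take_along_axis(dist, idx, axis=1)
    w = 1.0 / (near + eps)
    w = w / w.sum(axis=1, keepdims=True)                   # normalized inverse-distance weights
    return (center_feats[idx] * w[..., None]).sum(axis=1)  # weighted feature combination
```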
- through the above process, the feature extraction of three-dimensional points can be transformed into a feature extraction process realized by a 3D target feature learning mechanism, that is, a batch feature extraction process realized by feature extraction layers or a feature extraction network, which greatly improves the efficiency of feature extraction and, in turn, the efficiency of the target detection process.
- step S12 may include:
- Step S121: Perform category prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain the category prediction result, where the category prediction result is used to indicate the category of the target object to which the three-dimensional points included in the three-dimensional point cloud belong.
- Step S122: Perform position prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain a position prediction result, where the position prediction result is used to indicate the position of the three-dimensional points where the target object is located in the three-dimensional point cloud.
- Step S123: Determine at least one candidate area in the scene that includes the target object according to the category prediction result and the position prediction result.
- category prediction can be to predict the category to which the target object belongs.
- for a target object, it may be divided into multiple categories.
- in one example, the target object can be divided, according to its category, into tables, chairs, sofas, air conditioners, or other types of indoor objects.
- category prediction can also be to predict the attributes of the target object. For a target object, it may be further divided into multiple attributes.
- when the target object is a pedestrian, it can be divided into categories according to its state: a walking pedestrian, a standing pedestrian, or a pedestrian in another state; it can also be divided, according to wearing characteristics, into a pedestrian wearing a hat, a pedestrian wearing sneakers, and so on.
- the target object can also be divided into categories such as happy, sad, laughing, or crying according to its label.
- when the target object is a motor vehicle license plate, it can be further divided into categories such as car license plates, motorcycle license plates, or other license plates.
- the categories included in the specific category prediction and the basis for division can be flexibly determined according to actual conditions, and are not limited in the embodiments of the present disclosure.
- the location prediction can be to predict the location of the target object in the 3D point cloud.
- it can include two aspects: on the one hand, the coordinates of the target object in the three-dimensional point cloud, that is, where the target object is located in the three-dimensional point cloud; on the other hand, the size of the target object, that is, the coverage area of the target object in the three-dimensional point cloud.
- predicting the position of the target object may be predicting which three-dimensional points in the three-dimensional point cloud are located within the coverage of the target object.
- the category prediction and position prediction of the three-dimensional point cloud are not limited in the order of their realization.
- the two can be carried out separately or at the same time.
- there is no restriction on the order of the two, which can be chosen flexibly according to the actual situation.
- the category prediction and position prediction of the three-dimensional point cloud can be performed according to the feature vectors of the three-dimensional points in the three-dimensional point cloud, and the candidate areas are then determined according to the category prediction and position prediction results.
- category prediction and position prediction can be achieved through convolutional neural networks, etc. Through the above configuration, the process of target detection can be transformed into a process achieved through a neural network model, which, compared with matching against separately built models, can greatly improve the efficiency and accuracy of target detection.
- since the candidate area is determined by the results of category prediction and position prediction, the feature vector of the target object determined by the candidate area can be regarded as an inter-class feature vector between different categories; that is, in the embodiment of the present disclosure, the feature representation of the target object in the candidate area can be regarded as the feature representation of different types of targets learned by using inter-class feature vectors.
- on the one hand, this drives the neural network to learn high-dimensional feature representations of different types of targets, so that target features in the three-dimensional point cloud can be better extracted; on the other hand, it allows the final target detection result to contain multiple types of targets, that is, batch, multi-class target detection can be performed for multiple target objects in the scene at the same time, which greatly improves the efficiency of target detection.
- step S121 may include:
- the feature extraction results are processed through the category prediction convolutional network to obtain the category prediction results of the three-dimensional points included in the three-dimensional point cloud.
- the category prediction convolutional network can be used to predict whether the three-dimensional points contained in the three-dimensional point cloud belong to a certain category of the target object.
- the implementation of the category prediction convolutional network is not limited. Any neural network that can predict the category of a three-dimensional point can be used as an implementation form of the category prediction convolutional network.
- the category prediction network can be implemented through multiple category prediction branches; each category prediction branch can be used to predict one category of the target object contained in the three-dimensional point cloud and output the probability that a three-dimensional point belongs to this category. The number of category prediction branches is not limited in the embodiments of the present disclosure and can be flexibly determined according to actual conditions.
- the implementation of each category prediction branch is also not limited.
- one-dimensional convolution can be used as the implementation form of each category branch, and the coordinates and feature vectors of the three-dimensional points in the three-dimensional point cloud are used as input.
- a category branch implemented by one-dimensional convolution can obtain the probability that a three-dimensional point in a three-dimensional point cloud belongs to at least one category.
- the category prediction convolution network is used to obtain the category prediction results of the three-dimensional points included in the three-dimensional point cloud.
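- By way of illustration, the following PyTorch sketch shows one such branch realized with one-dimensional convolutions over the per-point coordinates and feature vectors; the channel sizes and the sigmoid output are assumptions rather than details specified by the disclosure.

```python
import torch
import torch.nn as nn

class CategoryBranch(nn.Module):
    """One category prediction branch: 1D convolutions over the points,
    outputting the probability that each 3D point belongs to this category."""

    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3 + feat_dim, hidden, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv1d(hidden, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, coords, feats):
        # coords: (B, N, 3), feats: (B, N, feat_dim) -> (B, N) per-point probabilities
        x = torch.cat([coords, feats], dim=-1).transpose(1, 2)  # (B, 3 + feat_dim, N)
        return self.net(x).squeeze(1)

# One branch per target-object category; their outputs together form the category prediction result.
```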
- the neural network can be used to realize the category prediction of the three-dimensional point cloud, which greatly improves the simplicity and reliability of category prediction and is suitable for batch operation, improving the efficiency of category prediction, which in turn improves the efficiency of the target detection process.
- step S122 may include:
- Step S1221: Process the feature extraction result through the position prediction convolutional network to obtain the residual amount between the three-dimensional points included in the three-dimensional point cloud and at least one preset detection frame, wherein the number of preset detection frames is not less than one.
- Step S1222: Obtain, according to the residual amount, at least one detection frame matched by the three-dimensional points as the position prediction result.
- the position prediction convolutional network may be a neural network used to predict the degree of matching between the three-dimensional point in the three-dimensional point cloud and the preset detection frame.
- the implementation method is not limited and can be flexibly determined according to actual conditions.
- the preset detection frame can be an anchor frame defined according to requirements. Since target detection on the three-dimensional point cloud may be to detect whether a certain target or certain targets are included in the point cloud, anchor frames whose size and shape match the target objects can be set in advance, according to the actual situation of the included targets, as the preset detection frames.
- the coordinates and feature vectors of the three-dimensional points in the three-dimensional point cloud are used as the input of the position prediction convolutional network.
- target detection can detect whether one or more of A different target objects are included in the three-dimensional point cloud.
- A different preset detection frames can be defined in advance, and the sizes of the A preset detection frames can be defined according to the actual situation of the A different target objects.
- the defined size can be the same or different, and it can be flexibly determined according to the actual situation.
- the target object can be further divided into multiple categories; therefore, the preset detection frames can be set according to the number B of target object categories. In an example, the target object may be divided into seven categories, and the preset detection frame can be set to the seven dimensions (x, y, z, h, w, l, ry), where x, y, and z respectively represent the spatial coordinates of the center point of the detection frame in the x, y, and z dimensions, h, w, and l respectively represent the height, width, and length of the target object corresponding to the detection frame, and ry represents the rotation angle of the target object corresponding to the detection frame about the z axis.
- according to the residual amount, it can be determined which one or ones of the preset detection frames the three-dimensional points match, and the size and position of the preset detection frame can then be corrected in the three-dimensional point cloud according to the matching relationship between the three-dimensional points and the preset detection frame, so as to obtain at least one detection frame matching the three-dimensional points as the position prediction result, which is used in the process of determining the candidate area in step S123.
- by passing the feature extraction result through the position prediction convolutional network, the residual amount between the three-dimensional points included in the three-dimensional point cloud and at least one preset detection frame is obtained, and the detection frame matched by the three-dimensional points is further determined in the three-dimensional point cloud according to this residual amount and used as the position prediction result.
- the size and dimensions of the detection frame can be set according to the category of the target object, so that the detection frame determined by the position prediction convolutional network can reflect both the category and the location of the target object and yield more accurate detection results, thereby improving the accuracy of the determined candidate area and, in turn, the accuracy of target detection.
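- A small sketch of how a predicted residual amount might correct a preset detection frame in the seven dimensions above; the purely additive parameterization and the example numbers are assumptions made for illustration, not the parameterization fixed by the disclosure.

```python
import numpy as np

def decode_detection_frame(anchor, residual):
    """Correct a preset detection frame (anchor) with a predicted residual.

    anchor, residual: arrays of shape (7,) = (x, y, z, h, w, l, ry)
    returns: the corrected detection frame, shape (7,)
    """
    box = np.asarray(anchor, dtype=np.float64).copy()
    box[:3] += residual[:3]    # shift the center coordinates
    box[3:6] += residual[3:6]  # adjust height, width, length
    box[6] += residual[6]      # adjust the rotation angle about the z axis
    return box

# Example: a car-sized anchor corrected by a small predicted residual (illustrative values).
car_anchor = np.array([0.0, 0.0, 0.0, 1.6, 1.8, 4.0, 0.0])
pred_residual = np.array([0.3, -0.1, 0.05, 0.1, 0.0, 0.2, 0.05])
print(decode_detection_frame(car_anchor, pred_residual))
```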
- the position prediction result of the target object contained in the three-dimensional point cloud can be obtained through the position prediction convolution network.
- the position prediction convolutional network may be a neural network, which can be trained through training data.
- the specific training process can be flexibly determined according to the actual situation.
- the position prediction convolutional network can be trained through training data.
- the training data can include a three-dimensional point cloud sample, a first position of the sample object in the three-dimensional point cloud sample, and at least one first feature vector corresponding to the category of the sample object; the training process may include:
- passing the three-dimensional point cloud sample through the initial position prediction convolutional network to obtain a first position prediction result;
- obtaining a first error loss according to the error between the first position prediction result and the first position;
- obtaining a second error loss according to the distance between the feature vectors of the three-dimensional points included in the three-dimensional point cloud sample and the first feature vector;
- training the initial position prediction convolutional network according to the first error loss and/or the second error loss.
- the initial position prediction convolutional network can be the initial form of the position prediction convolutional network, and the three-dimensional point cloud sample can be input into the initial position prediction convolutional network for training the initial position prediction convolutional network.
- the three-dimensional point cloud samples may be multiple known three-dimensional point clouds; the sample object may be an object contained in the three-dimensional point cloud sample, and its implementation form can refer to that of the above-mentioned target object, which will not be repeated here.
- the first position may be the actual position of the sample object contained in the three-dimensional point cloud sample in the three-dimensional point cloud sample.
- the at least one first feature vector corresponding to the category of the sample object can be a defined feature vector used for initial position prediction convolutional network learning.
- the definition method can be flexibly determined according to the actual situation.
- the first feature vector may have a one-to-one correspondence with the category of the sample object, that is, according to the category type to which the sample object belongs, a feature vector used for learning and training may be defined for at least one category of target objects.
- the error loss of the initial position prediction convolutional network can be determined according to the result obtained by passing the three-dimensional point cloud sample through the initial position prediction convolutional network.
- the parameters of the initial position prediction convolutional network are adjusted to obtain a more accurate position prediction convolutional network.
- the error loss may include a first error loss and a second error loss, where the first error loss may be the error between the first position prediction result obtained by passing the three-dimensional point cloud sample through the initial position prediction convolutional network and the first position, and the second error loss may be the distance between the feature vectors of the three-dimensional points in the three-dimensional point cloud sample and the first feature vector corresponding to the category of the sample object; together they constitute the error loss.
- the first error loss and the second error loss can be used together as the error loss to train the initial position prediction convolutional network. In a possible implementation, only one of the two error losses may be considered for training, which can be flexibly selected according to the actual situation.
- the inter-class feature vectors between the categories of different sample objects in the training data can be fully utilized, so that the trained location prediction convolutional network can learn the feature representations of different categories of targets, so that the location prediction convolution The network can better extract target features in the three-dimensional point cloud, and obtain more accurate position prediction results, thereby improving the accuracy of subsequent target detection.
- this training method can be implemented in an end-to-end manner, so that the result of position prediction is more accurate, and various influencing factors can be better optimized.
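- A compact PyTorch sketch of such a training objective, combining the two error losses described above; the smooth-L1 choice for the position term, the mean feature distance, and the weights `w1`/`w2` are assumptions introduced only for illustration.

```python
import torch
import torch.nn.functional as F

def position_prediction_loss(pred_position, first_position,
                             point_feats, first_feature_vector,
                             w1=1.0, w2=1.0):
    """Combine the two error losses used to train the position prediction network.

    pred_position:        (B, 7) first position prediction result
    first_position:       (B, 7) labeled position of the sample object
    point_feats:          (B, N, C) feature vectors of the sample's 3D points
    first_feature_vector: (C,) feature vector defined for the sample object's category
    """
    # First error loss: error between the predicted and labeled positions.
    first_loss = F.smooth_l1_loss(pred_position, first_position)
    # Second error loss: distance between point features and the class feature vector.
    second_loss = (point_feats - first_feature_vector).norm(dim=-1).mean()
    return w1 * first_loss + w2 * second_loss
```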
- step S123 can be used to determine at least one candidate area in the three-dimensional point cloud based on the category prediction result and the position prediction result, namely, to determine at least one candidate area containing the target object in the three-dimensional point cloud.
- step S123 may include:
- Step S1231: Acquire at least one detection frame included in the position prediction result.
- Step S1232: Obtain a prediction score of the at least one detection frame according to the category prediction results of the three-dimensional points included in the detection frame.
- Step S1233: Use a detection frame whose prediction score is greater than a score threshold as at least one initial candidate area of the target object.
- in step S123, the candidate areas can be further determined based on these detection frames.
- the detection frames can be further examined to determine which detection frames contain the target object.
- the prediction score of at least one detection frame can first be obtained according to the category prediction results of the three-dimensional points included in the detection frame, that is, the score of the detection frame under at least one category is calculated according to the probabilities of the three-dimensional points in the detection frame under the at least one category; the specific score calculation rule can be flexibly set according to the actual situation, which is not limited in the embodiments of the present disclosure.
- by comparing the prediction score with the score threshold, if the prediction score is greater than the score threshold of a certain category, it can be considered that the detection frame contains a target object of that category; otherwise, it is considered that the target object contained in the detection frame does not belong to the currently predicted category.
- at least one detection frame can be determined from the three-dimensional point cloud as a candidate area.
- step S1234 may also be used to delete repeated detection frames among the determined candidate areas, where a repeated detection frame can be a completely coincident detection frame, or a detection frame whose coincidence degree is higher than a set coincidence threshold; the specific coincidence threshold can be flexibly set according to the actual situation and is not limited in the embodiments of the present disclosure.
- How to detect and delete duplicate detection frames is not limited.
- the non-maximum suppression (NMS) method can be used to remove the duplicate detection frames.
- the final detection frame is obtained as the candidate area of the target object.
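- A simplified sketch of NMS for removing repeated detection frames follows. For brevity it uses an axis-aligned 3D IoU and ignores the rotation angle ry; the axis assignment of (h, w, l) and the threshold value are assumptions, and a rotated-IoU computation would be needed for oriented frames.

```python
import numpy as np

def iou_3d_axis_aligned(a, b):
    """Approximate IoU of two frames given as (x, y, z, h, w, l, ...), ignoring rotation."""
    def bounds(box):
        c, dims = box[:3], box[3:6][::-1]  # assume (l, w, h) extents along (x, y, z)
        return c - dims / 2, c + dims / 2
    amin, amax = bounds(a)
    bmin, bmax = bounds(b)
    inter = np.prod(np.clip(np.minimum(amax, bmax) - np.maximum(amin, bmin), 0, None))
    vol_a, vol_b = np.prod(amax - amin), np.prod(bmax - bmin)
    return inter / (vol_a + vol_b - inter + 1e-8)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring detection frames, dropping overlapping duplicates."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([iou_3d_axis_aligned(boxes[i], boxes[j]) for j in rest])
        order = rest[ious < iou_threshold]
    return keep
```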
- in this way, the results of category prediction and position prediction can be fully combined, so that the determined candidate area can not only express the location of the target object but also indicate the identity of the target object, and the target detection result further obtained based on this candidate area can have higher accuracy.
- before step S13, the point cloud set of three-dimensional points located in the candidate area can be screened out, and the spatial coordinates and feature vectors of this point cloud set in the candidate area can be obtained.
- the specific determination and acquisition methods are not limited. Therefore, in a possible implementation manner, the method may further include, before step S13:
- a three-dimensional sub-point cloud composed of the three-dimensional points included in at least one candidate area is determined.
- the coordinates of the three-dimensional points included in the three-dimensional sub-point cloud are acquired as the spatial coordinates of the three-dimensional sub-point cloud.
- the feature vectors of the three-dimensional points included in the three-dimensional sub-point cloud are acquired as the feature vectors of the three-dimensional sub-point cloud.
- the feature matrix of the three-dimensional sub-point cloud is obtained according to the spatial coordinates of the three-dimensional sub-point cloud and the feature vectors of the three-dimensional sub-point cloud.
- since the candidate area is an area selected from the three-dimensional point cloud, the candidate area is located in the three-dimensional point cloud; therefore, the point cloud set formed by the three-dimensional points included in the candidate area can be used as the three-dimensional sub-point cloud in the above disclosed embodiment.
- since the coordinates and feature vectors of the three-dimensional points in the three-dimensional point cloud are known, the coordinates and feature vectors of the three-dimensional points in the three-dimensional sub-point cloud are also known, so it is convenient to determine the spatial coordinates and feature vectors of the three-dimensional sub-point cloud and to express them in the form of a matrix to form the feature matrix of the three-dimensional sub-point cloud.
- the feature matrix of the candidate area can be further determined, which makes sufficient preparations for subsequent target detection based on the candidate area, and ensures the smooth realization of the target detection process.
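- As a simple illustration, the feature matrix of one candidate area's sub-point cloud can be assembled by concatenating the per-point spatial coordinates with the per-point feature vectors; the function and parameter names below are hypothetical.

```python
import numpy as np

def sub_point_cloud_feature_matrix(coords, feats, candidate_mask):
    """Build the feature matrix of the 3D sub-point cloud in one candidate area.

    coords:         (N, 3) coordinates of all points in the 3D point cloud
    feats:          (N, C) feature vectors of all points
    candidate_mask: (N,) boolean mask of points inside the candidate area
    returns:        (M, 3 + C) matrix of spatial coordinates and feature vectors
    """
    sub_coords = coords[candidate_mask]  # spatial coordinates of the sub-point cloud
    sub_feats = feats[candidate_mask]    # feature vectors of the sub-point cloud
    return np.concatenate([sub_coords, sub_feats], axis=1)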
- step S13 may be used to detect the target object according to the determined candidate area.
- the specific detection process can be flexibly determined according to the actual situation.
- step S13 can include:
- Step S131: Sample the three-dimensional sub-point cloud included in the first candidate area to obtain second sampling points included in the first candidate area, where the first candidate area is any one of the at least one candidate area.
- Step S132: Acquire, according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area, the attention feature vectors of the second sampling points included in the first candidate area.
- Step S133: Fuse the attention feature vectors of the second sampling points included in the first candidate area through the fusion convolutional network to obtain the feature fusion result of the first candidate area.
- Step S134: Use the feature fusion result of the first candidate area as the detection result of the first candidate area.
- the three-dimensional sub-point cloud is mentioned in the above-mentioned disclosed embodiment, and is a sub-point cloud jointly formed by three-dimensional points included in the candidate area, which will not be repeated here.
- the second sampling point may be a sampling point obtained by sampling at least one candidate area.
- the "first" and “second” of the first sampling point and the second sampling point It is only used to distinguish the different sampling objects of the sampling point, that is, the first sampling point is the sampling point obtained by sampling the three-dimensional point cloud, and the second sampling point is the sampling point obtained by sampling the three-dimensional sub-point cloud, instead of limiting the second
- the sampling method of the other that is, the sampling method of the first sampling point and the second sampling point can be the same or different.
- the first candidate area may be one or more of the candidate areas obtained in the above disclosed embodiment.
- in a possible implementation manner, each of the obtained candidate areas may in turn be taken as the first candidate area, so that the detection result corresponding to each candidate area is obtained respectively.
- in the process of performing target detection on the candidate area, the candidate area can be further sampled to obtain at least one second sampling point.
- based on the attention feature vectors of these second sampling points, the feature fusion result of the candidate area is obtained and used as the detection result of target detection in the candidate area.
- through the above process, the attention mechanism can be used to process the point cloud features in the candidate area, thereby suppressing the influence of interference point features outside the target on the detection result and improving the accuracy of target detection.
- the process of sampling the three-dimensional sub-point cloud included in the first candidate area to obtain the second sampling point may be the same as the process of sampling the three-dimensional point cloud to obtain the first sampling point, and will not be repeated here.
- step S132 may include:
- Step S1321: perform feature extraction on the second sampling point according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area to obtain an initial feature vector of the second sampling point.
- Step S1322: perform average pooling on the initial feature vectors of the second sampling points to obtain the global feature vector of the first candidate area.
- Step S1323: splice the initial feature vector of the second sampling point with the global feature vector to obtain the extended feature vector of the second sampling point.
- Step S1324: obtain the attention coefficient of the second sampling point according to the extended feature vector of the second sampling point.
- Step S1325: multiply the attention coefficient of the second sampling point with the initial feature vector of the second sampling point to obtain the attention feature vector of the second sampling point.
- the process of obtaining the attention feature vector of the second sampling point may be: first perform feature extraction on the second sampling point to obtain its initial feature vector,
- the extraction process can be referred to the above disclosed embodiments, which will not be repeated here.
- the feature matrix of the three-dimensional sub-point cloud included in the candidate area can be obtained while the candidate area is determined. Therefore, in a possible implementation manner, the feature vector corresponding to the second sampling point can also be extracted from this feature matrix and used as the initial feature vector of the second sampling point.
- the initial feature vectors of the second sampling points are then passed through an average pooling layer to obtain the global feature vector of the candidate area, and this global feature vector is spliced with the initial feature vector of each second sampling point to obtain its extended feature vector.
- after the extended feature vector of the second sampling point is obtained, step S1324 can be used to obtain the attention feature of the second sampling point based on this extended feature vector. How the attention feature vector is specifically obtained can be flexibly determined according to actual conditions.
- in a possible implementation manner, the extended feature vector of the second sampling point can be passed through an MLP to obtain the attention coefficient of the second sampling point. The attention coefficient of the second sampling point is then multiplied with the initial feature vector of the second sampling point itself, and the resulting feature vector can be regarded as the attention feature vector of the second sampling point.
- the attention feature vector of the second sampling point can be obtained relatively conveniently, and then the detection result of the target object is obtained based on the attention feature vector, which improves the convenience and accuracy of the entire target detection process.
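- A minimal NumPy sketch of steps S1322 to S1325 follows; a single linear layer with a sigmoid stands in for the MLP, and all weights and names are illustrative assumptions rather than the disclosed implementation:

```python
import numpy as np

def attention_features(initial_feats, w_mlp, b_mlp):
    """Compute per-sampling-point attention feature vectors for one candidate area.

    initial_feats: (M, C) initial feature vectors of the second sampling points
    w_mlp, b_mlp:  parameters of a toy single-layer MLP mapping (2C,) -> (1,)
    """
    # Step S1322: average pooling over all sampling points -> global feature vector.
    global_feat = initial_feats.mean(axis=0)                                     # (C,)
    # Step S1323: splice each point's feature with the global feature vector.
    expanded = np.concatenate(
        [initial_feats, np.tile(global_feat, (initial_feats.shape[0], 1))], axis=1)  # (M, 2C)
    # Step S1324: MLP (here one linear layer + sigmoid) -> attention coefficients.
    coeff = 1.0 / (1.0 + np.exp(-(expanded @ w_mlp + b_mlp)))                    # (M, 1)
    # Step S1325: multiply coefficients with the initial feature vectors.
    return coeff * initial_feats                                                 # (M, C)

C = 8
feats = np.random.rand(16, C)
att = attention_features(feats, np.random.randn(2 * C, 1), 0.0)
print(att.shape)  # (16, 8)
```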
- after the attention feature vectors of the second sampling points included in the first candidate area are obtained, they can be fused through the fusion convolutional network, and the resulting feature fusion result is used as the target detection result of that candidate area. In this way, by collecting the target detection results of all the first candidate areas, the target detection result corresponding to the entire three-dimensional point cloud can be obtained.
- the implementation of the fusion convolutional network is not limited; any neural network that can obtain the detection result based on the attention feature vectors can be used as its implementation form. In one example, the above fusion process can be achieved through a prediction layer to complete the detection of the target object.
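- As a toy stand-in for the fusion convolutional network and prediction layer (channel-wise max pooling plus linear heads; this is an assumption for illustration only, not the disclosed network):

```python
import numpy as np

def fuse_and_predict(att_feats, w_cls, w_box):
    """Fuse the attention features of one candidate area and predict its detection result.

    att_feats: (M, C) attention feature vectors of the second sampling points
    w_cls:     (C, num_classes) toy prediction weights for the category
    w_box:     (C, 7) toy prediction weights for (x, y, z, h, w, l, ry)
    """
    fused = att_feats.max(axis=0)          # simple channel-wise fusion over all points
    return fused @ w_cls, fused @ w_box    # category scores, box parameters

C, num_classes = 8, 5
scores, box = fuse_and_predict(np.random.rand(16, C),
                               np.random.randn(C, num_classes),
                               np.random.randn(C, 7))
print(scores.shape, box.shape)  # (5,) (7,)
```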
- Fig. 2 shows a block diagram of a detection device for a target object according to an embodiment of the present disclosure.
- the detection device 20 of the target object includes:
- the feature extraction module 21 is used to perform feature extraction on the three-dimensional point cloud of the target scene to obtain a feature extraction result.
- the candidate region determining module 22 is configured to perform category prediction and position prediction of the target object on the three-dimensional point cloud according to the feature extraction result, and determine at least one candidate region of the target object in the target scene.
- the detection module 23 is configured to detect the target object in at least one candidate area to obtain the detection result.
- the feature extraction module is used to: sample the three-dimensional point cloud to obtain at least one first sampling point; construct, in the three-dimensional point cloud, at least one sampling area centered on the first sampling point; perform feature extraction on the sampling area to obtain the feature vector of the sampling area; and, according to the feature vector of the sampling area, respectively determine the feature vectors of the three-dimensional points included in the three-dimensional point cloud as the feature extraction result.
- the candidate region determination module is used to: perform category prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain a category prediction result, where the category prediction result is used to indicate the category of the target object to which the three-dimensional points included in the three-dimensional point cloud belong; perform position prediction of the target object on the three-dimensional point cloud according to the feature extraction result to obtain a position prediction result, where the position prediction result is used to indicate the positions of the three-dimensional points where the target object is located in the three-dimensional point cloud; and determine, according to the category prediction result and the position prediction result, at least one candidate area that includes the target object in the scene.
- the candidate region determination module is further used to process the feature extraction result through the category prediction convolutional network to obtain the category of the target object to which the three-dimensional points included in the three-dimensional point cloud belong.
- the candidate region determination module is further configured to: process the feature extraction result through the position prediction convolutional network to obtain the residual amounts between the three-dimensional points included in the three-dimensional point cloud and at least one preset detection frame, where the number of preset detection frames is not less than one; and obtain, according to the residual amounts, at least one detection frame matching the three-dimensional points as the position prediction result.
- the position prediction convolutional network is trained with training data.
- the training data includes a three-dimensional point cloud sample, the first position of a sample object in the three-dimensional point cloud sample, and a first feature vector corresponding to the category of the sample object.
- the training includes: obtaining a first position prediction result based on the three-dimensional point cloud sample and an initial position prediction convolutional network; obtaining a first error loss according to the error between the first position prediction result and the first position; obtaining a second error loss according to the distance between the feature vectors of the three-dimensional points included in the three-dimensional point cloud sample and the first feature vector; and training the initial position prediction convolutional network according to the first error loss and/or the second error loss.
- the candidate region determining module is further configured to: obtain at least one detection frame included in the position prediction result; obtain the prediction score of the at least one detection frame according to the category prediction results of the three-dimensional points included in the detection frame; and use the detection frames whose prediction scores are greater than the score threshold as the candidate areas of the target object.
- the candidate area determination module is further used to: determine a three-dimensional sub-point cloud composed of the three-dimensional points included in at least one candidate area; obtain the coordinates of the three-dimensional points included in the three-dimensional sub-point cloud as the spatial coordinates of the three-dimensional sub-point cloud; obtain the feature vectors of the three-dimensional points included in the three-dimensional sub-point cloud as the feature vectors of the three-dimensional sub-point cloud; and obtain the feature matrix of the three-dimensional sub-point cloud according to its spatial coordinates and feature vectors.
- the detection module is configured to: sample the three-dimensional sub-point cloud included in the first candidate area to obtain the second sampling points included in the first candidate area, where the first candidate area is any one of the at least one candidate area; obtain, according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area, the attention feature vectors of the second sampling points included in the first candidate area; fuse the attention feature vectors of the second sampling points included in the first candidate area through the fusion convolutional network to obtain the feature fusion result of the first candidate area; and use the feature fusion result of the first candidate area as the detection result of the first candidate area.
- the detection module is further configured to: perform feature extraction on the second sampling points according to the feature matrix of the three-dimensional sub-point cloud included in the first candidate area to obtain the initial feature vectors of the second sampling points; perform average pooling on the initial feature vectors of the second sampling points to obtain the global feature vector of the first candidate area; splice the initial feature vector of the second sampling point with the global feature vector to obtain the extended feature vector of the second sampling point; obtain the attention coefficient of the second sampling point according to its extended feature vector; and multiply the attention coefficient of the second sampling point with its initial feature vector to obtain the attention feature vector of the second sampling point.
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments; for their specific implementation and technical effects, reference may be made to the description of those method embodiments, which will not be repeated here for brevity.
- Figures 3 to 5 show schematic diagrams of an application example according to the present disclosure.
- an embodiment of the present disclosure proposes a method for detecting a target object, and the specific process may be:
- Figure 3 shows the complete process of detecting the target object.
- in this application example of the present disclosure, feature extraction can first be performed on the three-dimensional point cloud of an indoor space containing multiple target objects (i.e., the 3D point cloud feature extraction process based on inter-class feature vectors in Figure 3) to obtain the feature vector of each three-dimensional point as the feature extraction result.
- after the feature extraction result is obtained, location prediction of the target object (i.e., the location prediction in Figure 3) and category prediction of the target object (i.e., the category prediction in Figure 3) can be performed based on the feature extraction result.
- this determines at least one candidate area of the target object in the target scene and, at the same time, yields the feature vector of the candidate area (i.e., the joint prediction feature in Figure 3).
- after the candidate area is determined, the target object in the candidate area can be detected based on the attention mechanism, so as to obtain the detection result of the target object.
- in this application example, the detection result of the target object may include the location of the target object in the three-dimensional point cloud and the specific category of the target object.
- the process of extracting feature vectors from the three-dimensional point cloud can be achieved through a feature extraction neural network.
- This feature extraction neural network can be divided into four layers, namely sampling layer, aggregation layer, point cloud feature extraction layer, and upsampling layer.
- the sampling layer can select a series of first sampling points in the input 3D point cloud using the FPS algorithm, which define the centers of the sampling areas.
- the basic process of the FPS algorithm is to randomly select a point, then select the point farthest from this point as the starting point, and then continue to iterate until the desired number of points has been selected.
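- A common NumPy sketch of this farthest point sampling procedure (starting from one random point; the function name and details are illustrative, not taken from the disclosure) is:

```python
import numpy as np

def farthest_point_sampling(points, num_samples):
    """Iteratively pick the point farthest from the already selected set (FPS).

    points:      (N, 3) input point cloud coordinates
    num_samples: number of first sampling points to select
    """
    n = points.shape[0]
    selected = [np.random.randint(n)]              # start from a random point
    dist = np.full(n, np.inf)
    for _ in range(num_samples - 1):
        # distance of every point to its nearest already-selected point
        d = np.linalg.norm(points - points[selected[-1]], axis=1)
        dist = np.minimum(dist, d)
        selected.append(int(np.argmax(dist)))      # farthest remaining point
    return np.array(selected)

idx = farthest_point_sampling(np.random.rand(2048, 3), 128)
print(idx.shape)  # (128,)
```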
- the aggregation layer can take the first sampling point as the center and use the neighboring points to construct a local area, and then extract features.
- the point cloud feature extraction layer can use an MLP to perform feature extraction on the sampling area, and the upsampling layer can use an interpolation method with the first sampling points to obtain the feature vector of each three-dimensional point in the three-dimensional point cloud.
- as shown in Figure 4, in one example, for a three-dimensional point cloud containing N three-dimensional points, the spatial coordinate matrix formed by the spatial coordinates of each three-dimensional point can be denoted by d, and the feature matrix formed by the feature vectors of some of the three-dimensional points it contains can be denoted by C.
- the three-dimensional points contained in the point cloud can first be sampled and aggregated: on the one hand, after sampling, the number of three-dimensional points contained in the three-dimensional point cloud can be reduced from N to N1; on the other hand, after aggregation, multiple sampling areas can be obtained.
- the number of three-dimensional points contained in each sampling area can be denoted as K.
- feature extraction can then be performed on each sampling area to obtain its feature vector, thereby forming the feature matrix C1 of the three-dimensional point cloud; after the feature matrix C1 is obtained, the feature vector of each three-dimensional point in each sampling area can be obtained by interpolation, and thus the feature vector of each three-dimensional point in the three-dimensional point cloud is obtained.
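- The following is a rough NumPy sketch of this sample–group–extract–interpolate pipeline, with K-nearest-neighbour grouping standing in for the neighbourhood construction, a random linear map plus ReLU and max pooling standing in for the MLP-based point cloud feature extraction, and inverse-distance interpolation for the upsampling layer; all names and shapes are illustrative assumptions:

```python
import numpy as np

def group_and_extract(points, feats, centers_idx, k, w):
    """Group the K nearest points around each first sampling point and extract
    one feature vector per sampling area (toy shared linear map + ReLU + max pool)."""
    centers = points[centers_idx]                                    # (N1, 3) area centers
    region_feats = []
    for c in centers:
        d = np.linalg.norm(points - c, axis=1)
        neigh = np.argsort(d)[:k]                                    # K points of the sampling area
        local = np.concatenate([points[neigh] - c, feats[neigh]], axis=1)
        region_feats.append(np.maximum(local @ w, 0.0).max(axis=0))  # per-area feature vector
    return np.stack(region_feats)                                    # (N1, C1), i.e. matrix C1

def interpolate_back(points, centers, center_feats, eps=1e-8):
    """Inverse-distance interpolation of per-area features back to every 3D point."""
    out = np.zeros((points.shape[0], center_feats.shape[1]))
    for i, p in enumerate(points):
        d = np.linalg.norm(centers - p, axis=1)
        nn = np.argsort(d)[:3]                                       # three nearest area centers
        wgt = 1.0 / (d[nn] + eps)
        out[i] = (wgt[:, None] * center_feats[nn]).sum(axis=0) / wgt.sum()
    return out

N, C_in, N1, K, C1 = 500, 4, 32, 16, 8
pts, fts = np.random.rand(N, 3), np.random.rand(N, C_in)
centers_idx = np.random.choice(N, N1, replace=False)
area_feats = group_and_extract(pts, fts, centers_idx, K, np.random.randn(3 + C_in, C1))
per_point_feats = interpolate_back(pts, pts[centers_idx], area_feats)
print(area_feats.shape, per_point_feats.shape)  # (32, 8) (500, 8)
```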
- the candidate area in the three-dimensional point cloud can be further determined based on these feature vectors.
- Figure 5 shows a way to determine the candidate area. It can be seen that, in the process of determining the candidate area, position prediction and category prediction of the three-dimensional point cloud can be performed according to the coordinates and feature vectors of each three-dimensional point in the three-dimensional point cloud, and the results of the location prediction and the category prediction can be combined to effectively determine the candidate area in the 3D point cloud.
- category prediction and location prediction can be implemented through a neural network.
- both category prediction and location prediction branches can be implemented by one-dimensional convolution.
- for the category prediction branch, the number of output channels of the convolutional network is the number of categories; for position prediction, the application example of the present disclosure adopts the anchor-based method.
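- Since a one-dimensional convolution with kernel size 1 is simply a shared linear map applied to each point's feature vector, the two branches can be sketched as follows (NumPy, with illustrative shapes; the anchor count A and class count are assumptions):

```python
import numpy as np

def pointwise_heads(point_feats, w_cls, w_loc):
    """Apply shared per-point heads (kernel-size-1 "1D convolutions") to every 3D point.

    point_feats: (N, C) feature vectors of the 3D points
    w_cls:       (C, num_classes) weights of the category prediction branch
    w_loc:       (C, 7 * A)       weights of the position branch (7 residuals per anchor)
    """
    class_scores = point_feats @ w_cls       # per-point category scores
    box_residuals = point_feats @ w_loc      # per-point residual amounts for every anchor
    return class_scores, box_residuals

N, C, num_classes, A = 1000, 64, 5, 5
scores, residuals = pointwise_heads(np.random.rand(N, C),
                                    np.random.randn(C, num_classes),
                                    np.random.randn(C, 7 * A))
print(scores.shape, residuals.shape)  # (1000, 5) (1000, 35)
```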
- in one example, A anchor sizes can be defined in advance, and then for each anchor the residual amounts over the 7 dimensions (x, y, z, h, w, l, ry) (that is, the 7 categories that the target object may correspond to) are predicted, thereby obtaining preliminary prediction frames.
- further, for the obtained preliminary prediction frames, the detection frames whose scores are greater than the score threshold can be selected according to the scores obtained in the category branch from the category prediction results of the three-dimensional points they contain, and then NMS post-processing is performed to obtain the final candidate areas.
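- As a rough sketch of this anchor-based decoding and score filtering (assuming a simple additive residual parameterization and a single anchor; the exact decoding used by the disclosure is not specified here, and the subsequent NMS step is omitted):

```python
import numpy as np

def decode_and_filter(anchor, residuals, class_scores, score_threshold):
    """Decode per-point 7-dim residuals against one predefined anchor and keep
    the boxes whose category score exceeds the threshold (NMS would follow).

    anchor:       (7,) predefined (x, y, z, h, w, l, ry) anchor
    residuals:    (N, 7) predicted residual amounts for each 3D point
    class_scores: (N,) predicted score of the category under consideration
    """
    boxes = anchor + residuals                  # preliminary prediction frames
    keep = class_scores > score_threshold
    return boxes[keep], class_scores[keep]

anchor = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 0.0])
boxes, scores = decode_and_filter(anchor, np.random.randn(100, 7) * 0.1,
                                  np.random.rand(100), 0.8)
print(boxes.shape, scores.shape)
```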
- for each candidate area, the subset of the three-dimensional point cloud lying in that spatial area can be further screened out as a three-dimensional sub-point cloud, and the spatial coordinates and feature vectors of the three-dimensional sub-point cloud form the feature matrix of the candidate area.
- in the process of training the position prediction neural network, a learnable feature vector can be defined for each target object category, and the distance between the feature vector of each three-dimensional point in the training data and the learnable feature vector of the corresponding target object category can be calculated.
- the calculated distance is added as a penalty term (i.e., an error loss) to the network training process; that is, during the training of the position prediction neural network, the feature vector distance of each three-dimensional point under each target object category is calculated, so as to train the position prediction neural network under each target object category.
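- A minimal sketch of such a penalty term (assuming an L2 distance and NumPy arrays; the learnable per-category vectors are represented here as a plain matrix, and names such as `class_feature_penalty` are illustrative):

```python
import numpy as np

def class_feature_penalty(point_feats, point_labels, class_embeddings):
    """Auxiliary penalty: mean distance between each training point's feature
    vector and the learnable feature vector of its target-object category.

    point_feats:      (N, C) feature vectors of the 3D points in a training sample
    point_labels:     (N,) category index of the target object each point belongs to
    class_embeddings: (num_classes, C) one learnable feature vector per category
    """
    diffs = point_feats - class_embeddings[point_labels]
    return np.linalg.norm(diffs, axis=1).mean()

feats = np.random.rand(200, 16)
labels = np.random.randint(0, 7, size=200)
emb = np.random.rand(7, 16)
# In training, this penalty would be added to the position prediction loss.
print(class_feature_penalty(feats, labels, emb))
```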
- after the candidate areas are determined, the target object in each candidate area can be detected based on the feature matrix of each candidate area obtained in the above application example of the present disclosure.
- in one example, for the three-dimensional sub-point cloud in a candidate area, the same sampling method as in the above application example is used to further extract the second sampling points in the candidate area and obtain their feature vectors. Then, using the feature vectors of all the second sampling points, the global feature vector of the candidate area is obtained through an average pooling layer, and the global feature vector is spliced with the feature vector of each second sampling point itself, so as to extend the feature vector of the second sampling point.
- Each second sampling point then uses the expanded feature vector to obtain the corresponding attention coefficient through MLP, and multiplies the attention coefficient with its own feature vector to obtain the attention feature vector of each second sampling point.
- finally, the attention feature vectors of all the obtained second sampling points can be further fused using a convolutional network to predict the category and position of the target object corresponding to each candidate area as the target detection result of the entire 3D point cloud; that is, the category and location of each item (i.e., target object) contained in the indoor space are predicted as the detection result.
- the target object detection method proposed in the application examples of the present disclosure can not only be applied to indoor object recognition tasks, but also can be applied to other tasks that have target object detection requirements.
- those skilled in the art can understand that, in the above methods of the specific implementation, the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
- the embodiment of the present disclosure also proposes a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
- the computer-readable storage medium may be a volatile computer-readable storage medium or a non-volatile computer-readable storage medium.
- An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the above-mentioned method.
- the embodiment of the present disclosure also provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the above method.
- in practical applications, the above-mentioned memory may be a volatile memory, such as RAM; or a non-volatile memory, such as ROM, flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above types of memory, and it provides instructions and data to the processor.
- the foregoing processor may be at least one of an ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, or microprocessor. It is understandable that, for different devices, the electronic component used to implement the above processor function may also be something else, which is not specifically limited in the embodiments of the present disclosure.
- the electronic device can be provided as a terminal, server or other form of device.
- the embodiment of the present disclosure also provides a computer program, which implements the foregoing method when the computer program is executed by a processor.
- FIG. 6 is a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
- the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
- the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, and a sensor component 814 , And communication component 816.
- the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
- the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
- the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
- the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
- the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method to operate on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
- the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable and Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic Disk or Optical Disk.
- the power supply component 806 provides power for various components of the electronic device 800.
- the power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.
- the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
- the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
- the audio component 810 is configured to output and/or input audio signals.
- the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
- the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
- the audio component 810 further includes a speaker for outputting audio signals.
- the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
- the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
- the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
- the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example, the display and keypad of the electronic device 800.
- the sensor component 814 can also detect the position change of the electronic device 800 or a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and the temperature change of the electronic device 800.
- the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
- the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
- the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
- the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
- the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components to implement the above methods.
- a non-volatile computer-readable storage medium such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
- FIG. 7 is a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
- the electronic device 1900 may be provided as a server.
- the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs.
- the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
- the processing component 1922 is configured to execute instructions to perform the above-described methods.
- the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to the network, and an input output (I/O) interface 1958 .
- the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
- a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
- the present disclosure may be a system, method and/or computer program product.
- the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
- the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, and a mechanical encoding device on which instructions are stored.
- the computer-readable storage medium used here is not to be interpreted as the instantaneous signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber optic cables), or electrical signals transmitted through wires.
- the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device .
- the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" language or similar programming languages.
- Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- in some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by using the state information of the computer-readable program instructions.
- such an electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
- These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device, thereby producing a machine such that, when these instructions are executed by the processor of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
- each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for realizing the specified logical function. It should also be noted that the functions marked in the blocks may occur in an order different from the order marked in the drawings; for example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagram and/or flowchart, and the combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
Abstract
本公开涉及一种目标对象的检测方法、装置、设备和存储介质。所述方法包括:对目标场景的三维点云进行特征提取,得到特征提取结果;根据所述特征提取结果,对所述三维点云进行目标对象的类别预测以及位置预测,确定所述目标场景中的目标对象的至少一个候选区域;在至少一个所述候选区域中,对所述目标对象进行检测,得到检测结果。
Description
本公开涉及计算机视觉技术领域,尤其涉及一种目标对象的检测方法、装置、设备和存储介质。
三维目标检测任务是计算机视觉及智能场景理解领域的重要问题,可以应用在很多重要领域,如在无人驾驶、机器人、增强现实等方面具有重要的研究意义和应用价值。
在进行三维目标检测时,可以将三维点云与目标模型进行匹配,来确定三维点云中是否包含有目标对象。如果三维点云中包含有多个不同的目标对象,可能需要和多个不同的目标模型分别进行匹配,耗费时间长的同时,检测的准确率也会有所降低。
发明内容
本公开提出了一种目标对象的检测方案。
根据本公开的一方面,提供了一种目标对象的检测方法,包括:
对目标场景的三维点云进行特征提取,得到特征提取结果;根据所述特征提取结果,对所述三维点云进行目标对象的类别预测以及位置预测,确定所述目标场景中的目标对象的至少一个候选区域;在至少一个所述候选区域中,对所述目标对象进行检测,得到检测结果。
在一种可能的实现方式中,所述对目标场景的三维点云进行特征提取,得到特征提取结果,包括:对所述三维点云进行采样,得到第一采样点;在所述三维点云中构建以所述第一采样点为中心的采样区域;对所述采样区域进行特征提取,得到所述采样区域的特征向量;根据所述采样区域的特征向量,确定所述三维点云包括的三维点的特征向量,作为所述特征提取结果。
在一种可能的实现方式中,所述根据所述特征提取结果,对所述三维点云进行目标对象的类别预测以及位置预测,确定所述目标场景中的目标对象的至少一个候选区域,包括:根据所述特征提取结果,对所述三维点云进行目标对象的类别预测,得到类别预测结果,其中,所述类别预测结果用于指示所述三维点云包括的三维点所属的目标对象的类别;根据所述特征提取结果,对所述三维点云进行目标对象的位置预测,得到位置预测结果,其中,所述位置预测结果用于指示所述三维点云中目标对象所在的三维点的位置;根据所述类别预测结果和所述位置预测结果,确定所述场景中包括所述目标对象的至少一个候选区域。
在一种可能的实现方式中,所述根据所述特征提取结果,对所述三维点云进行类别预测,得到类别预测结果,包括:将所述特征提取结果通过类别预测卷积网络进行处理,得到所述三维点云包括的三维点所属的目标对象的类别。
在一种可能的实现方式中,所述根据所述特征提取结果,对所述三维点云进行位置预测,得到位置预测结果,包括:将所述特征提取结果通过位置预测卷积网络进行处理,得到所述三维点云包括的三维点与至少一个预设检测框之间的残差量,其中,所述预设检测框的数量不少于一个;根据所述残差量,得到所述三维点匹配的至少一个检测框,作为所述位置预测结果。
在一种可能的实现方式中,所述位置预测卷积网络通过训练数据训练,所述训练数据包括三维点云样本、样本对象在所述三维点云样本中的第一位置以及与所述样本对象的类别对应的第一特征向量,所述训练包括:基于所述三维点云样本和初始位置预测卷积网络,得到第一位置预测结果;根据所述第一位置预测结果与所述第一位置之间的误差,得到第一误差损失;根据所述三维点云样本包括的三维点的特征向量,与所述第一特征向量之间的距离,得到第二误差损失;根据所述第一误差损失和/或第二误差损失,对所述初始位置预测卷积网络进行训练。
在一种可能的实现方式中,所述根据所述类别预测结果和所述位置预测结果,确定所述场景中包括所述目标对象的至少一个候选区域,包括:获取所述位置预测结果包括的至少一个检测框;根据所述检测框包括的三维点的类别预测结果,分别得到所述至少一个检测框的预测分数;将所述预测分 数大于分数阈值的检测框,作为所述目标对象的候选区域。
在一种可能的实现方式中,在所述至少一个候选区域中,对所述目标对象进行检测,得到检测结果之前,还包括:确定所述至少一个候选区域包括的三维点构成的三维子点云;获取所述三维子点云包括的三维点的坐标,作为所述三维子点云的空间坐标;获取所述三维子点云包括的三维点的特征向量,作为所述三维子点云的特征向量;根据所述三维子点云的空间坐标和所述三维子点云的特征向量,得到所述三维子点云的特征矩阵。
在一种可能的实现方式中,在所述至少一个候选区域中,对所述目标对象进行检测,得到检测结果,包括:对第一候选区域包括的三维子点云进行采样,得到所述第一候选区域包括的第二采样点,其中,所述第一候选区域为所述至少一个候选区域中的任一个候选区域;根据所述第一候选区域包括的三维子点云的特征矩阵,获取所述第一候选区域包括的第二采样点的注意力特征向量;通过融合卷积网络,将所述第一候选区域包括的第二采样点的注意力特征向量进行融合,得到所述第一候选区域的特征融合结果;将所述第一候选区域的特征融合结果作为所述第一候选区域的检测结果。
在一种可能的实现方式中,根据所述第一候选区域包括的三维子点云的特征矩阵,获取所述第一候选区域包括的第二采样点的注意力特征向量,包括:根据所述第一候选区域包括的三维子点云的特征矩阵,对所述第二采样点进行特征提取,得到所述第二采样点的初始特征向量;将所述第二采样点的初始特征向量进行平均池化,得到所述第一候选区域的全局特征向量;将所述第二采样点的初始特征向量与所述全局特征向量进行拼接,得到所述第二采样点的扩展特征向量;根据所述第二采样点的扩展特征向量,得到所述第二采样点的注意力系数;将所述第二采样点的注意力系数与所述第二采样点的初始特征向量进行相乘,得到所述第二采样点的注意力特征向量。
根据本公开的一方面,提供了一种目标对象的检测装置,包括:
特征提取模块,用于对目标场景的三维点云进行特征提取,得到特征提取结果;候选区域确定模块,用于根据所述特征提取结果,对所述三维点云进行目标对象的类别预测以及位置预测,确定所述目标场景中的目标对象的至少一个候选区域;检测模块,用于在至少一个所述候选区域中,对所述目标对象进行检测,得到检测结果。
在一种可能的实现方式中,所述特征提取模块用于:对所述三维点云进行采样,得到第一采样点;在所述三维点云中构建以所述第一采样点为中心的采样区域;对所述采样区域进行特征提取,得到所述采样区域的特征向量;根据所述采样区域的特征向量,确定所述三维点云包括的三维点的特征向量,作为所述特征提取结果。
在一种可能的实现方式中,所述候选区域确定模块用于:根据所述特征提取结果,对所述三维点云进行目标对象的类别预测,得到类别预测结果,其中,所述类别预测结果用于指示所述三维点云包括的三维点所属的目标对象的类别;根据所述特征提取结果,对所述三维点云进行目标对象的位置预测,得到位置预测结果,其中,所述位置预测结果用于指示所述三维点云中目标对象所在的三维点的位置;根据所述类别预测结果和所述位置预测结果,确定所述场景中包括所述目标对象的至少一个候选区域。
在一种可能的实现方式中,所述候选区域确定模块进一步用于:将所述特征提取结果通过类别预测卷积网络进行处理,得到所述三维点云包括的三维点所属的目标对象的类别。
在一种可能的实现方式中,所述候选区域确定模块进一步用于:将所述特征提取结果通过位置预测卷积网络进行处理,得到所述三维点云包括的三维点与至少一个预设检测框之间的残差量,其中,所述预设检测框的数量不少于一个;根据所述残差量,得到所述三维点匹配的至少一个检测框,作为所述位置预测结果。
在一种可能的实现方式中,所述位置预测卷积网络通过训练数据训练,所述训练数据包括三维点云样本、样本对象在所述三维点云样本中的第一位置以及与所述样本对象的类别对应的第一特征向量,所述训练包括:基于所述三维点云样本和初始位置预测卷积网络,得到第一位置预测结果;根据所述第一位置预测结果与所述第一位置之间的误差,得到第一误差损失;根据所述三维点云样本包括的三维点的特征向量,与所述第一特征向量之间的距离,得到第二误差损失;根据所述第一误差损失 和/或第二误差损失,对所述初始位置预测卷积网络进行训练。
在一种可能的实现方式中,所述候选区域确定模块进一步用于:获取所述位置预测结果包括的至少一个检测框;根据所述检测框包括的三维点的类别预测结果,分别得到所述至少一个检测框的预测分数;将所述预测分数大于分数阈值的检测框,作为所述目标对象的候选区域。
在一种可能的实现方式中,在所述检测模块之前,所述候选区域确定模块还用于:确定所述至少一个候选区域包括的三维点构成的三维子点云;获取所述三维子点云包括的三维点的坐标,作为所述三维子点云的空间坐标;获取所述三维子点云包括的三维点的特征向量,作为所述三维子点云的特征向量;根据所述三维子点云的空间坐标和所述三维子点云的特征向量,得到所述三维子点云的特征矩阵。
在一种可能的实现方式中,所述检测模块用于:对第一候选区域包括的三维子点云进行采样,得到所述第一候选区域包括的第二采样点,其中,所述第一候选区域为所述至少一个候选区域中的任一个候选区域;根据所述第一候选区域包括的三维子点云的特征矩阵,获取所述第一候选区域包括的第二采样点的注意力特征向量;通过融合卷积网络,将所述第一候选区域包括的第二采样点的注意力特征向量进行融合,得到所述第一候选区域的特征融合结果;将所述第一候选区域的特征融合结果作为所述第一候选区域的检测结果。
在一种可能的实现方式中,所述检测模块进一步用于:根据所述第一候选区域包括的三维子点云的特征矩阵,对所述第二采样点进行特征提取,得到所述第二采样点的初始特征向量;将所述第二采样点的初始特征向量进行平均池化,得到所述第一候选区域的全局特征向量;将所述第二采样点的初始特征向量与所述全局特征向量进行拼接,得到所述第二采样点的扩展特征向量;根据所述第二采样点的扩展特征向量,得到所述第二采样点的注意力系数;将所述第二采样点的注意力系数与所述第二采样点的初始特征向量进行相乘,得到所述第二采样点的注意力特征向量。
根据本公开的一方面,提供了一种电子设备,包括:
处理器;
用于存储处理器可执行指令的存储器;
其中,所述处理器被配置为:执行上述目标对象的检测方法。
根据本公开的一方面,提供了一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述目标对象的检测方法。
根据本公开的一方面,提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行上述目标对象的检测方法。
在本公开实施例中,通过对目标场景的三维点云进行特征提取来得到特征提取结果,继而根据特征提取结果,通过对三维点云进行目标对象的类别预测以及位置预测,从而确定目标对象的至少一个候选区域,并在至少一个候选区域中对目标对象进行检测得到检测结果。通过上述过程,可以基于特征提取结果,通过类别预测结合位置预测从目标场景中确定至少一个包含有目标对象的候选区域,使得候选区域同时基于目标对象的位置和类别来确定,具有更高的准确度,继而可以在各候选区域中均对目标对象进行检测,来得到检测结果,一方面可以提升检测结果的准确性,另一方面也可以在场景中包括有多个或多种不同的目标对象,通过同样的检测方式而非模型比对方式来将这些目标对象检测出来,提升了目标检测的方便程度和效率,也可以进一步提升目标检测的准确程度。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。
图1示出根据本公开一实施例的目标对象的检测方法的流程图。
图2示出根据本公开一实施例的目标对象的检测装置的框图。
图3示出根据本公开一应用示例的示意图。
图4示出根据本公开一应用示例的示意图。
图5示出根据本公开一应用示例的示意图。
图6示出根据本公开实施例的一种电子设备的框图。
图7示出根据本公开实施例的一种电子设备的框图。
以下将参考附图详细说明本公开的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括A、B、C中的至少一种,可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。
另外,为了更好地说明本公开,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本公开同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本公开的主旨。
图1示出根据本公开一实施例的目标对象的检测方法的流程图,该方法可以应用于终端设备、服务器或者其他处理设备等。其中,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字处理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一个示例中,该目标对象的检测方法可以应用于人工智能处理器等芯片设备中。
在一些可能的实现方式中,该目标对象的检测方法也可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
如图1所示,所述目标对象的检测方法可以包括:
步骤S11,对目标场景的三维点云进行特征提取,得到特征提取结果。
步骤S12,根据特征提取结果,对三维点云进行目标对象的类别预测以及位置预测,确定目标场景中的目标对象的至少一个候选区域。
步骤S13,在至少一个候选区域中,对目标对象进行检测,得到检测结果。
其中,三维点云可以包括由多个三维点所共同构成的点集合,构成三维点云的三维点的数量不受限定,可以根据实际情况灵活决定。三维点可以是在空间内,由三维坐标所定义的三维空间点,具体三维坐标的定义方式不受限定,在一个示例中,三维坐标可以是由x、y和z三个维度所构成的坐标。
目标场景可以是有检测目标对象需求的场景,目标对象可以是具有检测需求的任意对象,根据检测的实际情况灵活确定。比如在室内物体检测中,目标对象可以是室内的物体,如沙发、桌子或椅子等,在行人检测中,目标对象可以是行人,在人脸识别中,目标对象可以是人脸,在机动车识别中,目标对象可以是机动车牌照等等;场景则可以是具有目标检测需求的任意场景,根据目标对象和检测的实际需求进行灵活确定,在一个示例中,当目标对象为室内物体时,场景可以为室内空间,如卧室空间、家居空间或是教室空间等,在一个示例中,当目标对象为行人时,场景可以是包含有行人的马路场景,在一个示例中,当目标对象为人脸时,场景可以是有多人存在的场景如教室、广场或是电影院等,在一个示例中,当目标对象是机动车牌照时,场景可以是机动车道等,在本公开实施例中不做限制。
上述公开实施例中,根据特征提取结果确定的场景中的候选区域的数量,可以根据场景中包含 目标对象的实际情况灵活决定,可以为一个,也可以为多个,而至少一个候选区域中检测得到的目标对象的结果,也可以根据实际情况来确定,即至少一个候选区域中可以包括有一个或一种目标对象,也可以包括有多个或多种目标对象。在一种可能的实现方式中,一个候选区域可以包含有多个目标对象,即多个目标对象可以对应一个候选区域,也可以一个目标对象对应多个候选区域,即三维点云中可以包含多个同一目标对象,这一目标对象分别位于多个不同的候选区域中。
在一种可能的实现方式中,上述公开实施例中提到的特征提取过程和对目标对象的检测过程,均可以通过训练好的神经网络来实现,具体采用何种神经网络,如何实现相应的特征提取以及目标对象的检测过程,其实现方式可以根据实际情况灵活选择,在后续各公开实施例中会进行详细说明,在此先不做展开。
通过上述公开实施例可以看出,在一种可能的实现方式中,在对目标对象进行检测时,可以基于特征提取结果,通过类别预测结合位置预测从目标场景中确定至少一个包含有目标对象的候选区域,使得候选区域同时基于目标对象的位置和类别来确定,具有更高的准确度,继而可以在至少一个候选区域中均对目标对象进行检测,来得到检测结果,一方面可以提升检测结果的准确性,另一方面也可以在场景中包括有多个或多种不同的目标对象,通过同样的检测方式而非模型比对方式来将这些目标对象检测出来,提升了目标检测的方便程度和效率,也可以进一步提升目标检测的准确程度。
上述公开实施例中获取目标场景的三维点云的方式不受限定,任何可以获取目标对象所在场景的三维点云,并确定这些三维点云的坐标的方式,均可以作为获取三维点云方式,不受下述公开实施例的限制。在一种可能的实现方式中,获取三维点云的方式可以为:通过终端设备,如上述公开实施例提到的用户设备、移动终端或用户终端等等,对需要进行目标检测的场景进行扫描,从而获取目标对象所在场景包括的三维点云,并在场景中建立对应的坐标系,从而得到这些三维点云在建立的坐标系下的坐标。
在获取目标场景包括的三维点云,并得到相应的三维点的空间坐标后,可以通过步骤S11,来对这些三维点云进行特征提取,得到特征提取结果。具体的特征提取方式在本公开实施例中不做限定,在一种可能的实现方式中,步骤S11可以包括:
步骤S111,对三维点云进行采样,得到第一采样点。
步骤S112,在三维点云中构建以第一采样点为中心的采样区域。
步骤S113,对采样区域进行特征提取,得到采样区域的特征向量。
步骤S114,根据采样区域的特征向量,确定三维点云包括的三维点的特征向量,作为特征提取结果。
通过上述公开实施例可以看出,在一种可能的实现方式中,可以将三维点云划分为多个采样区域,然后根据至少一个采样区域的特征提取结果,来得到整个三维点云的特征提取结果,具体如何划分采样区域,以及划分的采样区域的数量,可以根据实际情况灵活决定。基于步骤S111和步骤S112可以看出,在一种可能的实现方式中,将三维点云划分为多个采样区域的方式可以为先从三维点云中选择第一采样点,然后基于这些第一采样点得到采样区域。第一采样点的选择方式不受限定,在一种可能的实现方式中,可以通过在三维点云中利用采用了最远点采样(FPS,Farthest Point Sampling)算法的采样层(Sampling Layer),来得到至少一个第一采样点。在一个示例中,通过FPS算法确定第一采样点的过程可以为:从三维点云中随机选定一个点作为随机采样点,然后选择离这个被选定的随机采样点最远的点来作为起点,不断迭代,每次都选择距离被选定的所有第一采样点的距离之和最远的点,直到被选定的第一采样点个数达到阈值后,结束第一采样点的选择。其中,第一采样点个数的阈值可以根据实际情况进行设定,在本公开实施例中不做限制。
在确定了第一采样点后,可以通过步骤S112来在三维点云中构建至少一个以第一采样点为中心的采样区域。在一种可能的实现方式中,步骤S112可以通过一个聚合层(Grouping Layer)来实现,在一个示例中,通过聚合层构建采样区域的过程可以为:以第一采样点为中心,在第一采样点的周围选择临近点来构建局部区域,将这些局部区域作为采样区域。其中,临近点可以是三维点云中与第一采样点的距离在距离阈值内的三维点,具体的距离阈值设定同样可以根据实际情况进行灵活选择,在 本公开实施例中不做限制。
在得到了多个采样区域后,可以通过步骤S113来分别得到至少一个采样区域的特征向量,步骤S113的实现方式不受限制,即对采样区域进行特征提取的方式不受限制,在一种可能的实现方式中,可以将采样区域通过点云特征提取层(Pointnet Layer)来得到特征向量,点云特征提取层的实现方式可以根据实际情况灵活决定,在一个示例中,可以将多层感知器(MLP,Multi-Layer Perceptron)来作为点云特征层的实现方式,从而提取采样区域的特征向量。
在得到了至少一个采样区域的特征向量后,可以基于这些特征向量,通过步骤S114来分别得到三维点云中三维点的特征向量。步骤S114的实现方式不受限定,在一种可能的实现方式中,可以通过上采样层(Upsampling Layer)来实现。在一个示例中,利用上采样层得到三维点云中三维点的特征向量的方式可以为:在至少一个采样区域中,根据该采样区域包含的三维点在采样区域中的空间位置,通过插值的方法实现上采样,从而得到插值计算结果,并将插值计算结果与该采样区域的特征向量进行结合,从而得到该采样区域中三维点的特征向量,由于采样区域为三维点云中划分的区域,因此在得到至少一个采样区域中包含的三维点的特征向量后,即可以得到三维点云包含的三维点的特征向量。其中,插值计算的具体实现方式不受限定,在一个示例中,可以通过双线性插值,实现插值计算。
通过构建多个以第一采样点为中心的采样区域,并分别提取这些采样区域的特征向量,然后根据提取的特征向量进一步得到三维点云中三维点的特征向量,作为特征提取结果,通过上述过程,可以将对三维点的特征提取过程转化为通过三维目标特征学习处理机制实现的特征提取过程,即可以将三维点的特征提取过程转化为通过特征提取层或是特征提取网络所实现的批量的特征提取过程,大大提升了特征提取的效率,继而提升了目标检测过程的效率。
在得到了三维点云中三维点的特征向量作为特征提取结果后,可以通过步骤S13,来将三维点云划分为多个用于实现目标检测的候选区域。具体如何实现候选区域的划分,其实现方式不做限定,在一种可能的实现方式中,步骤S12可以包括:
步骤S121,根据特征提取结果,对三维点云进行目标对象的类别预测,得到类别预测结果,其中,类别预测结果用于指示三维点云包括的三维点所属的目标对象的类别。
步骤S122,根据特征提取结果,对三维点云进行目标对象的位置预测,得到位置预测结果,其中,位置预测结果用于指示三维点云中目标对象所在的三维点的位置。
步骤S123,根据类别预测结果和位置预测结果,确定场景中包括目标对象的至少一个候选区域。
其中,类别预测可以是对目标对象所属的类别进行预测,对于目标对象来说,其可能被划分为多个类别,举例来说,在目标对象是室内物体的情况下,目标对象可以根据其类别的不同被划分为:桌子、椅子、沙发、空调或是属于室内的其他类别的物体等。在一种可能的实现方式中,类别预测也可以是对目标对象的属性进行预测,对于一种目标对象来说,其可能进一步被划分为多个属性,在一个示例中,在目标对象是行人的情况下,目标对象可以根据其状态被划分为:正在走路的行人、站立的行人或是处于其他状态的行人等类别;也可以根据其穿戴特征被划分为:戴帽子的行人、穿运动鞋的行人或是穿卫衣的行人等类别;在目标对象是人脸的情况下,也可以根据其标签将其划分为:高兴、悲伤、大笑或是哭泣等类别;在目标对象是机动车牌照的情况下,也可以被进一步划分为汽车牌照、摩托车牌照或是其他牌照等类别。具体的类别预测所包含的种类以及划分的依据,可以根据实际情况灵活决定,在本公开实施例中不做限制。
而位置预测则可以是对三维点云中,目标对象所在的位置进行预测,在一种可能的实现方式中,其可以包含两个方面,一方面可以是目标对象在三维点云中的坐标,即目标对象位于三维点云中的哪个位置,另一方面还可以包含有目标对象的大小,即目标对象在三维点云中的覆盖范围,通过上述公开实施例可以看出,在一种可能的实现方式中,对目标对象的位置预测,可以是预测三维点云中哪些三维点位于目标对象的覆盖范围内。
需要注意的是,本公开实施例中,对三维点云进行类别预测和位置预测,其实现过程没有先后顺序的限制,二者可以分别进行,也可以同时进行,分别进行时二者的先后顺序也不受限制,根据实 际情况灵活选择即可。
通过上述公开实施例可以看出,在一种可能的实现方式中,可以根据三维点云中三维点的特征向量,来分别对三维点云进行类别预测和位置预测,然后根据类别预测和位置预测的结果,来综合确定出三维点云中包括目标对象的至少一个候选区域。由于类别预测和位置预测可以通过卷积神经网络等形式来实现,因此,通过上述构成,可以将目标检测的过程转化为通过神经网络模型来实现的过程,与通过单独建模进行匹配的方式相比,可以大大提升目标检测的效率和准确性。而且,由于候选区域通过类别检测和位置检测的结果共同确定,因此,通过候选区域确定的目标对象,其特征向量可以看作是不同类别之间的类间特征向量,即本公开实施例中候选区域内目标对象的特征表示形式可以看作是通过利用类间特征向量的方式所学习的不同类别目标的特征表示,一方面可以控制神经网络学习不同类别目标的高维特征表示,能够更好地提取三维点云中的目标特征,另一方面可以使得最后得到的目标检测结果,可以包含有多类别的目标,即可以同时对场景中的多个目标对象实现批量以及多种类的目标检测,大大提升了目标检测的效率。
具体地,如何实现对三维点云的类别预测和位置预测,其实现方式可以根据实际情况灵活决定,不局限于下述公开实施例。在一种可能的实现方式中,步骤S121可以包括:
将特征提取结果通过类别预测卷积网络进行处理,得到三维点云包括的三维点的类别预测结果。
通过上述公开实施例可以看出,在一种可能的实现方式中,可以通过类别预测卷积网络,来实现对三维点云中包含的三维点属于目标对象的某个类别的预测。类别预测卷积网络的实现方式不受限制,任何可以实现预测三维点所属类别的神经网络,均可以作为类别预测卷积网络的实现形式。在一种可能的实现方式中,类别预测网络可以通过多个类别预测分支进行实现,每一个类别预测分支可以用于预测三维点云中包含的目标对象的一种类别,并输出三维点属于此类别的概率,具体的类别预测分支的数量在本公开实施例中不做限制,可以根据实际情况灵活决定。各类别预测分支的实际实现方式也不受限定,在一个示例中,可以将一维卷积作为各类别分支的实现形式,将三维点云中三维点的坐标和特征向量作为输入,分别通过至少一个由一维卷积实现的类别分支,可以得到三维点云中三维点属于至少一个类别的概率。
通过类别预测卷积网络来得到三维点云包括的三维点的类别预测结果,可以利用神经网络来实现对三维点云的类别预测,大大提升了类别预测实现的简便性和可靠性,且适合批量操作,提升了类别预测的效率,继而提升了目标检测过程的效率。
在一种可能的实现方式中,步骤S122可以包括:
步骤S1221,将特征提取结果通过位置预测卷积网络进行处理,得到三维点云包括的三维点与至少一个预设检测框之间的残差量,其中,预设检测框的数量不少于一个。
步骤S1222,根据残差量,得到三维点匹配的至少一个检测框,作为位置预测结果。
上述公开实施例中,位置预测卷积网络可以是用来预测三维点云中三维点与预设检测框之间匹配程度的神经网络,其实现方式不受限定,可以根据实际情况灵活决定。而预设检测框可以是根据需求定义的锚点框(anchor),由于对三维点云的目标检测,可以是检测三维点云中是否包含某个或某些目标,因此,可以根据包含的这些目标的实际情况,预先设置一个大小与形状与目标对象较为匹配的锚点框,来作为预设检测框,这样,将三维点云中三维点的坐标和特征向量作为输入通过位置预测卷积网络,则可以根据三维点与至少一个预设检测框之间的匹配程度,来确定这个三维点是否属于其中的某个或某类目标对象。
预设检测框的数量和实现方式不受限定,在一个示例中,目标检测可以检测三维点云中是否包含有A个不同目标对象中的一个或多个,在此情况下,可以首先预设A个不同的预设检测框,并根据这A个不同目标对象的实际情况,分别定义这A个预设检测框的大小,被定义的大小可以相同也可以不同,根据实际情况灵活确定即可。进一步地,上述公开实施例中已经提出,目标对象可以进一步被划分为多个类别,因此,可以根据目标对象的所属类别的数量B,将预设检测框设定为B个维度,在一个示例中,目标对象可能被划分为七类,因此,可以将预设检测框设定为(x,y,z,h,w,l,ry)这七个维度,其中,x、y和z可以分别代表检测框的中心点在x、y和z这三个维度上的空间坐标,h、w和l可以 分别代表检测框对应的目标对象的高度、宽度和长度,ry则可以代表检测框对应的目标对象在z轴下的旋转角度。这样,将三维点云的特征提取结果通过位置预测卷积网络后,可以分别得到三维点与至少一个预设检测框之间预测的七个维度上的残差量,根据这一残差量,可以确定三维点是否与其中的某个或某些预设检测框匹配,继而可以根据三维点与预设检测框之间的匹配关系,从三维点云中对预设检测框的大小和位置进行修正,得到与三维点匹配的至少一个检测框来作为位置预测结果,用于步骤S133中候选区域的确定过程中。
通过将特征提取结果通过位置预测卷积网络,得到三维点云包括的三维点与至少一个预设检测框之间的残差量,根据这一残差量来进一步的确定三维点云中,与三维点所匹配的检测框,来作为位置预测检测结果。通过上述过程,可以根据目标对象的类别来设定检测框的大小和维度,从而使得通过位置预测卷积网络确定的检测框,可以兼具目标对象的类别和位置,具有更准确的检测结果,从而提升确定的候选区域的准确性,继而提升目标检测的准确性。
上述公开实施例中已经提出,可以通过位置预测卷积网络来得到三维点云中包含的目标对象的位置预测结果,在一种可能的实现方式中,位置预测卷积网络可以是神经网络,其可以通过训练数据进行训练。具体的训练过程可以根据实际情况灵活决定,在一种可能的实现方式中,位置预测卷积网络可以通过训练数据训练,训练数据可以包括三维点云样本、样本对象在三维点云样本中的第一位置以及与样本对象的类别对应的至少一个第一特征向量,训练过程可以包括:
基于三维点云样本和初始位置预测卷积网络,得到第一位置预测结果。
根据第一位置预测结果与第一位置之间的误差,得到第一误差损失。
根据三维点云样本包括的三维点的特征向量,与第一特征向量之间的距离,得到第二误差损失。
根据第一误差损失和/或第二误差损失,对初始位置预测卷积网络进行训练。
其中,初始位置预测卷积网络可以是位置预测卷积网络的初始形式,而三维点云样本是可以输入到初始位置预测卷积网络中,用于对该初始位置预测卷积网络进行训练的一个或多个已知的三维点云。样本对象可以是三维点云样本中包含的对象,其实现形式可以参考上述目标对象的实现形式,在此不再赘述。第一位置则可以是该三维点云样本中,包含的样本对象在该三维点云样本中的实际位置。与样本对象的类别对应的至少一个第一特征向量,可以是定义的用于初始位置预测卷积网络学习的特征向量,定义的方式可以根据实际情况灵活决定,在一种可能的实现方式中,第一特征向量可以与样本对象的类别一一对应,即可以根据样本对象所属的类别种类,分别为至少一个类别的目标对象均定义一个用于学习训练的特征向量。
在一种可能的实现方式中,在对初始位置预测卷积网络进行训练时,可以根据将三维点云样本通过初始预测卷积网络得到的结果,来确定初始位置预测卷积网络的误差损失,从而调整初始位置预测卷积网络的参数,来得到更为准确的位置预测卷积网络。通过上述公开实施例可以看出,在一种可能的实现方式中,误差损失可以包括有第一误差损失和第二误差损失,其中,第一误差损失可以是将三维点云样本通过初始位置预测卷积网络得到的位置预测结果,与样本对象在三维点云中实际的第一位置之间的偏差,来得到的误差损失;第二误差损失可以是三维点云样本中训练三维点的特征向量,与样本对象的类别对应的第一特征向量之间的距离而共同构成的误差损失。在一种可能的实现方式中,可以同时将第一误差损失和第二误差损失作为误差损失来对初始位置预测卷积网络进行训练,在一种可能的实现方式中,也可以只考虑其中的某项误差损失来进行训练,根据实际情况进行灵活选择即可。
通过上述训练过程,可以充分的利用训练数据中不同样本对象的类别之间的类间特征向量,使得训练好的位置预测卷积网络可以学习不同类别目标的特征表示,从而使得该位置预测卷积网络可以更好的提取三维点云中的目标特征,得到更为准确的位置预测结果,从而提升后续目标检测的准确度。而且这种训练方式可以通过端到端的形式进行实现,从而使得位置预测的结果更加准确,可以更好的对各种影响因素进行优化。
通过上述各公开实施例,可以得到三维点云的类别预测结果和位置预测结果,进一步地,可以通过步骤S123,来基于类别预测结果和位置预测结果,确定三维点云中至少一个候选区域,即确定三 维点云中,至少一个包含有目标对象的候选区域。
步骤S123的实现方式不受限定,在一种可能的实现方式中,步骤S123可以包括:
步骤S1231,获取位置预测结果包括的至少一个检测框。
步骤S1232,根据检测框包括的三维点的类别预测结果,得到至少一个检测框的预测分数。
步骤S1233,将预测分数大于分数阈值的检测框,作为目标对象的至少一个初始候选区域。
上述公开实施例中已经提出,在进行类别预测后,可以得到与三维点匹配的至少一个检测框,来作为位置预测结果,因此,在步骤S123中,可以进一步地根据这些检测框,来确定候选区域。
通过步骤S1232至步骤S1233可以看出,在一种可能的实现方式中,由于检测框可以大致表明目标对象在三维点云中的位置,因此可以进一步根据检测框确定该检测框包含的是何种目标对象。在一种可能的实现方式中,可以首先根据检测框包括的三维点的类别预测结果,来得到至少一个检测框的预测分数,即根据检测框中三维点在至少一个类别下的概率,来分别计算检测框在至少一个类别下的分数,具体的分数计算规则可以根据实际情况进行灵活设定,在本公开实施例中不做限制。在分别得到了检测框在至少一个类别下的预测分数后,可以将其与至少一个类别下的分数阈值进行比较,从而判断该检测框是否包含该类别下的目标对象,分数阈值也可以根据实际情况进行设定,不同类别的分数阈值可以相同也可以不同,在此不做限定。当预测分数大于某类别的分数阈值的情况下,可以认为该检测框包含该类别的目标对象,否则则认为该检测框包含的目标对象不属于当前预测的类别,通过将预测分数与分数阈值进行比较,可以从三维点云中确定至少一个检测框,来作为候选区域。
在一些可能的实施方式中,由于选出的检测框可能存在重复或是重合度较高等情况,因此,还可以通过步骤S1234,来删除确定的候选区域中重复的检测框,其中,重复的检测框可以是完全重合的检测框,也可以是重合度高于设定的重合度阈值的检测框,具体重合度阈值的数值,可以根据实际情况灵活设定,在本公开实施例中不做限制。如何检测并删除重复的检测框,其实现方式不受限定,在一种可能的实现方式中,可以通过非极大值抑制(NMS,Non maximum suppression)方法,来去掉其中重复的检测框,从而得到最终的检测框,作为目标对象的候选区域。
通过上述过程,可以充分将类别预测和位置预测的结果结合在一起,从而使得确定的候选区域,既能表达出目标对象的位置,也可以表明目标对象的身份,基于此候选区域进行进一步的目标检测的结果,可以具有更高的准确性。
进一步地,在确定了三维点云中的候选区域后,还可以筛选出位于该候选区域中的三维点的点云集合,并得到在候选区域中的候选点云集合的空间坐标与特征向量,来为进入到步骤S13作准备。具体的确定和获取方式不受限定,因此,在一种可能的实现方式中,步骤S13之前还可以包括:
确定至少一个候选区域包括的三维点构成的三维子点云。
获取三维子点云包括的三维点的坐标,作为三维子点云的空间坐标。
获取三维子点云包括的三维点的特征向量,作为三维子点云的特征向量。
根据三维子点云的空间坐标和三维子点云的特征向量,得到三维子点云的特征矩阵。
由于候选区域是从三维点云中选定的区域,因此候选区域位于三维点云内,因此,候选区域所包括的三维点所构成的点云集合,可以作为上述公开实施例中的三维子点云。进一步地,由于三维点云中的三维点的坐标与特征向量均已知,因此三维子点云中三维点的坐标和特征向量都已知,因此可以便于确定三维子点云的空间坐标和特征向量,并将这些空间坐标和特征向量以矩阵的形式进行表达,来组成三维子点云的特征矩阵。
通过上述过程,可以在确定了候选区域的情况下,进一步确定候选区域的特征矩阵,为后续根据候选区域进行目标检测作出了充足的准备,保证目标检测过程的顺利实现。
在确定了候选区域后,可以通过步骤S13,根据确定的候选区域来对目标对象进行检测。具体的检测过程可以根据实际情况灵活决定,在一种可能的实现方式中,步骤S13可以包括:
步骤S131,对第一候选区域包括的三维子点云进行采样,得到第一候选区域包括的第二采样点,其中,第一候选区域为至少一个候选区域中的任一个候选区域。
步骤S132,根据第一候选区域包括的三维子点云的特征矩阵,获取第一候选区域包括的第二采 样点的注意力特征向量。
步骤S133,通过融合卷积网络,将第一候选区域包括的第二采样点的注意力特征向量进行融合,得到第一候选区域的特征融合结果。
步骤S134,将第一候选区域的特征融合结果作为第一候选区域的检测结果。
其中,三维子点云为上述公开实施例提到的,由候选区域包括的三维点所共同构成的子点云,在此不再赘述。第二采样点可以是对至少一个候选区域进行采样所得到的采样点,需要注意的是,本公开实施例中,第一采样点与第二采样点中的“第一”与“第二”仅用于区分该采样点的采样对象不同,即第一采样点是对三维点云进行采样得到的采样点,第二采样点是对三维子点云进行采样得到的采样点,而非限制二者的采样方式,即第一采样点与第二采样点的采样方式可以相同,也可以不同。
第一候选区域可以是上述公开实施例中得到的候选区域包含的某个或某些候选区域,在一种可能的实现方式中,可以分别将至少一个得到的候选区域作为第一候选区域,从而分别得到至少一个候选区域对应的检测结果。
通过上述公开实施例可以看出,在一种可能的实现方式中,在对候选区域进行目标检测的过程中,可以进一步的对候选区域进行采样,得到至少一个第二采样点,并基于此第二采样点的注意力特征向量,来得到候选区域的注意力特征向量的特征融合结果,作为候选区域中对目标检测的检测结果。通过上述过程,可以利用注意力机制对候选区域内的点云特征进行处理,从而抑制目标外的干扰点特征对检测结果的影响,从而提升目标检测的准确度。
在一种可能的实现方式中,对第一候选区域包括的三维子点云进行采样得到第二采样点的过程,可以与对三维点云进行采样得到第一采样点的过程相同,在此不再赘述。
在得到了第二采样点后,可以获取第二采样点的注意力特征向量。具体的获取方式不受限制,在一种可能的实现方式中,步骤S132可以包括:
步骤S1321,根据第一候选区域包括的三维子点云的特征矩阵,对第二采样点进行特征提取,得到第二采样点的初始特征向量。
步骤S1322,将第二采样点的初始特征向量进行平均池化,得到第一候选区域的全局特征向量。
步骤S1323,将第二采样点的初始特征向量与全局特征向量进行拼接,得到第二采样点的扩展特征向量。
步骤S1324,根据第二采样点的扩展特征向量,得到第二采样点的注意力系数。
步骤S1325,将第二采样点的注意力系数与第二采样点的初始特征向量进行相乘,得到第二采样点的注意力特征向量。
通过上述公开实施例可以看出,在一种可能的实现方式中,获取第二采样点的注意力特征向量的过程可以为:首先对第二采样点进行特征提取,得到其初始特征向量,特征提取的过程可以参见各上述公开实施例,在此不再赘述,由于上述公开实施例中提到过,在确定候选区域的同时可以得到候选区域包括的三维子点云的特征矩阵,因此,在一种可能的实现方式中,也可以从特征矩阵中提取第二采样点对应的特征向量,来作为第二采样点的初始特征向量。然后将第二采样点通过平均池化层,来得到候选区域的全局特征向量,接着将得到的全局特征向量与第二采样点本身的初始特征向量进行拼接,得到第二采样点的扩展特征向量。在得到了第二采样点的扩展特征向量后,可以通过步骤S1324,来根据这一扩展特征向量得到第二采样点的注意力特征,具体如何得到,其方式可以根据实际情况灵活决定。在一种可能的实现方式中,可以将第二采样点的扩展特征向量通过MLP,来得到第二采样点的注意力系数,这样,将第二采样点的注意力系数与该第二采样点本身的初始特征向量相乘,得到的特征向量可以看作为第二采样点的注意力特征向量。
通过上述过程,可以较为便捷的得到第二采样点的注意力特征向量,继而基于此注意力特征向量得到目标对象的检测结果,提升了整个目标检测过程的便捷性和准确性。
在得到了第一候选区域包括的第二采样点的注意力特征向量后,可以通过融合卷积网络分别对第一候选区域中包括的第二采样点的注意力特征向量进行融合,然后将特征融合结果作为该候选区域的目标检测结果,这样,统计所有的第一候选区域的目标检测结果,则可以得到整个三维点云对应的 目标检测结果。其中,融合卷积网络的实现方式不受限制,任何可以基于注意力特征向量来得到检测结果的神经网络,均可以作为融合卷积网络的实现形式,在一个示例中,可以通过预测层实现上述融合过程,完成对目标对象的检测。
图2示出根据本公开实施例的目标对象的检测装置的框图。如图2所示,目标对象的检测装置20包括:
特征提取模块21,用于对目标场景的三维点云进行特征提取,得到特征提取结果。
候选区域确定模块22,用于根据特征提取结果,对三维点云进行目标对象的类别预测以及位置预测,确定目标场景中的目标对象的至少一个候选区域。
检测模块23,用于在至少一个候选区域中,对目标对象进行检测,得到检测结果。
在一种可能的实现方式中,特征提取模块用于:对三维点云进行采样,得到至少一个第一采样点;在三维点云中构建至少一个以第一采样点为中心的采样区域;对采样区域进行特征提取,得到采样区域的特征向量;根据采样区域的特征向量,分别确定三维点云包括的三维点的特征向量,作为特征提取结果。
在一种可能的实现方式中,候选区域确定模块用于:根据特征提取结果,对三维点云进行目标对象的类别预测,得到类别预测结果,其中,类别预测结果用于指示三维点云包括的三维点所属的目标对象的类别;根据特征提取结果,对三维点云进行目标对象的位置预测,得到位置预测结果,其中,位置预测结果用于指示三维点云中目标对象所在的三维点的位置;根据类别预测结果和所述位置预测结果,确定场景中包括目标对象的至少一个候选区域。
在一种可能的实现方式中,候选区域确定模块进一步用于:将特征提取结果通过类别预测卷积网络进行处理,得到三维点云包括的三维点所属的目标对象的类别。
在一种可能的实现方式中,候选区域确定模块进一步用于:将特征提取结果通过位置预测卷积网络进行处理,得到三维点云包括的三维点与至少一个预设检测框之间的残差量,其中,预设检测框的数量不少于一个;根据残差量,得到三维点匹配的至少一个检测框,作为位置预测结果。
在一种可能的实现方式中,位置预测卷积网络通过训练数据训练,训练数据包括三维点云样本、样本对象在三维点云样本中的第一位置以及与样本对象的类别对应的第一特征向量,训练包括:基于三维点云样本和初始位置预测卷积网络,得到第一位置预测结果;根据第一位置预测结果与第一位置之间的误差,得到第一误差损失;根据三维点云样本包括的三维点的特征向量,与第一特征向量之间的距离,得到第二误差损失;根据第一误差损失和/或第二误差损失,对初始位置预测卷积网络进行训练。
在一种可能的实现方式中,候选区域确定模块进一步用于:获取位置预测结果包括的至少一个检测框;根据检测框包括的三维点的类别预测结果,分别得到至少一个检测框的预测分数;将预测分数大于分数阈值的检测框,作为目标对象的候选区域。
在一种可能的实现方式中,在检测模块之前,候选区域确定模块还用于:确定至少一个候选区域包括的三维点构成的三维子点云;获取三维子点云包括的三维点的坐标,作为三维子点云的空间坐标;获取三维子点云包括的三维点的特征向量,作为三维子点云的特征向量;根据三维子点云的空间坐标和三维子点云的特征向量,得到三维子点云的特征矩阵。
在一种可能的实现方式中,检测模块用于:对第一候选区域包括的三维子点云进行采样,得到第一候选区域包括的第二采样点,其中,第一候选区域为至少一个候选区域中的任一个候选区域;根据第一候选区域包括的三维子点云的特征矩阵,获取第一候选区域包括的第二采样点的注意力特征向量;通过融合卷积网络,将第一候选区域包括的第二采样点的注意力特征向量进行融合,得到第一候选区域的特征融合结果;将第一候选区域的特征融合结果作为第一候选区域的检测结果。
在一种可能的实现方式中,检测模块进一步用于:根据第一候选区域包括的三维子点云的特征矩阵,对第二采样点进行特征提取,得到第二采样点的初始特征向量;将第二采样点的初始特征向量进行平均池化,得到第一候选区域的全局特征向量;将第二采样点的初始特征向量与全局特征向量进行拼接,得到第二采样点的扩展特征向量;根据第二采样点的扩展特征向量,得到第二采样点的注意 力系数;将第二采样点的注意力系数与第二采样点的初始特征向量进行相乘,得到第二采样点的注意力特征向量。
在不违背逻辑的情况下,本申请不同实施例之间可以相互结合,不同实施例描述有所侧重,未侧重描述的部分可参见其他实施例的记载。
在本公开的一些实施例中,本公开实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其具体实现和技术效果可参照上文方法实施例的描述,为了简洁,这里不再赘述。
应用场景示例
随着目标识别任务愈加复杂,如何有效提升室内空间包含的各种物品的识别效果,成为一个亟待解决的问题。
图3~图5示出了根据本公开一应用示例的示意图,如图所示,本公开实施例提出了一种目标对象的检测方法,其具体过程可以为:
图3示出了对目标对象进行检测的完整过程。从图3可以看出,在本公开应用示例中,可以首先通过对包含有多种目标对象的室内空间的三维点云进行特征提取(即图3中的基于类间特征向量的三维点云特征提取过程),得到三维点云中每个三维点的特征向量作为特征提取结果。在得到特征提取结果后,可以基于特征提取结果,一方面进行目标对象的位置预测(即图3中的位置预测),另一方面进行目标对象的类别预测(即图3中的类别预测),来确定目标场景中目标对象的至少一个候选区域,并同时得到候选区域的特征向量(即图3中的联合预测特征)。在确定了候选区域后,可以基于注意力机制对候选区域中的目标对象进行检测,从而得到目标对象的检测结果。在本公开应用示例中,目标对象的检测结果可以包含三维点云中目标对象所在的位置以及目标对象的具体类别。
上述公开应用示例中提到的特征提取的过程可以参见图4。从图4中可以看出,在本公开应用示例中,对三维点云进行特征提取得到特征向量的过程可以通过特征提取的神经网络来实现,该特征提取的神经网络可以分为四层,分别为采样层、聚合层、点云特征提取层和上采样层。其中,采样层可以在输入的三维点云中使用FPS算法选择一系列第一采样点,由此定义出采样区域的中心;FPS算法的基本过程是先随机选择一个点作为起点,之后每次迭代选择距离已选点集最远的点加入采样点集,直到选出需要的个数为止。聚合层可以以第一采样点为中心,利用临近点构建局部区域,进而提取特征。点云特征提取层则可以利用MLP对采样区域进行特征提取,而上采样层则可以使用插值的方法,由第一采样点的特征得到三维点云中每个三维点的特征向量。
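作为参考,FPS(最远点采样)的上述过程可以用如下示意性的NumPy实现来说明;其中起始点的随机选取方式、距离度量等细节均为示例性假设。

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, num_samples: int) -> np.ndarray:
    """对形状为(N, 3)的三维点云做最远点采样, 返回被选中点的索引(示意实现)。"""
    n = points.shape[0]
    selected = np.zeros(num_samples, dtype=np.int64)
    # 记录每个点到"已选点集合"的最小距离, 初始为正无穷
    min_dist = np.full(n, np.inf)
    # 先随机选择一个点作为第一个采样点
    selected[0] = np.random.randint(n)
    for i in range(1, num_samples):
        # 用上一个被选中的点更新各点到已选点集的最小平方距离
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum('ij,ij->i', diff, diff))
        # 选择距离已选点集最远的点作为下一个采样点
        selected[i] = int(np.argmax(min_dist))
    return selected
```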
如图4所示,在一个示例中,对于包含有N个三维点的三维点云来说,其每个三维点的空间坐标构成的空间坐标矩阵可以通过d来表示,其包含的某些三维点的特征向量所构成的特征矩阵可以通过C来表示。为了通过特征提取来得到该三维点云中每个三维点的特征向量所构成的特征矩阵C4,如图所示,可以首先对三维点云包含的三维点进行采样与聚合:一方面,经过采样后,三维点云中包含的三维点的数量可以从N变为N1;另一方面,经过聚合后,可以得到多个采样区域,每个采样区域中包含的三维点的数量可以记为K。此时,可以分别对每个采样区域进行特征提取,得到每个采样区域的特征向量,从而构成三维点云的特征矩阵C1。在得到了三维点云的特征矩阵C1后,可以通过插值来得到每个采样区域中每个三维点的特征向量,继而得到三维点云中每个三维点的特征向量。在本公开应用示例中,由于经过一次采样和聚合后采样区域的数量可能仍然过多,因此可以进一步地再进行一次采样和聚合,得到二次筛选的采样区域,并基于此二次筛选的采样区域进行特征提取,得到特征矩阵C2;然后基于特征矩阵C2进行插值,并将插值后得到的结果与C1合并作为C3;之后再次插值,并将插值结果与初始的特征矩阵C合并,从而得到三维点云中每个三维点的特征向量所构成的特征矩阵C4。
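下面给出一段示意性的PyTorch代码草图,用于说明上述"采样-聚合-特征提取-上采样(插值)"的层级结构以及特征矩阵C1~C4的合并流程;其中的特征维度、近邻个数、采样与插值方式(此处为简洁起见分别用随机采样和最近邻插值代替FPS与加权插值)均为本示例的假设,并非实际实现。

```python
import torch
import torch.nn as nn

class SetAbstraction(nn.Module):
    """采样层+聚合层+点云特征提取层(示意): 选中心点, k近邻分组, 共享MLP后区域内max池化。"""

    def __init__(self, in_dim: int, out_dim: int, num_centers: int, k: int = 16):
        super().__init__()
        self.num_centers, self.k = num_centers, k
        self.mlp = nn.Sequential(nn.Linear(in_dim + 3, out_dim), nn.ReLU())

    def forward(self, xyz: torch.Tensor, feats: torch.Tensor):
        # xyz: (N, 3), feats: (N, in_dim)
        centers = torch.randperm(xyz.shape[0])[:self.num_centers]  # 实际可用FPS(见上文示例), 此处用随机采样代替
        center_xyz = xyz[centers]                                   # (N1, 3)
        nn_idx = torch.cdist(center_xyz, xyz).topk(self.k, largest=False).indices  # (N1, k) 近邻分组
        grouped = torch.cat([xyz[nn_idx] - center_xyz.unsqueeze(1), feats[nn_idx]], dim=-1)
        return center_xyz, self.mlp(grouped).max(dim=1).values      # (N1, out_dim)

def propagate(xyz_dst: torch.Tensor, xyz_src: torch.Tensor, feats_src: torch.Tensor) -> torch.Tensor:
    """上采样层(示意): 用最近邻插值把较稀疏点集的特征传回较稠密点集。"""
    nearest = torch.cdist(xyz_dst, xyz_src).argmin(dim=-1)          # (N_dst,)
    return feats_src[nearest]

# 示意性的前向流程: C -> C1 -> C2, 再逐级插值并与浅层特征合并得到C3、C4
xyz = torch.rand(2048, 3)
C = torch.rand(2048, 16)                     # 初始特征矩阵C
sa1 = SetAbstraction(16, 64, num_centers=512)
sa2 = SetAbstraction(64, 128, num_centers=128)
xyz1, C1 = sa1(xyz, C)                       # 第一次采样与聚合, N -> N1
xyz2, C2 = sa2(xyz1, C1)                     # 第二次采样与聚合(二次筛选)
C3 = torch.cat([propagate(xyz1, xyz2, C2), C1], dim=-1)  # 插值结果与C1合并得到C3
C4 = torch.cat([propagate(xyz, xyz1, C3), C], dim=-1)    # 再次插值并与初始特征C合并得到C4
```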
在得到了三维点云中每个三维点的特征向量后,可以进一步根据这些特征向量确定三维点云中的候选区域,图5示出一种确定候选区域的方式,从图5和图3均可以看出,在确定候选区域的过程中,可以根据三维点云中每个三维点的坐标和特征向量,来分别对三维点云进行位置预测和类别预测,并将位置预测和类别预测的结果进行结合,从而有效确定三维点云中的候选区域。
在本公开应用示例中,可以通过神经网络实现类别预测和位置预测,在一个示例中,类别预测和位置预测分支均可以由一维卷积实现。对于类别预测分支,卷积网络的输出通道数量为类别数目;对于位置预测,本公开应用示例采用anchor的方法进行预测,在一个示例中,可以预先定义A个anchor大小,然后对于每个anchor预测(x,y,z,h,w,l,ry)7个维度(即检测框的中心坐标、尺寸与偏航角)的残差量,从而得到初步预测框。进一步地,对于得到的初步预测框,可以根据其中包含的每个三维点在类别分支的类别预测结果所得到的分数(score),选出分数大于分数阈值的检测框,然后进行NMS后处理,得到最终的候选区域。对于每个候选区域,可以进一步筛选出在该空间区域内的三维点云子集合,作为三维子点云,该三维子点云的空间坐标和特征向量组成该候选区域的特征矩阵。
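以下是与上述类别预测/位置预测分支对应的示意性PyTorch草图:类别分支的输出通道数等于类别数,位置分支对每个anchor输出7个维度的残差量,并按类别分数阈值筛选初步预测框(NMS后处理在草图中省略);其中类别数、anchor数与分数阈值均为示例性假设。

```python
import torch
import torch.nn as nn

NUM_CLASSES, NUM_ANCHORS, BOX_DIM = 10, 2, 7  # 假设的类别数、anchor数; 检测框参数为(x,y,z,h,w,l,ry)

class ProposalHeads(nn.Module):
    """示意性的类别预测/位置预测分支: 均由一维卷积实现, 输入为每个三维点的特征向量。"""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.cls_branch = nn.Conv1d(feat_dim, NUM_CLASSES, kernel_size=1)            # 输出通道数等于类别数
        self.reg_branch = nn.Conv1d(feat_dim, NUM_ANCHORS * BOX_DIM, kernel_size=1)  # 每个anchor预测7维残差量

    def forward(self, point_feats: torch.Tensor):
        # point_feats: (1, feat_dim, N)
        cls_logits = self.cls_branch(point_feats)                       # (1, NUM_CLASSES, N)
        residuals = self.reg_branch(point_feats)                        # (1, NUM_ANCHORS*7, N)
        return cls_logits, residuals.view(1, NUM_ANCHORS, BOX_DIM, -1)

def select_candidates(cls_logits: torch.Tensor, boxes: torch.Tensor, score_thresh: float = 0.5):
    """根据类别分支的分数筛选初步预测框(NMS后处理此处省略)。"""
    scores = cls_logits.softmax(dim=1).max(dim=1).values.squeeze(0)     # (N,), 每个点的最高类别分数
    keep = scores > score_thresh
    return boxes[..., keep], scores[keep]
```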
其中,在对位置预测神经网络进行训练的过程中,可以为每个目标对象的类别分别定义一个可学习的特征向量,并计算训练数据中每个三维点的特征向量与其对应的目标对象类别的可学习特征向量之间的距离,将计算出的距离作为惩罚项(即误差损失)加入到网络训练的过程中;即在位置预测神经网络的训练过程中,计算三维点在每个目标对象的类别下的特征向量距离,从而实现在每个目标对象的类别下对位置预测神经网络的训练。
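上述训练方式中的"类间特征向量距离惩罚项"可以用如下示意性的PyTorch草图表示;其中损失的具体形式(平方欧氏距离、smooth L1)与权重系数均为示例性假设。

```python
import torch
import torch.nn as nn

class ClassAwareRegularizer(nn.Module):
    """示意性的第二误差损失: 为每个目标类别定义一个可学习特征向量,
    惩罚三维点特征与其所属类别特征向量之间的距离。"""

    def __init__(self, num_classes: int = 10, feat_dim: int = 128):
        super().__init__()
        self.class_embeddings = nn.Parameter(torch.randn(num_classes, feat_dim))  # 每类一个可学习特征向量

    def forward(self, point_feats: torch.Tensor, point_labels: torch.Tensor) -> torch.Tensor:
        # point_feats: (N, feat_dim); point_labels: (N,), 每个三维点所属目标类别
        target = self.class_embeddings[point_labels]                  # (N, feat_dim)
        return (point_feats - target).pow(2).sum(dim=-1).mean()       # 距离作为惩罚项

def total_loss(pred_boxes, gt_boxes, point_feats, point_labels, regularizer, w: float = 0.1):
    """总损失 = 第一误差损失(位置回归) + 第二误差损失(类间特征距离); 权重w为示例性假设。"""
    loc_loss = nn.functional.smooth_l1_loss(pred_boxes, gt_boxes)     # 第一误差损失
    return loc_loss + w * regularizer(point_feats, point_labels)      # 加入第二误差损失
```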
在确定了候选区域后,可以基于上述公开应用示例中得到的每个候选区域的特征矩阵,对每个候选区域中的目标对象进行检测。在一个示例中,可以对候选区域内的三维子点云,采用与上述公开应用示例中相同的采样方式,进一步提取候选区域内的第二采样点,并得到其特征向量。然后,使用所有第二采样点的特征向量,通过平均池化层得到候选区域的全局特征向量,并将全局特征向量与第二采样点本身的特征向量拼接,实现对第二采样点特征向量的扩展。每个第二采样点再使用扩展后的特征向量经过MLP得到相应的注意力系数,并将注意力系数与本身的特征向量相乘,从而得到每个第二采样点的注意力特征向量。最后,可以对得到的所有第二采样点的注意力特征向量进一步使用卷积网络进行融合,预测每个候选区域对应的目标对象的类别和位置结果,作为整个三维点云的目标检测结果,即预测出室内空间中包含的每个物品(即目标对象)的类别和位置,来作为检测结果。
本公开应用示例中提出的目标对象的检测方法,除了可以应用于室内物品识别任务中以外,也可以应用到其他有目标对象的检测需求的任务之中。
可以理解,本公开提及的上述各个方法实施例,在不违背原理逻辑的情况下,均可以彼此相互结合形成结合后的实施例,限于篇幅,本公开不再赘述。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
本公开实施例还提出一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述方法。计算机可读存储介质可以是易失性计算机可读存储介质或非易失性计算机可读存储介质。
本公开实施例还提出一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行上述方法。
本公开实施例还提出一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现上述方法。
在实际应用中,上述存储器可以是易失性存储器(volatile memory),例如RAM;或者非易失性存储器(non-volatile memory),例如ROM,快闪存储器(flash memory),硬盘(Hard Disk Drive,HDD)或固态硬盘(Solid-State Drive,SSD);或者上述种类的存储器的组合,并向处理器提供指令和数据。
上述处理器可以为ASIC、DSP、DSPD、PLD、FPGA、CPU、控制器、微控制器、微处理器中的至少一种。可以理解地,对于不同的设备,用于实现上述处理器功能的电子器件还可以为其它,本公开实施例不作具体限定。
电子设备可以被提供为终端、服务器或其它形态的设备。
基于前述实施例相同的技术构思,本公开实施例还提供了一种计算机程序,该计算机程序被处理器执行时实现上述方法。
图6是根据本公开实施例的一种电子设备800的框图。例如,电子设备800可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等终端。
参照图6,电子设备800可以包括以下一个或多个组件:处理组件802,存储器804,电源组件806,多媒体组件808,音频组件810,输入/输出(I/O)的接口812,传感器组件814,以及通信组件816。
处理组件802通常控制电子设备800的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件802可以包括一个或多个处理器820来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件802可以包括一个或多个模块,便于处理组件802和其他组件之间的交互。例如,处理组件802可以包括多媒体模块,以方便多媒体组件808和处理组件802之间的交互。
存储器804被配置为存储各种类型的数据以支持电子设备800的操作。这些数据的示例包括用于在电子设备800上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器804可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
电源组件806为电子设备800的各种组件提供电力。电源组件806可以包括电源管理系统,一个或多个电源,及其他与为电子设备800生成、管理和分配电力相关联的组件。
多媒体组件808包括一个在所述电子设备800和用户之间提供输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。所述触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与所述触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件808包括一个前置摄像头和/或后置摄像头。当电子设备800处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统,或者具有焦距和光学变焦能力的光学透镜系统。
音频组件810被配置为输出和/或输入音频信号。例如,音频组件810包括一个麦克风(MIC),当电子设备800处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器804或经由通信组件816发送。在一些实施例中,音频组件810还包括一个扬声器,用于输出音频信号。
I/O接口812为处理组件802和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。
传感器组件814包括一个或多个传感器,用于为电子设备800提供各个方面的状态评估。例如,传感器组件814可以检测到电子设备800的打开/关闭状态,组件的相对定位,例如所述组件为电子设备800的显示器和小键盘,传感器组件814还可以检测电子设备800或电子设备800一个组件的位置改变,用户与电子设备800接触的存在或不存在,电子设备800方位或加速/减速和电子设备800的温度变化。传感器组件814可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件814还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件814还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。
通信组件816被配置为便于电子设备800和其他设备之间有线或无线方式的通信。电子设备800可以接入基于通信标准的无线网络,如WiFi,2G或3G,或它们的组合。在一个示例性实施例中,通信组件816经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,所述通信组件816还包括近场通信(NFC)模块,以促进短程通信。例如,NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。
在示例性实施例中,电子设备800可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。
在示例性实施例中,还提供了一种非易失性计算机可读存储介质,例如包括计算机程序指令的存储器804,上述计算机程序指令可由电子设备800的处理器820执行以完成上述方法。
图7是根据本公开实施例的一种电子设备1900的框图。例如,电子设备1900可以被提供为一服务器。参照图7,电子设备1900包括处理组件1922,其进一步包括一个或多个处理器,以及由存储器1932所代表的存储器资源,用于存储可由处理组件1922执行的指令,例如应用程序。存储器1932中存储的应用程序可以包括一个或一个以上的模块,每一个模块对应于一组指令。此外,处理组件1922被配置为执行指令,以执行上述方法。
电子设备1900还可以包括一个被配置为执行电子设备1900的电源管理的电源组件1926,一个被配置为将电子设备1900连接到网络的有线或无线网络接口1950,以及一个输入输出(I/O)接口1958。电子设备1900可以基于存储在存储器1932中的操作系统进行操作,例如Windows Server™、Mac OS X™、Unix™、Linux™、FreeBSD™或类似操作系统。
在示例性实施例中,还提供了一种非易失性计算机可读存储介质,例如包括计算机程序指令的存储器1932,上述计算机程序指令可由电子设备1900的处理组件1922执行以完成上述方法。
本公开可以是系统、方法和/或计算机程序产品。计算机程序产品可以包括计算机可读存储介质,其上载有用于使处理器实现本公开的各个方面的计算机可读程序指令。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、静态随机存取存储器(SRAM)、便携式压缩盘只读存储器(CD-ROM)、数字多功能盘(DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。这里所使用的计算机可读存储介质不被解释为瞬时信号本身,诸如无线电波或者其他自由传播的电磁波、通过波导或其他传输媒介传播的电磁波(例如,通过光纤电缆的光脉冲)、或者通过电线传输的电信号。
这里所描述的计算机可读程序指令可以从计算机可读存储介质下载到各个计算/处理设备,或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令,并转发该计算机可读程序指令,以供存储在各个计算/处理设备中的计算机可读存储介质中。
用于执行本公开操作的计算机程序指令可以是汇编指令、指令集架构(ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码,所述编程语言包括面向对象的编程语言—诸如Smalltalk、C++等,以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络—包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。在一些实施例中,通过利用计算机可读程序指令的状态信息来个性化定制电子电路,例如可编程逻辑电路、现场可编程门阵列(FPGA)或可编程逻辑阵列(PLA),该电子电路可以执行计算机可读程序指令,从而实现本公开的各个方面。
这里参照根据本公开实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本公开的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。
也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本公开的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
以上已经描述了本公开的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。
Claims (21)
- 一种目标对象的检测方法,其特征在于,包括:对目标场景的三维点云进行特征提取,得到特征提取结果;根据所述特征提取结果,对所述三维点云进行目标对象的类别预测以及位置预测,确定所述目标场景中的目标对象的至少一个候选区域;在所述至少一个候选区域中,对所述目标对象进行检测,得到检测结果。
- 根据权利要求1所述的方法,其特征在于,所述对目标场景的三维点云进行特征提取,得到特征提取结果,包括:对所述三维点云进行采样,得到第一采样点;在所述三维点云中构建以所述第一采样点为中心的采样区域;对所述采样区域进行特征提取,得到所述采样区域的特征向量;根据所述采样区域的特征向量,确定所述三维点云包括的三维点的特征向量,作为所述特征提取结果。
- 根据权利要求1或2所述的方法,其特征在于,所述根据所述特征提取结果,对所述三维点云进行目标对象的类别预测以及位置预测,确定所述目标场景中的目标对象的至少一个候选区域,包括:根据所述特征提取结果,对所述三维点云进行目标对象的类别预测,得到类别预测结果,其中,所述类别预测结果用于指示所述三维点云包括的三维点所属的目标对象的类别;根据所述特征提取结果,对所述三维点云进行目标对象的位置预测,得到位置预测结果,其中,所述位置预测结果用于指示所述三维点云中目标对象所在的三维点的位置;根据所述类别预测结果和所述位置预测结果,确定所述场景中包括所述目标对象的至少一个候选区域。
- 根据权利要求3所述的方法,其特征在于,所述根据所述特征提取结果,对所述三维点云进行类别预测,得到类别预测结果,包括:将所述特征提取结果通过类别预测卷积网络进行处理,得到所述三维点云包括的三维点所属的目标对象的类别。
- 根据权利要求3或4所述的方法,其特征在于,所述根据所述特征提取结果,对所述三维点云进行位置预测,得到位置预测结果,包括:将所述特征提取结果通过位置预测卷积网络进行处理,得到所述三维点云包括的三维点与至少一个预设检测框之间的残差量,其中,所述预设检测框的数量不少于一个;根据所述残差量,得到所述三维点匹配的至少一个检测框,作为所述位置预测结果。
- 根据权利要求5中所述的方法,其特征在于,所述位置预测卷积网络通过训练数据训练,所述训练数据包括三维点云样本、样本对象在所述三维点云样本中的第一位置以及与所述样本对象的类别对应的第一特征向量,所述训练包括:基于所述三维点云样本和初始位置预测卷积网络,得到第一位置预测结果;根据所述第一位置预测结果与所述第一位置之间的误差,得到第一误差损失;根据所述三维点云样本包括的三维点的特征向量,与所述第一特征向量之间的距离,得到第二误差损失;根据所述第一误差损失和/或第二误差损失,对所述初始位置预测卷积网络进行训练。
- 根据权利要求3至6中任意一项所述的方法,其特征在于,所述根据所述类别预测结果和所述位置预测结果,确定所述场景中包括所述目标对象的至少一个候选区域,包括:获取所述位置预测结果包括的至少一个检测框;根据所述检测框包括的三维点的类别预测结果,分别得到所述至少一个检测框的预测分数;将所述预测分数大于分数阈值的检测框,作为所述目标对象的候选区域。
- 根据权利要求3至7中任意一项所述的方法,其特征在于,在所述至少一个候选区域中,对所述目标对象进行检测,得到检测结果之前,还包括:确定所述至少一个候选区域包括的三维点构成的三维子点云;获取所述三维子点云包括的三维点的坐标,作为所述三维子点云的空间坐标;获取所述三维子点云包括的三维点的特征向量,作为所述三维子点云的特征向量;根据所述三维子点云的空间坐标和所述三维子点云的特征向量,得到所述三维子点云的特征矩阵。
- 根据权利要求1至8中任意一项所述的方法,其特征在于,在所述至少一个候选区域中,对所述目标对象进行检测,得到检测结果,包括:对第一候选区域包括的三维子点云进行采样,得到所述第一候选区域包括的第二采样点,其中,所述第一候选区域为所述至少一个候选区域中的任一个候选区域;根据所述第一候选区域包括的三维子点云的特征矩阵,获取所述第一候选区域包括的第二采样点的注意力特征向量;通过融合卷积网络,将所述第一候选区域包括的第二采样点的注意力特征向量进行融合,得到所述第一候选区域的特征融合结果;将所述第一候选区域的特征融合结果作为所述第一候选区域的检测结果。
- 根据权利要求9所述的方法,其特征在于,根据所述第一候选区域包括的三维子点云的特征矩阵,获取所述第一候选区域包括的第二采样点的注意力特征向量,包括:根据所述第一候选区域包括的三维子点云的特征矩阵,对所述第二采样点进行特征提取,得到所述第二采样点的初始特征向量;将所述第二采样点的初始特征向量进行平均池化,得到所述第一候选区域的全局特征向量;将所述第二采样点的初始特征向量与所述全局特征向量进行拼接,得到所述第二采样点的扩展特征向量;根据所述第二采样点的扩展特征向量,得到所述第二采样点的注意力系数;将所述第二采样点的注意力系数与所述第二采样点的初始特征向量进行相乘,得到所述第二采样点的注意力特征向量。
- 一种目标对象的检测装置,其特征在于,包括:特征提取模块,用于对目标场景的三维点云进行特征提取,得到特征提取结果;候选区域确定模块,用于根据所述特征提取结果,对所述三维点云进行目标对象的类别预测以及位置预测,确定所述目标场景中的目标对象的至少一个候选区域;检测模块,用于在至少一个所述候选区域中,对所述目标对象进行检测,得到检测结果。
- 根据权利要求11所述的装置,其特征在于,所述特征提取模块用于:对所述三维点云进行采样,得到第一采样点;在所述三维点云中构建以所述第一采样点为中心的采样区域;对所述采样区域进行特征提取,得到所述采样区域的特征向量;根据所述采样区域的特征向量,确定所述三维点云包括的三维点的特征向量,作为所述特征提取结果。
- 根据权利要求11或12所述的装置,其特征在于,所述候选区域确定模块用于:根据所述特征提取结果,对所述三维点云进行目标对象的类别预测,得到类别预测结果,其中,所述类别预测结果用于指示所述三维点云包括的三维点所属的目标对象的类别;根据所述特征提取结果,对所述三维点云进行目标对象的位置预测,得到位置预测结果,其中,所述位置预测结果用于指示所述三维点云中目标对象所在的三维点的位置;根据所述类别预测结果和所述位置预测结果,确定所述场景中包括所述目标对象的至少一个候选区域。
- 根据权利要求13所述的装置,其特征在于,所述候选区域确定模块进一步用于:将所述特征提取结果通过位置预测卷积网络进行处理,得到所述三维点云包括的三维点与至少一个预设检测框之间的残差量,其中,所述预设检测框的数量不少于一个;根据所述残差量,得到所述三维点匹配的至少一个检测框,作为所述位置预测结果。
- 根据权利要求14中所述的装置,其特征在于,所述位置预测卷积网络通过训练数据训练,所述训练数据包括三维点云样本、目标对象在所述三维点云样本中的第一位置以及与所述目标对象的类别对应的至少一个第一特征向量,所述训练包括:基于所述三维点云样本和初始位置预测卷积网络,得到第一位置预测结果;根据所述第一位置预测结果与所述第一位置之间的误差,得到第一误差损失;根据所述三维点云样本包括的三维点的特征向量,与所述第一特征向量之间的距离,得到第二误差损失;根据所述第一误差损失和/或第二误差损失,对所述初始位置预测卷积网络进行训练。
- 根据权利要求13至15中任意一项所述的装置,其特征在于,所述候选区域确定模块进一步用于:获取所述位置预测结果包括的至少一个检测框;根据所述检测框包括的三维点的类别预测结果,分别得到所述至少一个检测框的预测分数;将所述预测分数大于分数阈值的检测框,作为所述目标对象的候选区域。
- 根据权利要求11至16中任意一项所述的装置,其特征在于,所述检测模块用于:对第一候选区域包括的三维子点云进行采样,得到所述第一候选区域包括的第二采样点,其中,所述第一候选区域为所述至少一个候选区域中的任一个候选区域;根据所述第一候选区域包括的三维子点云的特征矩阵,获取所述第一候选区域包括的第二采样点的注意力特征向量;通过融合卷积网络,将所述第一候选区域包括的第二采样点的注意力特征向量进行融合,得到所述第一候选区域的特征融合结果;将所述第一候选区域的特征融合结果作为所述第一候选区域的检测结果。
- 根据权利要求17所述的装置,其特征在于,所述检测模块进一步用于:根据所述第一候选区域包括的三维子点云的特征矩阵,对所述第二采样点进行特征提取,得到所述第二采样点的初始特征向量;将所述第二采样点的初始特征向量进行平均池化,得到所述第一候选区域的全局特征向量;将所述第二采样点的初始特征向量与所述全局特征向量进行拼接,得到所述第二采样点的扩展特征向量;根据所述第二采样点的扩展特征向量,得到所述第二采样点的注意力系数;将所述第二采样点的注意力系数与所述第二采样点的初始特征向量进行相乘,得到所述第二采样点的注意力特征向量。
- 一种电子设备,其特征在于,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器存储的指令,以执行权利要求1至10中任意一项所述的方法。
- 一种计算机可读存储介质,其上存储有计算机程序指令,其特征在于,所述计算机程序指令被处理器执行时实现权利要求1至10中任意一项所述的方法。
- 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现权利要求1-10中的任一权利要求所述的方法。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021537177A JP2022524262A (ja) | 2020-02-21 | 2021-01-12 | 目標対象物の検出方法、目標対象物の検出装置、電子機器、記憶媒体及びコンピュータプログラム |
KR1020217021886A KR20210114952A (ko) | 2020-02-21 | 2021-01-12 | 목표대상물의 검출방법, 장치, 기기 및 기억매체 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010108527.1A CN111340766B (zh) | 2020-02-21 | 2020-02-21 | 目标对象的检测方法、装置、设备和存储介质 |
CN202010108527.1 | 2020-02-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021164469A1 true WO2021164469A1 (zh) | 2021-08-26 |
Family
ID=71184254
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/071295 WO2021164469A1 (zh) | 2020-02-21 | 2021-01-12 | 目标对象的检测方法、装置、设备和存储介质 |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2022524262A (zh) |
KR (1) | KR20210114952A (zh) |
CN (1) | CN111340766B (zh) |
WO (1) | WO2021164469A1 (zh) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113761238A (zh) * | 2021-08-27 | 2021-12-07 | 广州文远知行科技有限公司 | 点云存储方法、装置、设备及存储介质 |
CN113988164A (zh) * | 2021-10-21 | 2022-01-28 | 电子科技大学 | 一种面向代表点自注意力机制的轻量级点云目标检测方法 |
CN114022523A (zh) * | 2021-10-09 | 2022-02-08 | 清华大学 | 低重叠点云数据配准系统及方法 |
CN115273154A (zh) * | 2022-09-26 | 2022-11-01 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 基于边缘重构的热红外行人检测方法、系统及存储介质 |
WO2023202401A1 (zh) * | 2022-04-19 | 2023-10-26 | 京东科技信息技术有限公司 | 点云数据中目标的检测方法、装置和计算机可读存储介质 |
CN117789057A (zh) * | 2022-09-19 | 2024-03-29 | 中国矿业大学(北京) | 一种基于局部-全局特征的无人机图像旋转目标检测方法 |
WO2024139537A1 (zh) * | 2022-12-29 | 2024-07-04 | 北京图森智途科技有限公司 | 三维目标检测方法、装置和计算机可读存储介质 |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111340766B (zh) * | 2020-02-21 | 2024-06-11 | 北京市商汤科技开发有限公司 | 目标对象的检测方法、装置、设备和存储介质 |
CN111814674A (zh) * | 2020-07-08 | 2020-10-23 | 上海雪湖科技有限公司 | 基于fpga的点云网络的非极大值抑制方法 |
CN111862222B (zh) * | 2020-08-03 | 2021-08-13 | 湖北亿咖通科技有限公司 | 一种目标检测方法及电子设备 |
CN114667728B (zh) * | 2020-12-31 | 2023-10-13 | 深圳市大疆创新科技有限公司 | 点云编解码方法、装置及系统 |
CN115035359A (zh) * | 2021-02-24 | 2022-09-09 | 华为技术有限公司 | 一种点云数据处理方法、训练数据处理方法及装置 |
CN112801036A (zh) * | 2021-02-25 | 2021-05-14 | 同济大学 | 一种目标识别方法、训练方法、介质、电子设备及汽车 |
CN114973231A (zh) * | 2021-02-25 | 2022-08-30 | 微软技术许可有限责任公司 | 三维对象检测 |
CN112883979A (zh) * | 2021-03-11 | 2021-06-01 | 先临三维科技股份有限公司 | 三维实例分割方法、装置、设备和计算机可读存储介质 |
CN113052031B (zh) * | 2021-03-15 | 2022-08-09 | 浙江大学 | 一种无需后处理操作的3d目标检测方法 |
CN114061586B (zh) * | 2021-11-10 | 2024-08-16 | 北京有竹居网络技术有限公司 | 用于生成电子设备的导航路径的方法和产品 |
KR102405818B1 (ko) * | 2021-11-15 | 2022-06-07 | 국방과학연구소 | 노이즈 제거 방법, 노이즈 제거 장치 및 상기 방법을 실행시키기 위하여 기록매체에 저장된 컴퓨터 프로그램 |
CN114750147B (zh) * | 2022-03-10 | 2023-11-24 | 深圳甲壳虫智能有限公司 | 机器人的空间位姿确定方法、装置和机器人 |
WO2024095380A1 (ja) * | 2022-11-02 | 2024-05-10 | 三菱電機株式会社 | 点群識別装置、学習装置、点群識別方法、および、学習方法 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160379050A1 (en) * | 2015-06-26 | 2016-12-29 | Kabushiki Kaisha Toshiba | Method for determining authenticity of a three-dimensional object |
CN109410307A (zh) * | 2018-10-16 | 2019-03-01 | 大连理工大学 | 一种场景点云语义分割方法 |
CN110032962A (zh) * | 2019-04-03 | 2019-07-19 | 腾讯科技(深圳)有限公司 | 一种物体检测方法、装置、网络设备和存储介质 |
CN110059608A (zh) * | 2019-04-11 | 2019-07-26 | 腾讯科技(深圳)有限公司 | 一种物体检测方法、装置、电子设备和存储介质 |
CN110443842A (zh) * | 2019-07-24 | 2019-11-12 | 大连理工大学 | 基于视角融合的深度图预测方法 |
CN111340766A (zh) * | 2020-02-21 | 2020-06-26 | 北京市商汤科技开发有限公司 | 目标对象的检测方法、装置、设备和存储介质 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018072198A (ja) * | 2016-10-31 | 2018-05-10 | 富士通株式会社 | 位置姿勢推定装置、位置姿勢推定方法、及び位置姿勢推定プログラム |
JP6687568B2 (ja) * | 2017-07-14 | 2020-04-22 | エーティーラボ株式会社 | 境界推定装置、境界推定方法および境界推定プログラム |
CN109345510A (zh) * | 2018-09-07 | 2019-02-15 | 百度在线网络技术(北京)有限公司 | 物体检测方法、装置、设备、存储介质及车辆 |
CN110298361B (zh) * | 2019-05-22 | 2021-05-04 | 杭州未名信科科技有限公司 | 一种rgb-d图像的语义分割方法和系统 |
CN110400304B (zh) * | 2019-07-25 | 2023-12-12 | 腾讯科技(深圳)有限公司 | 基于深度学习的物体检测方法、装置、设备及存储介质 |
2020
- 2020-02-21 CN CN202010108527.1A patent/CN111340766B/zh active Active
2021
- 2021-01-12 WO PCT/CN2021/071295 patent/WO2021164469A1/zh active Application Filing
- 2021-01-12 JP JP2021537177A patent/JP2022524262A/ja active Pending
- 2021-01-12 KR KR1020217021886A patent/KR20210114952A/ko not_active Application Discontinuation
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160379050A1 (en) * | 2015-06-26 | 2016-12-29 | Kabushiki Kaisha Toshiba | Method for determining authenticity of a three-dimensional object |
CN109410307A (zh) * | 2018-10-16 | 2019-03-01 | 大连理工大学 | 一种场景点云语义分割方法 |
CN110032962A (zh) * | 2019-04-03 | 2019-07-19 | 腾讯科技(深圳)有限公司 | 一种物体检测方法、装置、网络设备和存储介质 |
CN110059608A (zh) * | 2019-04-11 | 2019-07-26 | 腾讯科技(深圳)有限公司 | 一种物体检测方法、装置、电子设备和存储介质 |
CN110443842A (zh) * | 2019-07-24 | 2019-11-12 | 大连理工大学 | 基于视角融合的深度图预测方法 |
CN111340766A (zh) * | 2020-02-21 | 2020-06-26 | 北京市商汤科技开发有限公司 | 目标对象的检测方法、装置、设备和存储介质 |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113761238A (zh) * | 2021-08-27 | 2021-12-07 | 广州文远知行科技有限公司 | 点云存储方法、装置、设备及存储介质 |
CN113761238B (zh) * | 2021-08-27 | 2022-08-23 | 广州文远知行科技有限公司 | 点云存储方法、装置、设备及存储介质 |
CN114022523A (zh) * | 2021-10-09 | 2022-02-08 | 清华大学 | 低重叠点云数据配准系统及方法 |
CN113988164A (zh) * | 2021-10-21 | 2022-01-28 | 电子科技大学 | 一种面向代表点自注意力机制的轻量级点云目标检测方法 |
CN113988164B (zh) * | 2021-10-21 | 2023-08-08 | 电子科技大学 | 一种面向代表点自注意力机制的轻量级点云目标检测方法 |
WO2023202401A1 (zh) * | 2022-04-19 | 2023-10-26 | 京东科技信息技术有限公司 | 点云数据中目标的检测方法、装置和计算机可读存储介质 |
CN117789057A (zh) * | 2022-09-19 | 2024-03-29 | 中国矿业大学(北京) | 一种基于局部-全局特征的无人机图像旋转目标检测方法 |
CN115273154A (zh) * | 2022-09-26 | 2022-11-01 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 基于边缘重构的热红外行人检测方法、系统及存储介质 |
WO2024139537A1 (zh) * | 2022-12-29 | 2024-07-04 | 北京图森智途科技有限公司 | 三维目标检测方法、装置和计算机可读存储介质 |
Also Published As
Publication number | Publication date |
---|---|
KR20210114952A (ko) | 2021-09-24 |
CN111340766B (zh) | 2024-06-11 |
CN111340766A (zh) | 2020-06-26 |
JP2022524262A (ja) | 2022-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021164469A1 (zh) | 目标对象的检测方法、装置、设备和存储介质 | |
TWI724736B (zh) | 圖像處理方法及裝置、電子設備、儲存媒體和電腦程式 | |
TWI749423B (zh) | 圖像處理方法及裝置、電子設備和電腦可讀儲存介質 | |
WO2021051857A1 (zh) | 目标对象匹配方法及装置、电子设备和存储介质 | |
WO2020135529A1 (zh) | 位姿估计方法及装置、电子设备和存储介质 | |
WO2021155632A1 (zh) | 图像处理方法及装置、电子设备和存储介质 | |
TW202141423A (zh) | 圖像處理方法及電子設備和電腦可讀儲存介質 | |
CN109145150B (zh) | 目标匹配方法及装置、电子设备和存储介质 | |
WO2021035833A1 (zh) | 姿态预测方法、模型训练方法及装置 | |
TWI735112B (zh) | 圖像生成方法、電子設備和儲存介質 | |
TWI778313B (zh) | 圖像處理方法、電子設備和儲存介質 | |
WO2020155713A1 (zh) | 图像处理方法及装置、网络训练方法及装置 | |
CN112906484B (zh) | 一种视频帧处理方法及装置、电子设备和存储介质 | |
JP7114811B2 (ja) | 画像処理方法及び装置、電子機器並びに記憶媒体 | |
CN111259967A (zh) | 图像分类及神经网络训练方法、装置、设备及存储介质 | |
KR20220027202A (ko) | 객체 검출 방법 및 장치, 전자 기기 및 저장매체 | |
CN111311588B (zh) | 重定位方法及装置、电子设备和存储介质 | |
CN113781518A (zh) | 神经网络结构搜索方法及装置、电子设备和存储介质 | |
CN115035596B (zh) | 行为检测的方法及装置、电子设备和存储介质 | |
CN111062407A (zh) | 图像处理方法及装置、电子设备和存储介质 | |
US20210350170A1 (en) | Localization method and apparatus based on shared map, electronic device and storage medium | |
CN113807369B (zh) | 目标重识别方法及装置、电子设备和存储介质 | |
CN111723715B (zh) | 一种视频显著性检测方法及装置、电子设备和存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| ENP | Entry into the national phase | Ref document number: 2021537177 Country of ref document: JP Kind code of ref document: A |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21757677 Country of ref document: EP Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21757677 Country of ref document: EP Kind code of ref document: A1 |