CN111539347B - Method and device for detecting target - Google Patents

Method and device for detecting target

Info

Publication number
CN111539347B
Authority
CN
China
Prior art keywords
point cloud
sample
target point
target
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010342481.XA
Other languages
Chinese (zh)
Other versions
CN111539347A (en)
Inventor
叶晓青
谭啸
孙昊
章宏武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010342481.XA priority Critical patent/CN111539347B/en
Publication of CN111539347A publication Critical patent/CN111539347A/en
Application granted granted Critical
Publication of CN111539347B publication Critical patent/CN111539347B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for detecting targets, and relates to the field of computer technology. A specific implementation is as follows: the acquired point cloud is input into a target detection model to obtain a target detection result, where the target detection model includes an expansion convolution layer for processing feature information output by a feature extraction network and is obtained through the following training steps: inputting sample point clouds of samples in a sample set into an initial model to obtain first feature information output by the expansion convolution layer; obtaining second feature information based on a generated sample, corresponding to the input sample, in a generation set and the feature extraction network of a pre-trained detection model; determining a loss function value based on the first feature information and the second feature information; and updating model parameters of the initial model with the loss function value to obtain the target detection model. With this embodiment, the target detection model achieves a good detection effect and outputs more accurate target detection results even when the point cloud contains few points.

Description

Method and device for detecting target
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to object detection technology in computer vision.
Background
As an important environmental sensor, lidar is widely used in various fields. Taking autonomous driving as an example, the point cloud data acquired by the lidar contains information about various targets (such as vehicles, pedestrians, riders, etc.) and can be used for obstacle detection. Because point clouds acquired by lidar are sparse and of uneven density, point clouds may be incomplete or contain too few points in occluded areas and at distant positions. This results in lower accuracy of target detection in occluded and distant areas, and makes missed detections and false detections more likely.
Disclosure of Invention
A method and apparatus for detecting a target are provided.
According to a first aspect, embodiments of the present disclosure provide a method for detecting a target, the method comprising: inputting the obtained point cloud into a pre-established target detection model to obtain a target detection result, wherein the target detection model comprises an expansion convolution layer for processing characteristic information output by a characteristic extraction network, and the target detection model is obtained through training by the following steps: inputting sample point clouds of samples in a sample set into an initial model to obtain first characteristic information output by an expansion convolution layer; obtaining second characteristic information based on a generated sample corresponding to an input sample in a generated set and a characteristic extraction network of a pre-trained detection model, wherein the generated sample in the generated set has a corresponding relation with a sample in the sample set; determining a loss function value based on the first feature information and the second feature information; and updating the model parameters of the initial model by using the loss function value to obtain a target detection model.
According to a second aspect, embodiments of the present disclosure provide an apparatus for detecting a target, the apparatus comprising: the input unit is configured to input the acquired point cloud into a pre-established target detection model to obtain a target detection result, wherein the target detection model comprises an expansion convolution layer for processing characteristic information output by a characteristic extraction network, and the target detection model is obtained through training by the following units: the first generation unit is configured to input a sample point cloud of a sample in the sample set into the initial model to obtain first characteristic information output by the expansion convolution layer; a second generation unit configured to obtain second feature information based on a feature extraction network of a generation sample corresponding to an input sample in a generation set and a pre-trained detection model, wherein the generation sample in the generation set has a correspondence with a sample in the sample set; a determining unit configured to determine a loss function value based on the first feature information and the second feature information; and a parameter updating unit configured to update model parameters of the initial model by using the loss function value to obtain a target detection model.
According to a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first aspects.
According to a fourth aspect, an embodiment of the present disclosure provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the first aspects.
The feature extraction network and the expansion convolution layer of the target detection model used according to this technique are trained with the guidance of a pre-trained detection model. The expansion convolution layer can further process the feature information output by the feature extraction network, and the processed feature information can more closely resemble the feature information of regions with many points. Therefore, the target detection model can achieve a good detection effect and output more accurate target detection results even when the point cloud contains few points.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of one embodiment of a method for detecting a target according to the present disclosure;
FIG. 2 is a schematic illustration of one application scenario of a method for detecting a target according to the present disclosure;
FIG. 3 is a flow chart of yet another embodiment of a method for detecting a target according to the present disclosure;
FIG. 4 is a schematic structural view of one embodiment of an apparatus for detecting a target according to the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing the method for detecting an object according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 illustrates a flow 100 of one embodiment of a method for detecting a target according to the present disclosure. The method for detecting an object comprises the steps of:
s101, inputting the obtained point cloud into a pre-established target detection model to obtain a target detection result.
In the present embodiment, the execution subject of the method for detecting an object may acquire the point cloud collected by a point cloud acquisition apparatus (e.g., a lidar, a three-dimensional laser scanner, etc.) through a wired or wireless connection. Then, the execution subject may input the obtained point cloud into a pre-established target detection model, thereby obtaining a target detection result. Here, the target detection model may be used to characterize the correspondence between point clouds and target detection results. Object detection may be used to identify and locate a particular object (i.e., the target) in the point cloud. The target detection result may include the category of the target and the bounding box information of the target, where the bounding box information may include the length, width, height, orientation angle, center point coordinates, and the like of the bounding box.
In general, a point cloud acquisition device may acquire a point cloud for a scene of the physical world (e.g., a road scene). The point cloud may include a plurality of point data, which may include three-dimensional coordinates and laser reflection intensity. In general, the three-dimensional coordinates of the point data may include information on the X-axis, Y-axis, and Z-axis. Here, the laser reflection intensity may refer to a ratio of laser reflection energy to laser emission energy.
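For illustration only, a minimal sketch (in Python, which the disclosure does not prescribe) of how such point data might be held in memory; the (N, 4) layout, the field order, and the intensity range are assumptions.

```python
import numpy as np

# A point cloud as an (N, 4) array: x, y, z coordinates plus laser reflection
# intensity (assumed normalized to [0, 1]); the exact layout is an assumption.
point_cloud = np.array([
    [12.3, -4.1, 0.8, 0.42],   # a point on a distant vehicle
    [12.5, -4.0, 0.9, 0.40],
    [ 2.1,  0.7, 0.1, 0.15],   # a point on the road surface
], dtype=np.float32)

xyz = point_cloud[:, :3]        # three-dimensional coordinates (X, Y, Z)
intensity = point_cloud[:, 3]   # ratio of reflected to emitted laser energy
```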
Here, the execution subject may be any of various electronic devices having a point cloud data processing function, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, in-vehicle terminals, and the like.
In the present embodiment, the target detection model may be obtained by an execution subject for training the target detection model through the following steps S1011 to S1014.
S1011, inputting the sample point cloud of the samples in the sample set into the initial model to obtain first characteristic information output by the expansion convolution layer.
In the present embodiment, an execution subject for training the target detection model may first acquire a sample set. Each sample in the sample set may include a sample point cloud and a sample target detection result corresponding to the sample point cloud. As an example, the sample point cloud may be a point cloud of a real physical-world scene acquired by a lidar, and may include both target point clouds and non-target point clouds. In some samples, the target point cloud may be sparse due to occlusion, distance, and the like. Here, the sample target detection result may include the category of the target and the bounding box information of the target, where the bounding box information may include the length, width, height, orientation angle, center point coordinates, and the like of the bounding box. For example, the samples in the sample set may be obtained by a technician manually labeling point clouds acquired by a lidar.
Then, the execution body for training the target detection model may input the sample point cloud of the samples in the sample set into the initial model, so as to obtain first feature information output by the expansion convolution layer. Here, the initial model may refer to an untrained or untrained completed target detection model. Here, the network structure of the initial model may be determined according to actual needs. For example, it is determined which layers the initial model includes (e.g., convolution layer, pooling layer, excitation function layer, fully connected layer, etc.), the connection order relationship between layers, and which parameters each layer includes (e.g., weights, bias terms, step sizes of convolutions, etc.), as desired.
As an example, the target detection model may include a feature extraction network, an expansion convolution layer, and a result output layer. The feature extraction network may be used to perform feature extraction on the received point cloud to obtain feature information. The expansion convolution layer may further process the feature information output by the feature extraction network to obtain processed feature information. The result output layer may output a target detection result according to the feature information output by the expansion convolution layer. For example, the feature extraction network may be a sparse convolutional neural network, which exploits the sparsity of the point cloud to reduce unnecessary resource consumption. Dilated convolution (also called hole or atrous convolution) inserts holes into a standard convolution kernel to enlarge the receptive field, so that the output covers a larger range of information.
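The following PyTorch sketch illustrates the described layout of a feature extraction network, an expansion (dilated) convolution layer, and a result output layer, assuming the point cloud has already been rasterized into a bird's-eye-view feature map. The dense backbone (instead of a sparse convolutional network), the layer sizes, the two output heads, and all names are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class ToyDetectionModel(nn.Module):
    """Illustrative only: feature extraction -> dilated conv -> result output."""

    def __init__(self, in_channels: int = 64, num_classes: int = 3):
        super().__init__()
        # Stand-in for the feature extraction network (the text suggests a
        # sparse convolutional network; a dense one is used here for brevity).
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Dilated (hole) convolution layer: holes enlarge the receptive field.
        self.dilated_conv = nn.Conv2d(128, 128, kernel_size=3, padding=2, dilation=2)
        # Result output layer: per-location class scores and box regression
        # (center x, y, z, length, width, height, orientation angle).
        self.cls_head = nn.Conv2d(128, num_classes, kernel_size=1)
        self.box_head = nn.Conv2d(128, 7, kernel_size=1)

    def forward(self, bev_features: torch.Tensor):
        feats = self.feature_extractor(bev_features)
        refined = self.dilated_conv(feats)   # "first feature information"
        return self.cls_head(refined), self.box_head(refined), refined
```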
The execution subject of the method for detecting the target may be the same as or different from the execution subject for training the target detection model. If they are the same, the execution subject for training the target detection model may store the network structure information and the network parameter values of the trained model locally after training; if they are different, the execution subject for training the target detection model may, after training, transmit the network structure information and the parameter values of the network parameters of the trained model to the execution subject of the method for detecting a target.
And S1012, obtaining second characteristic information based on the generated samples corresponding to the input samples in the generated set and the characteristic extraction network of the pre-trained detection model.
In this embodiment, a generation set may be stored in advance in the execution body for training the target detection model, and each generated sample in the generation set may include a generated sample point cloud. The generated samples in the generation set may have a correspondence with the samples in the sample set; for example, there may be a one-to-one correspondence between them. In this way, the execution subject can determine, from the generation set, the generated sample corresponding to the sample input in S1011, and input the generated sample point cloud of that generated sample into the feature extraction network of the pre-trained detection model, thereby obtaining the second feature information. Here, the detection model may include a feature extraction network and a result output layer, where the feature extraction network may be used to perform feature extraction on an input point cloud to generate feature information, and the result output layer may be configured to output a target detection result according to the feature information output by the feature extraction network.
As an example, the pre-trained detection model may be trained on point clouds of scenes in which the target point clouds contain many points (e.g., more points than a preset threshold). Likewise, the target point clouds in the generated sample point clouds of the generation set contain relatively many points (e.g., more points than a preset threshold).
In some optional implementations of this embodiment, the network structure of the feature extraction network of the detection model and the network structure of the feature extraction network of the initial model may be the same, that is, include the same layers, and the connection order between the layers is the same. However, the parameter values of the respective layer parameters may be different. By the implementation mode, the feature extraction network of the detection model and the feature extraction network of the initial model have the same network structure, so that the construction process of the network structure of the feature extraction network is simplified.
S1013, a loss function value is determined based on the first feature information and the second feature information.
In the present embodiment, the execution body for training the target detection model may determine the loss function value from the first feature information and the second feature information obtained in S1011 and S1012. As an example, the execution subject may calculate a distance between the first feature information and the second feature information, for example, a euclidean distance, and take the distance as the loss function value.
In some optional implementations of this embodiment, the samples in the sample set may include a sample point cloud and a sample target detection result corresponding to the sample point cloud. The above S1013 may be specifically performed as follows:
first, a distance between the first feature information and the second feature information is calculated, and a first loss function value is determined based on the distance.
In this implementation, a distance, for example, a euclidean distance, between the first feature information and the second feature information may be calculated, and the first loss function value may be determined according to the distance. As an example, the distance may be directly taken as the first loss function value. As another example, the first loss function value may be generated from the distance and the position feature corresponding to the target point cloud.
For example, the first loss function value L_transfer can be calculated by the following formula:
L_transfer = ‖F_perceptual − F_conceptual‖₂ * M_foreground
where F_perceptual represents the first feature information; F_conceptual represents the second feature information; and M_foreground represents a mask in which positions covered by the target point cloud of a target are 1 and positions outside the target point cloud are 0.
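A minimal sketch of how the masked loss above might be computed, assuming the two feature maps and the foreground mask share the same spatial grid; the tensor shapes and the averaging over foreground locations are assumptions.

```python
import torch

def transfer_loss(first_feats, second_feats, foreground_mask):
    """L_transfer = ||F_first - F_second||_2 weighted by a foreground mask.

    first_feats, second_feats: (B, C, H, W) feature maps from the expansion
    convolution layer of the model being trained and from the pre-trained
    detection model.  foreground_mask: (B, 1, H, W), 1 at locations covered by
    target point clouds, 0 elsewhere.
    """
    per_location = torch.norm(first_feats - second_feats, p=2, dim=1, keepdim=True)
    masked = per_location * foreground_mask
    # Average over foreground locations (an assumption; a plain sum also fits).
    return masked.sum() / foreground_mask.sum().clamp(min=1.0)
```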
And secondly, generating a second loss function value based on target class information output by the initial model aiming at the input sample point cloud and target class information in a sample target detection result corresponding to the input sample.
In this implementation, for an input sample point cloud, the initial model may output a predicted target detection result, which may include target class information, target bounding box information, and so on. In this way, the execution subject may generate the second loss function value according to the target class information output by the initial model for the input sample point cloud and the target class information in the sample target detection result corresponding to the input sample.
Then, third loss function values are generated based on bounding box information output by the initial model for the input sample point cloud and bounding box information in the sample target detection result corresponding to the input sample.
In this implementation manner, the third loss function value may be generated according to bounding box information output by the initial model for the input sample point cloud and bounding box information in the sample target detection result corresponding to the input sample.
Finally, a loss function value is generated according to the first loss function value, the second loss function value, and the third loss function value.
in this implementation, the loss function value may be generated from the first loss function value, the second loss function value, and the third loss function value. For example, the sum of the first, second, and third loss function values may be used as the loss function value. Namely:
L = L_class + L_bbox + L_transfer
where L_transfer represents the first loss function value, L_class represents the second loss function value, L_bbox represents the third loss function value, and L is the resulting loss function value. In the present embodiment, the loss function value thus jointly accounts for the loss of the feature information, the loss of the category information, and the loss of the bounding box information. Therefore, the losses of these different kinds of information can be considered together during model training, and the output of the trained target detection model can be more accurate.
S1014, updating the model parameters of the initial model by using the loss function value to obtain a target detection model.
In this embodiment, the execution body for training the target detection model may update the model parameters of the initial model with the loss function value obtained in S1013 to obtain the target detection model. As an example, the loss function value may be back-propagated, and the model parameters of the initial model may be adjusted using a back propagation algorithm (Back Propagation Algorithm, BP algorithm) and a gradient descent method (e.g., a mini-batch gradient descent algorithm). It should be noted that the back propagation algorithm and the gradient descent method are well-known techniques that have been widely studied and applied, and are not described in detail here.
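A hedged sketch of one training step covering S1011 to S1014, reusing the ToyDetectionModel and transfer_loss sketches above. The frozen teacher feature extractor, the externally supplied classification and bounding-box criteria, and matching feature-map shapes between student and teacher are assumptions consistent with, but not dictated by, the text.

```python
import torch

def train_step(initial_model, teacher_feature_extractor, optimizer,
               sample_bev, generated_bev, gt_class, gt_boxes, foreground_mask,
               classification_loss, bbox_loss):
    """One illustrative optimization step combining the three losses."""
    cls_pred, box_pred, first_feats = initial_model(sample_bev)   # S1011
    with torch.no_grad():                                         # S1012: teacher is frozen
        second_feats = teacher_feature_extractor(generated_bev)

    loss = (classification_loss(cls_pred, gt_class)                          # L_class
            + bbox_loss(box_pred, gt_boxes)                                  # L_bbox
            + transfer_loss(first_feats, second_feats, foreground_mask))     # L_transfer

    optimizer.zero_grad()
    loss.backward()      # back propagation of the loss function value
    optimizer.step()     # gradient-descent update of the initial model      # S1014
    return loss.item()
```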
With continued reference to fig. 2, fig. 2 is a schematic diagram of an application scenario of the method for detecting a target according to the present embodiment. In the application scenario of fig. 2, the terminal device inputs the acquired point cloud into a pre-established target detection model, and a target detection result is obtained. The target detection model comprises an expansion convolution layer for processing the characteristic information output by the characteristic extraction network, and is obtained by training an execution main body for training the target detection model through the following steps: firstly, inputting a sample point cloud of a sample in a sample set into an initial model to obtain first characteristic information output by an expansion convolution layer. And obtaining second characteristic information based on the generated samples corresponding to the input samples in the generated set and the characteristic extraction network of the pre-trained detection model, wherein the generated samples in the generated set have a corresponding relation with the samples in the sample set. Then, a loss function value is determined based on the first characteristic information and the second characteristic information. And finally, updating model parameters of the initial model by using the loss function value to obtain a target detection model.
The feature extraction network and the expansion convolution layer of the target detection model used by the method provided in this embodiment of the disclosure are trained with the guidance of a pre-trained detection model. The expansion convolution layer can further process the feature information output by the feature extraction network, and the processed feature information can more closely resemble the feature information of regions with many points. Therefore, the target detection model can achieve a good detection effect and output more accurate target detection results even when the point cloud contains few points.
With further reference to FIG. 3, a flow chart 300 of an embodiment of a method for deriving a generation set is shown. The method 300 for deriving a generation set comprises the steps of:
s301, extracting target point clouds of different types of targets from sample point clouds of samples.
In the present embodiment, the execution subject of the method for obtaining the generation set, the execution subject of the method for detecting the target, and the execution subject of the method for training the target detection model may be the same or different. The execution subject of the method for obtaining the generation set may extract the target point clouds of different kinds of targets from the sample point clouds of the respective samples of the sample set. Here, the target point cloud may refer to a point cloud for describing a target. As an example, point data included in the sample point cloud may be segmented and identified, and a target point cloud for describing various targets in the sample point cloud may be extracted.
S302, dividing the target point clouds of each category into at least one target point cloud subset according to the orientation angle.
In the present embodiment, the target point cloud may be subjected to coordinate system transformation, for example, from a radar coordinate system to a camera coordinate system. Then, an orientation angle in the camera coordinate system is determined, where the orientation angle in the camera coordinate system may refer to an angle of an object (e.g., a vehicle, a pedestrian, a rider, etc.) heading with respect to the camera X-axis. In this way, the target point clouds of each category may be divided into at least one target point cloud subset according to the angle of orientation. As an example, the angle section may be divided first, and then, the target point clouds whose orientation angles are in the same angle section may be divided into the same target point cloud subset.
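As a sketch, the binning by orientation angle could be implemented as follows; the 30-degree interval width and the use of degrees are assumptions.

```python
import numpy as np

def bin_by_orientation(orientation_angles_deg, bin_width_deg=30.0):
    """Assign each target point cloud to an angle interval (illustrative).

    orientation_angles_deg: orientation angle of each target in the camera
    coordinate system, in degrees.
    """
    angles = np.mod(np.asarray(orientation_angles_deg), 360.0)
    return (angles // bin_width_deg).astype(int)   # subset index per target

# Example: targets at 10, 35 and 350 degrees fall into subsets 0, 1 and 11.
print(bin_by_orientation([10.0, 35.0, 350.0]))
```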
S303, dividing the target point cloud in each target point cloud subset into a first type of target point cloud and a second type of target point cloud according to the contained points.
In the present embodiment, the target point clouds in each target point cloud subset may be divided into a first type of target point cloud and a second type of target point cloud according to the number of points they contain (i.e., the number of point data). The first type of target point cloud contains more points than the second type of target point cloud. As an example, within the same target point cloud subset, the target point clouds that rank highest when sorted by the number of contained points in descending order may be classified as first-type target point clouds, and the remaining target point clouds may be classified as second-type target point clouds. As another example, a threshold on the number of contained points may be set: a target point cloud containing at least the threshold number of points is determined to be a first-type target point cloud, and a target point cloud containing fewer points than the threshold is determined to be a second-type target point cloud.
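A possible sketch of the threshold-based split; the fixed threshold of 100 points is an assumption, since the text only requires that first-type target point clouds contain more points than second-type ones.

```python
def split_by_point_count(target_point_clouds, threshold=100):
    """Split one target point cloud subset into first/second type (illustrative).

    target_point_clouds: list of (N_i, 4) arrays, one per target.
    """
    first_type = [pc for pc in target_point_clouds if len(pc) >= threshold]
    second_type = [pc for pc in target_point_clouds if len(pc) < threshold]
    return first_type, second_type
```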
S304, obtaining a generation set based on the first type target point cloud and the second type target point cloud in each target point cloud subset.
In this embodiment, the generation set may be obtained from the first-type target point clouds and the second-type target point clouds in each target point cloud subset. As an example, the samples in the sample set that contain a first-type target point cloud may be taken as generated samples, yielding the generation set, and a correspondence is established between the generated samples in the generation set and the samples in the sample set. For example, a technician may establish the correspondence between a generated sample and a sample according to the similarity of the scenes they describe.
In some alternative implementations of the present embodiment, S304 may specifically be performed as follows:
1) In response to determining that the samples in the sample set contain the second type of target point cloud, replacing the second type of target point cloud contained in the samples based on the first type of target point cloud in the target point cloud subset to which the second type of target point cloud contained in the samples belongs, and obtaining a generated sample.
In this implementation manner, for each sample in the sample set, it may be determined whether the sample point cloud of the sample contains a second-type target point cloud. If so, the second-type target point cloud contained in the sample may be replaced based on the first-type target point clouds in the target point cloud subset to which that second-type target point cloud belongs, for example with one first-type target point cloud randomly selected from that subset, and the replaced sample is used as the generated sample corresponding to the sample.
2) In response to determining that the samples in the sample set do not contain the second type of target point cloud, the samples are taken as generated samples.
In this implementation manner, for each sample in the sample set, if the sample point cloud of the sample does not contain a second-type target point cloud, the sample may be directly used as the generated sample in the generation set corresponding to that sample. Through this implementation, the target point clouds of the generated samples in the generation set are guaranteed to contain relatively many points.
In some optional implementations, the replacing the second class target point cloud contained in the sample based on the first class target point cloud in the target point cloud subset to which the second class target point cloud contained in the sample in 1) belongs to obtain the generated sample may specifically be performed as follows:
first, a distance between a second type target point cloud contained in a sample and a first type target point cloud in a target point cloud subset is calculated, and a replacement target point cloud is determined from the first type target point cloud based on a calculation result.
In this implementation manner, if it is determined that the sample in the sample set includes the second type target point cloud, a distance between the second type target point cloud and each first type target point cloud in the target point cloud subset to which the second type target point cloud belongs, for example, an average point distance, may be calculated, and the replacement target point cloud may be determined from the first type target point clouds based on the calculation result. For example, a first type of target point cloud having a minimum average point distance from the second type of target point cloud may be selected as the replacement target point cloud.
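One way the "average point distance" selection might look in code; interpreting it as the mean nearest-neighbor distance is an assumption, and other distance definitions would fit the description equally well.

```python
import numpy as np

def average_point_distance(cloud_a, cloud_b):
    """Mean nearest-neighbor distance from cloud_a to cloud_b (one possible
    reading of 'average point distance'; illustrative only)."""
    diffs = cloud_a[:, None, :3] - cloud_b[None, :, :3]   # (Na, Nb, 3)
    dists = np.linalg.norm(diffs, axis=-1)
    return dists.min(axis=1).mean()

def pick_replacement(second_type_cloud, first_type_clouds):
    """Select the first-type target point cloud closest to the one being replaced."""
    scores = [average_point_distance(second_type_cloud, c) for c in first_type_clouds]
    return first_type_clouds[int(np.argmin(scores))]
```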
And then, replacing the second type target point cloud contained in the sample according to the target point cloud for replacement.
In this implementation manner, the second-type target point cloud contained in the sample may be replaced according to the replacement target point cloud. For example, the second-type target point cloud contained in the sample may be directly replaced with the replacement target point cloud. By selecting the replacement target point cloud, through distance calculation, from the first-type target point clouds of the target point cloud subset to which the second-type target point cloud belongs, the difference between the replacement target point cloud and the replaced second-type target point cloud can be made as small as possible, so that the generated samples in the generation set are closer to the samples in the sample set.
Optionally, the replacing the second class of target point cloud included in the sample according to the replacing target point cloud may be specifically performed as follows:
first, the replacement target point cloud is rotated according to the orientation angle of the second type target point cloud included in the sample.
In this implementation, the replacement target point cloud may be rotated according to the orientation angle of the second-type target point cloud contained in the sample. That is, the replacement target point cloud is rotated so that its orientation angle matches that of the replaced second-type target point cloud. In addition, the scale of the replacement target point cloud can be adjusted so that it matches the scale of the replaced second-type target point cloud.
Then, the second type target point cloud contained in the sample is replaced by using the rotated replacement target point cloud.
In this implementation, the second-type target point cloud contained in the sample may be replaced with the rotated replacement target point cloud. In this way, the orientation angle of the replacement target point cloud is made consistent with that of the replaced second-type target point cloud, so that the generated samples in the generation set are closer to the samples in the sample set and the resulting target detection model is more accurate.
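A sketch of the rotation step described above; rotating about the vertical (z) axis around the target's center and expressing angles in radians are assumptions.

```python
import numpy as np

def rotate_to_match(replacement_cloud, src_angle_rad, dst_angle_rad):
    """Rotate the replacement target point cloud so its orientation angle matches
    the replaced second-type target point cloud (illustrative)."""
    delta = dst_angle_rad - src_angle_rad
    c, s = np.cos(delta), np.sin(delta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]], dtype=np.float32)
    out = replacement_cloud.copy()
    center = out[:, :3].mean(axis=0)              # rotate around the target's center
    out[:, :3] = (out[:, :3] - center) @ rot.T + center
    return out
```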
The method for obtaining the generation set provided by this embodiment derives the generation set from the sample set. This ensures the similarity between the generated samples in the generation set and the samples in the sample set, while also ensuring that the target point clouds in the generated samples contain relatively many points, so that the feature extraction network of the pre-trained detection model can extract richer feature information and the trained target detection model is more accurate.
With further reference to fig. 4, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of an apparatus for detecting a target, which corresponds to the method embodiment shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for detecting an object of the present embodiment includes an input unit 401. The input unit 401 is configured to input the acquired point cloud into a pre-established target detection model to obtain a target detection result, where the target detection model includes an expansion convolution layer for processing feature information output by a feature extraction network, and the target detection model may be obtained by an apparatus for training the target detection model through the following units: a first generation unit 4011 configured to input a sample point cloud of samples in the sample set into the initial model and obtain first feature information output by the expansion convolution layer; a second generation unit 4012 configured to obtain second feature information based on a generated sample, corresponding to the input sample, in a generation set and the feature extraction network of a pre-trained detection model, where the generated samples in the generation set have a correspondence with the samples in the sample set; a determination unit 4013 configured to determine a loss function value based on the first feature information and the second feature information; and a parameter updating unit 4014 configured to update the model parameters of the initial model with the loss function value to obtain the target detection model.
In this embodiment, the specific processing of the input unit 401 of the apparatus 400 for detecting a target and the technical effects thereof may refer to the description of S101 in the corresponding embodiment of fig. 1, and are not described herein.
In some optional implementations of this embodiment, the above-described generation set is generated by: an extraction unit (not shown in the figure) configured to extract target point clouds of different kinds of targets from sample point clouds of a sample; a first dividing unit (not shown in the figure) configured to divide the target point clouds of the respective categories into at least one target point cloud subset according to the orientation angle; a second dividing unit (not shown in the figure) configured to divide the target point clouds in each target point cloud subset into a first type of target point cloud and a second type of target point cloud according to the number of points included, wherein the first type of target point cloud includes more points than the second type of target point cloud; a generation set generation unit (not shown in the figure) is configured to obtain a generation set based on the first type target point cloud and the second type target point cloud in each target point cloud subset.
In some optional implementations of this embodiment, the generating set generating unit includes: a replacing unit (not shown in the figure) configured to replace, in response to determining that the samples in the sample set contain the second type of target point cloud, the second type of target point cloud contained in the samples based on the first type of target point cloud in the target point cloud subset to which the second type of target point cloud contained in the samples belongs, to obtain a generated sample; a generating subunit (not shown in the figure) configured to take the sample as a generated sample in response to determining that the samples in the sample set do not contain the second type of target point cloud.
In some optional implementations of this embodiment, the replacing unit includes: a calculating module (not shown in the figure) configured to calculate a distance between a second type of target point cloud contained in the sample and a first type of target point cloud in the belonging target point cloud subset, and determine a replacement target point cloud from the first type of target point clouds based on a calculation result; a replacing sub-module (not shown in the figure) configured to replace the second type of target point cloud contained in the sample according to the replacing target point cloud.
In some optional implementations of this embodiment, the replacement sub-module is further configured to: rotating the target point cloud for replacement according to the direction angle of the second type of target point cloud contained in the sample; and replacing the second type of target point cloud contained in the sample by using the rotated target point cloud for replacement.
In some optional implementations of this embodiment, the feature extraction network of the detection model is the same as the network structure of the feature extraction network of the initial model.
In some optional implementations of this embodiment, the samples in the sample set include a sample point cloud and a sample target detection result corresponding to the sample point cloud; and the above-described determination unit 4013 is further configured to: calculating the distance between the first characteristic information and the second characteristic information, and determining a first loss function value according to the distance; generating a second loss function value based on target class information output by the initial model aiming at the input sample point cloud and target class information in a sample target detection result corresponding to the input sample; generating a third loss function value based on bounding box information output by the initial model aiming at the input sample point cloud and bounding box information in a sample target detection result corresponding to the input sample; and generating a loss function value according to the first loss function value, the second loss function value and the third loss function value.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 5, is a block diagram of an electronic device for a method of detecting a target according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the methods for detecting an object provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for detecting a target provided by the present application.
The memory 502, which is a non-transitory computer-readable storage medium, may be used to store a non-transitory software program, a non-transitory computer-executable program, and modules, such as program instructions/modules (e.g., the input unit 401, the first generation unit 4011, the second generation unit 4012, the determination unit 4013, and the parameter updating unit 4014 shown in fig. 4) corresponding to the method for detecting an object in the embodiment of the present application. The processor 501 executes various functional applications of the server and data processing, i.e., implements the method for detecting an object in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 502.
The memory 502 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application required for the functions, and the data storage area may store data created according to the use of the electronic device for detecting the target, and the like. In addition, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, which may be connected to the electronic device for detecting the target via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of detecting an object may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus or otherwise, for example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device for detecting objects, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, etc. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the feature extraction network and the expansion convolution layer of the target detection model are trained with the guidance of a pre-trained detection model. The expansion convolution layer can further process the feature information output by the feature extraction network, and the processed feature information can more closely resemble the feature information of regions with many points. Therefore, the target detection model can achieve a good detection effect and output more accurate target detection results even when the point cloud contains few points.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (14)

1. A method for detecting an object, comprising:
inputting the obtained point cloud into a pre-established target detection model to obtain a target detection result, wherein the target detection model comprises an expansion convolution layer for processing characteristic information output by a characteristic extraction network, and the target detection model is obtained through training by the following steps:
inputting sample point clouds of samples in a sample set into an initial model to obtain first characteristic information output by an expansion convolution layer;
Obtaining second characteristic information based on a generated sample corresponding to an input sample in a generated set and a characteristic extraction network of a pre-trained detection model, wherein the generated sample in the generated set has a corresponding relation with a sample in the sample set;
determining a loss function value based on the first feature information and the second feature information;
updating model parameters of the initial model by using the loss function value to obtain a target detection model; the generation set is generated by:
extracting target point clouds of different types of targets from sample point clouds of a sample;
dividing the target point clouds of each category into at least one target point cloud subset according to the orientation angle;
dividing target point clouds in each target point cloud subset into a first type of target point cloud and a second type of target point cloud according to the contained points, wherein the points contained in the first type of target point cloud are more than those contained in the second type of target point cloud;
and obtaining a generation set based on the first type target point cloud and the second type target point cloud in each target point cloud subset.
2. The method of claim 1, wherein the generating the set based on the first type of target point cloud and the second type of target point cloud in each target point cloud subset comprises:
In response to determining that the samples in the sample set contain second-class target point clouds, replacing the second-class target point clouds contained in the samples based on first-class target point clouds in target point cloud subsets to which the second-class target point clouds contained in the samples belong, and obtaining a generated sample;
in response to determining that the samples in the sample set do not contain a second type of target point cloud, the samples are taken as generated samples.
3. The method according to claim 2, wherein the replacing the second class of target point clouds included in the sample based on the first class of target point clouds in the target point cloud subset to which the second class of target point clouds included in the sample belongs, to obtain the generated sample, includes:
calculating the distance between a second type of target point cloud contained in the sample and a first type of target point cloud in the target point cloud subset, and determining a replacement target point cloud from the first type of target point cloud based on a calculation result;
and replacing the second type of target point cloud contained in the sample according to the target point cloud for replacement.
4. The method according to claim 3, wherein replacing the second-type target point cloud contained in the sample according to the replacement target point cloud comprises:
rotating the replacement target point cloud according to the orientation angle of the second-type target point cloud contained in the sample;
and replacing the second-type target point cloud contained in the sample with the rotated replacement target point cloud.
5. The method of claim 1, wherein the feature extraction network of the detection model has the same network structure as the feature extraction network of the initial model.
6. The method of claim 1, wherein the samples in the sample set comprise a sample point cloud and sample target detection results corresponding to the sample point cloud; and
the determining a loss function value based on the first feature information and the second feature information includes:
calculating a distance between the first feature information and the second feature information, and determining a first loss function value according to the distance;
generating a second loss function value based on target class information output by the initial model for the input sample point cloud and target class information in the sample target detection result corresponding to the input sample;
generating a third loss function value based on bounding box information output by the initial model for the input sample point cloud and bounding box information in the sample target detection result corresponding to the input sample;
and generating the loss function value according to the first loss function value, the second loss function value and the third loss function value.
7. An apparatus for detecting an object, comprising:
an input unit configured to input an acquired point cloud into a pre-established target detection model to obtain a target detection result, wherein the target detection model comprises a dilated convolution layer for processing feature information output by a feature extraction network, and the target detection model is obtained through training by the following units:
a first generation unit configured to input a sample point cloud of a sample in a sample set into an initial model to obtain first feature information output by the dilated convolution layer;
a second generation unit configured to obtain second feature information based on a generated sample in a generation set corresponding to the input sample and a feature extraction network of a pre-trained detection model, wherein the generated samples in the generation set have a correspondence with the samples in the sample set;
a determining unit configured to determine a loss function value based on the first feature information and the second feature information;
a parameter updating unit configured to update model parameters of the initial model with the loss function value to obtain the target detection model; wherein the generation set is generated by:
an extraction unit configured to extract target point clouds of targets of different categories from the sample point clouds of the samples;
a first dividing unit configured to divide the target point clouds of each category into at least one target point cloud subset according to orientation angle;
a second dividing unit configured to divide the target point clouds in each target point cloud subset into first-type target point clouds and second-type target point clouds according to the number of points they contain, wherein a first-type target point cloud contains more points than a second-type target point cloud;
a generation set generation unit configured to obtain the generation set based on the first-type target point clouds and the second-type target point clouds in each target point cloud subset.
8. The apparatus of claim 7, wherein the generation set generation unit comprises:
a replacing unit configured to, in response to determining that a sample in the sample set contains a second-type target point cloud, replace the second-type target point cloud contained in the sample based on the first-type target point clouds in the target point cloud subset to which that second-type target point cloud belongs, to obtain a generated sample;
and a generation subunit configured to take the sample as a generated sample in response to determining that a sample in the sample set does not contain a second-type target point cloud.
9. The apparatus of claim 8, wherein the replacement unit comprises:
a calculation module configured to calculate distances between the second-type target point cloud contained in the sample and the first-type target point clouds in the target point cloud subset to which it belongs, and to determine a replacement target point cloud from the first-type target point clouds based on the calculation result;
and a replacement sub-module configured to replace the second-type target point cloud contained in the sample according to the replacement target point cloud.
10. The apparatus of claim 9, wherein the replacement sub-module is further configured to:
rotate the replacement target point cloud according to the orientation angle of the second-type target point cloud contained in the sample;
and replace the second-type target point cloud contained in the sample with the rotated replacement target point cloud.
11. The apparatus of claim 7, wherein the feature extraction network of the detection model has the same network structure as the feature extraction network of the initial model.
12. The apparatus of claim 7, wherein the samples in the sample set comprise a sample point cloud and sample target detection results corresponding to the sample point cloud; and
the determining unit is further configured to:
calculate a distance between the first feature information and the second feature information, and determine a first loss function value according to the distance;
generate a second loss function value based on target class information output by the initial model for the input sample point cloud and target class information in the sample target detection result corresponding to the input sample;
generate a third loss function value based on bounding box information output by the initial model for the input sample point cloud and bounding box information in the sample target detection result corresponding to the input sample;
and generate the loss function value according to the first loss function value, the second loss function value and the third loss function value.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
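
The generation set construction recited in claims 1 and 2 amounts to grouping the extracted target point clouds by category and orientation angle, splitting each subset into dense (first-type) and sparse (second-type) clouds by point count, and then building generated samples from them. The following is a minimal illustrative sketch rather than the claimed implementation; it assumes each target point cloud is stored as a dict with an (N, 3) NumPy array of points, a category label, and an orientation angle in radians, and the bin width and point threshold are arbitrary example values.

```python
import numpy as np

def build_subsets(target_clouds, angle_bin_deg=30):
    """Group target point clouds by category, then bucket each category by
    orientation angle, giving the target point cloud subsets of claim 1."""
    subsets = {}
    for tc in target_clouds:
        angle_bin = int(np.degrees(tc['angle']) % 360 // angle_bin_deg)
        subsets.setdefault((tc['category'], angle_bin), []).append(tc)
    return subsets

def split_by_point_count(subset, point_threshold=100):
    """Split one subset into first-type (more points) and second-type
    (fewer points) target point clouds."""
    first_type = [tc for tc in subset if len(tc['points']) >= point_threshold]
    second_type = [tc for tc in subset if len(tc['points']) < point_threshold]
    return first_type, second_type
```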
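
Claims 3 and 4 refine the replacement step: a replacement target point cloud is chosen by distance from among the first-type clouds in the same subset, then rotated to the orientation angle of the sparse cloud it replaces. A hedged sketch follows; the claims do not specify the distance metric or any re-centering, so the centroid distance and the translation to the sparse cloud's position are illustrative assumptions.

```python
import numpy as np

def yaw_rotation(angle):
    """Rotation matrix about the vertical (z) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def replace_sparse_cloud(sparse_tc, first_type_clouds):
    """Pick the first-type cloud closest to the second-type cloud, rotate it
    to the second-type cloud's orientation angle, and place it at the
    second-type cloud's position."""
    sparse_center = sparse_tc['points'].mean(axis=0)
    dists = [np.linalg.norm(tc['points'].mean(axis=0) - sparse_center)
             for tc in first_type_clouds]
    replacement = first_type_clouds[int(np.argmin(dists))]

    # Rotate by the angle difference so the replacement matches the
    # orientation of the cloud being replaced, then move it into place.
    delta = sparse_tc['angle'] - replacement['angle']
    centered = replacement['points'] - replacement['points'].mean(axis=0)
    rotated = centered @ yaw_rotation(delta).T + sparse_center
    return {'points': rotated,
            'category': sparse_tc['category'],
            'angle': sparse_tc['angle']}
```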
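
The training recited in claims 1 and 6 combines a feature-distillation loss, between the student's dilated-convolution features and the pre-trained teacher's features on the generated sample, with ordinary classification and bounding-box regression losses. The PyTorch sketch below shows one such update step under stated assumptions: initial_model is assumed to return the dilated-convolution feature map together with class logits and box predictions, teacher_model.backbone stands in for the pre-trained detection model's feature extraction network, and the specific loss functions and weights are illustrative, not mandated by the claims.

```python
import torch
import torch.nn.functional as F

def training_step(initial_model, teacher_model, sample, generated_sample,
                  optimizer, w_feat=1.0, w_cls=1.0, w_box=1.0):
    """One parameter update of the initial (student) model."""
    # First feature information: output of the student's dilated conv layer.
    feat_student, cls_logits, box_preds = initial_model(sample['point_cloud'])

    # Second feature information: teacher features on the generated sample.
    with torch.no_grad():
        feat_teacher = teacher_model.backbone(generated_sample['point_cloud'])

    # First loss: distance between the two pieces of feature information.
    loss_feat = F.mse_loss(feat_student, feat_teacher)
    # Second loss: predicted target classes vs. annotated classes.
    loss_cls = F.cross_entropy(cls_logits, sample['target_classes'])
    # Third loss: predicted bounding boxes vs. annotated bounding boxes.
    loss_box = F.smooth_l1_loss(box_preds, sample['target_boxes'])

    loss = w_feat * loss_feat + w_cls * loss_cls + w_box * loss_box
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
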
CN202010342481.XA 2020-04-27 2020-04-27 Method and device for detecting target Active CN111539347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010342481.XA CN111539347B (en) 2020-04-27 2020-04-27 Method and device for detecting target

Publications (2)

Publication Number Publication Date
CN111539347A CN111539347A (en) 2020-08-14
CN111539347B (en) 2023-08-08

Family

ID=71967556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010342481.XA Active CN111539347B (en) 2020-04-27 2020-04-27 Method and device for detecting target

Country Status (1)

Country Link
CN (1) CN111539347B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651405B (en) * 2020-12-10 2024-04-26 深兰人工智能(深圳)有限公司 Target detection method and device
CN113361379B (en) * 2021-06-03 2024-05-07 北京百度网讯科技有限公司 Method and device for generating target detection system and detecting target
CN113378760A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Training target detection model and method and device for detecting target
CN116168366B (en) * 2023-01-19 2023-12-05 北京百度网讯科技有限公司 Point cloud data generation method, model training method, target detection method and device

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122234A (en) * 2016-11-29 2018-06-05 北京市商汤科技开发有限公司 Convolutional neural networks training and method for processing video frequency, device and electronic equipment
CN108509918A (en) * 2018-04-03 2018-09-07 中国人民解放军国防科技大学 Target detection and tracking method fusing laser point cloud and image
CN109345510A (en) * 2018-09-07 2019-02-15 百度在线网络技术(北京)有限公司 Object detecting method, device, equipment, storage medium and vehicle
CN109191453A (en) * 2018-09-14 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating image category detection model
CN109635685A (en) * 2018-11-29 2019-04-16 北京市商汤科技开发有限公司 Target object 3D detection method, device, medium and equipment
CN109887075A (en) * 2019-02-20 2019-06-14 清华大学 Three-dimensional point cloud model training method for threedimensional model building
CN110276741A (en) * 2019-03-08 2019-09-24 腾讯科技(深圳)有限公司 The method and apparatus and electronic equipment of nodule detection and its model training
CN110069993A (en) * 2019-03-19 2019-07-30 同济大学 A kind of target vehicle detection method based on deep learning
CN110032949A (en) * 2019-03-22 2019-07-19 北京理工大学 A kind of target detection and localization method based on lightweight convolutional neural networks
CN110298281A (en) * 2019-06-20 2019-10-01 汉王科技股份有限公司 Video structural method, apparatus, electronic equipment and storage medium
CN110334237A (en) * 2019-07-15 2019-10-15 清华大学 A kind of solid object search method and system based on multi-modal data
CN110415342A (en) * 2019-08-02 2019-11-05 深圳市唯特视科技有限公司 A kind of three-dimensional point cloud reconstructing device and method based on more merge sensors
CN110264468A (en) * 2019-08-14 2019-09-20 长沙智能驾驶研究院有限公司 Point cloud data mark, parted pattern determination, object detection method and relevant device
CN110738121A (en) * 2019-09-17 2020-01-31 北京科技大学 front vehicle detection method and detection system
CN110689562A (en) * 2019-09-26 2020-01-14 深圳市唯特视科技有限公司 Trajectory loop detection optimization method based on generation of countermeasure network
CN110992468A (en) * 2019-11-28 2020-04-10 贝壳技术有限公司 Point cloud data-based modeling method, device and equipment, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cao Mingwei et al. "A Fast and Robust Bundle Adjustment Algorithm for Large-Scale 3D Reconstruction." Journal of Computer-Aided Design & Computer Graphics, 2017, Vol. 29, No. 7, pp. 1303-1313. *

Similar Documents

Publication Publication Date Title
CN111539347B (en) Method and device for detecting target
CN111553282B (en) Method and device for detecting a vehicle
CN111931591B (en) Method, device, electronic equipment and readable storage medium for constructing key point learning model
CN111860167B (en) Face fusion model acquisition method, face fusion model acquisition device and storage medium
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN112270669B (en) Human body 3D key point detection method, model training method and related devices
US20210264197A1 (en) Point cloud data processing method, apparatus, electronic device and computer readable storage medium
CN110717933B (en) Post-processing method, device, equipment and medium for moving object missed detection
CN110852321B (en) Candidate frame filtering method and device and electronic equipment
CN110675635B (en) Method and device for acquiring external parameters of camera, electronic equipment and storage medium
CN113591573A (en) Training and target detection method and device for multi-task learning deep network model
CN112241716B (en) Training sample generation method and device
CN111597987B (en) Method, apparatus, device and storage medium for generating information
CN111597986B (en) Method, apparatus, device and storage medium for generating information
CN111462179B (en) Three-dimensional object tracking method and device and electronic equipment
CN114386503A (en) Method and apparatus for training a model
CN111191619A (en) Method, device and equipment for detecting virtual line segment of lane line and readable storage medium
EP3958219A2 (en) Method and apparatus for generating a license plate defacement classification model, license plate defacement classification method and apparatus, electronic device, storage medium, and computer program product
CN112528932B (en) Method and device for optimizing position information, road side equipment and cloud control platform
CN112508027B (en) Head model for instance segmentation, instance segmentation model, image segmentation method and device
CN111814651B (en) Lane line generation method, device and equipment
CN112561053B (en) Image processing method, training method and device of pre-training model and electronic equipment
CN111932530B (en) Three-dimensional object detection method, device, equipment and readable storage medium
CN111488972B (en) Data migration method, device, electronic equipment and storage medium
CN111563541B (en) Training method and device of image detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant