CN115631296A - 3D target detection method, computer program product and electronic equipment - Google Patents


Info

Publication number
CN115631296A
Authority
CN
China
Prior art keywords
internal reference
image
target detection
network
reference information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211124920.5A
Other languages
Chinese (zh)
Inventor
李帅霖
汪天才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202211124920.5A priority Critical patent/CN115631296A/en
Publication of CN115631296A publication Critical patent/CN115631296A/en
Pending legal-status Critical Current

Classifications

    • G06T 17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N 3/02, G06N 3/08 — Neural networks; learning methods (computing arrangements based on biological models)
    • G06T 7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06V 2201/07 — Target detection


Abstract

The application provides a 3D target detection method, a computer program product and an electronic device. The 3D target detection method comprises the following steps: performing feature extraction on an image to be detected, acquired by an image acquisition device and corresponding to a target to be detected, to obtain corresponding image features; determining internal reference (camera intrinsic parameter) disturbance transformation information from the image features, and correcting the original internal reference information of the image acquisition device based on that disturbance transformation information to obtain corrected internal reference information; and performing 3D target detection according to the corrected internal reference information and the image features. With this scheme, even if the original internal reference information of the image acquisition device deviates because of camera disturbance, it can be corrected by the provided method, and 3D target detection can then be performed with the corrected internal reference information, improving detection accuracy.

Description

3D target detection method, computer program product and electronic equipment
Technical Field
The present application relates to the field of object detection technologies, and in particular, to a 3D object detection method, a computer program product, and an electronic device.
Background
Target detection is a classical task in the field of computer vision. Unlike image recognition, target detection requires not only identifying the objects present in an image and giving their categories, but also giving each object's position. Target detection includes 2D target detection and 3D target detection; 3D target detection refers to methods that output, in three-dimensional space, information such as object category, length, width, height and rotation angle.
3D object detection can be applied in several fields, for example autonomous driving, driver assistance, traffic flow monitoring and industrial inspection. In 3D object detection, the problem of camera disturbance is ubiquitous. Taking autonomous driving and driver assistance as examples, camera disturbance generally comes from foreign-object impacts or from road bumps. During 3D target detection, camera disturbance causes the camera intrinsics effective in a given detection frame to deviate from their calibrated values, which degrades detection accuracy.
Disclosure of Invention
An object of the embodiments of the present application is to provide a 3D target detection method, a computer program product, and an electronic device, so as to solve the technical problem in the prior art that the accuracy of target detection is low.
In a first aspect, an embodiment of the present application provides a 3D target detection method, including: performing feature extraction on an image to be detected, acquired by an image acquisition device and corresponding to the target to be detected, to obtain corresponding image features; determining internal reference disturbance transformation information according to the image features, and correcting the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain corrected internal reference information; and performing 3D target detection according to the corrected internal reference information and the image features. In this scheme, after the image features are obtained, the original internal reference information of the image acquisition device can be corrected based on them, so that the corrected internal reference information better matches the scene at the moment the image to be detected was acquired. Therefore, even if the original internal reference information deviates because of camera disturbance, it can be corrected by the provided method, and 3D target detection performed with the corrected internal reference information, improving detection accuracy.
In an optional embodiment, determining the internal reference disturbance transformation information according to the image features includes: performing dimensionality reduction on the image features to obtain dimension-reduced first intermediate features; performing feature extraction on the first intermediate features to obtain second intermediate features; and flattening the second intermediate features and performing regression to obtain the internal reference disturbance transformation information. In this scheme, the image features are processed to obtain the corresponding internal reference disturbance transformation information, so that the original internal reference information of the image acquisition device can be corrected according to it; the corrected internal reference information is then used for 3D target detection, which can improve its accuracy.
In an optional embodiment, the internal reference disturbance transformation information is represented by an internal reference disturbance matrix, the original internal reference information is represented by an original internal reference matrix, and the modified internal reference information is represented by a modified internal reference matrix; the correcting the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain corrected internal reference information comprises the following steps: and calculating the product of the internal reference disturbance matrix and the original internal reference matrix to obtain the corrected internal reference matrix. In the above scheme, the internal reference disturbance transformation information may be represented by an internal reference disturbance matrix, the original internal reference information may be represented by an original internal reference matrix, and the modified internal reference information may also be represented by a modified internal reference matrix. Therefore, the original internal reference information can be corrected through matrix multiplication, and the accuracy of 3D target detection is improved.
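In the matrix form described above, the correction reduces to a single 3×3 matrix product. The sketch below is a minimal, hypothetical illustration — the perturbation values and the near-identity form of the disturbance matrix are assumptions for demonstration, not values given in the patent:

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Original (calibrated) intrinsic matrix K: focal lengths fx, fy and
# principal point (cx, cy). Values are illustrative only.
K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]

# Internal reference disturbance matrix dK (hypothetical output of the
# internal reference correction network); close to identity when the
# camera is barely disturbed.
dK = [[1.01, 0.0, 2.0],
      [0.0, 0.99, -1.5],
      [0.0, 0.0, 1.0]]

# Corrected internal reference matrix: K' = dK * K.
K_corrected = matmul3(dK, K)
```

When the network predicts no disturbance, dK is the identity and K' equals K — consistent with the later remark that, for images captured without bumping, the corrected intrinsics may coincide with the original ones.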
In an optional embodiment, the performing 3D object detection according to the modified internal reference information and the image feature includes: projecting the image features to a 3D space according to the corrected internal reference information to obtain a corresponding 3D position code; the image characteristics of a pixel point in the image to be detected correspond to a 3D position code; and carrying out 3D target detection on the target to be detected according to the 3D position code and the image characteristics. In the above scheme, after the original internal reference information of the image acquisition device is corrected to obtain the corrected internal reference information, the 3D target detection can be performed according to the corrected internal reference information and the image characteristics, so that the accuracy of the 3D target detection can be improved.
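One common way to realize the projection step above (used, for example, in PETR-style detectors) is to unproject each pixel into camera space along a set of candidate depths using the corrected intrinsics, and encode the resulting 3D coordinates. The sketch below assumes a simple pinhole model; the pixel coordinates, depth bins and intrinsic values are made-up illustrations, and the patent does not specify the encoding at this level of detail:

```python
def unproject(u, v, depth, K):
    """Map pixel (u, v) at a given depth to a 3D point in camera
    coordinates, assuming a pinhole intrinsic matrix K."""
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

# Corrected internal reference matrix (illustrative values).
K_corrected = [[1000.0, 0.0, 640.0],
               [0.0, 1000.0, 360.0],
               [0.0, 0.0, 1.0]]

# A 3D position code for one pixel: the pixel unprojected at several
# hypothetical candidate depths.
depth_bins = [1.0, 5.0, 10.0]
position_code = [unproject(800, 500, d, K_corrected) for d in depth_bins]
```

Because the unprojection depends on fx, fy, cx and cy, a corrected intrinsic matrix directly changes where each pixel's position code lands in 3D space, which is why correcting the intrinsics before this step matters.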
In an optional embodiment, the step of extracting the features of the to-be-detected image corresponding to the to-be-detected target acquired by the image acquisition device to obtain the corresponding image features is performed by a feature extraction network in a 3D target detection model; the step of determining internal reference disturbance transformation information according to the image characteristics, and modifying the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain modified internal reference information is executed through an internal reference modification network in the 3D target detection model; and the step of carrying out 3D target detection according to the corrected internal reference information and the image characteristics is executed through a target detection network in the 3D target detection model. In the foregoing solution, the 3D object detection method provided in the embodiment of the present application may be executed by a 3D object detection model, where the 3D object detection model may include a feature extraction network, an internal reference correction network, and an object detection network. Based on the 3D target detection model, the original internal reference information of the image acquisition device can be corrected, and the corrected internal reference information is utilized to carry out 3D target detection, so that the accuracy of the 3D target detection can be improved.
In an alternative embodiment, the internal reference correction network comprises multiple 1 × 1 convolutional layers. In this scheme, the 1 × 1 convolutional layers can extract richer features without changing the spatial dimensions of the image features; stacking multiple 1 × 1 convolutional layers deepens the network, which reduces the mutual influence between the correction branch and the detection branch and improves the accuracy of 3D target detection.
In an alternative embodiment, the 3D object detection model is trained as follows: acquiring a sample image, the corresponding real internal parameters and the corresponding training annotation result; and updating the parameters of the feature extraction network, the internal reference correction network and the target detection network according to the sample image, the real internal parameters and the training annotation result, where the internal parameters input into the target detection network during training are the real internal parameters. In this scheme, the sample image, real internal parameters and training annotation results are used to train the feature extraction network, internal reference correction network and target detection network of the 3D target detection model, so that the trained model can perform 3D target detection with high accuracy. Using the real internal parameters as the detection network's input during training improves training stability.
In an optional embodiment, after the parameters of the feature extraction network, internal reference correction network and target detection network are updated according to the sample image, real internal parameters and annotation result, the trained 3D target detection model is tested as follows: acquiring a test image and the corresponding test annotation result; and testing the trained feature extraction network, internal reference correction network and target detection network according to the test image and test annotation result, where the internal parameters input into the trained target detection network during testing are the predicted internal parameters output by the trained internal reference correction network. In this scheme, after training is complete, the trained model can be further tested to verify whether it meets the requirements of 3D target detection, thus ensuring detection accuracy.
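The train/test asymmetry described above — the detection network receives the real intrinsics during training but the predicted intrinsics at test time — can be sketched in a few lines. The function and argument names are illustrative, not from the patent:

```python
def intrinsics_for_detector(phase, real_K, predicted_K):
    """Select which internal reference information feeds the target
    detection network.

    Per the scheme described above: during training the detector
    receives the ground-truth (real) intrinsics, which stabilizes
    training; during testing it receives the intrinsics predicted by
    the internal reference correction network, since ground truth is
    unavailable at inference time.
    """
    if phase == "train":
        return real_K
    return predicted_K
```

The design choice mirrors teacher forcing: supplying the clean signal during training decouples the detector's learning from the (initially noisy) correction branch.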
In a second aspect, embodiments of the present application provide a computer program product, which includes computer program instructions, when read and executed by a processor, for performing the 3D object detection method according to the first aspect.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, and a bus; the processor and the memory are communicated with each other through the bus; the memory stores computer program instructions executable by the processor, the processor invoking the computer program instructions capable of performing the 3D object detection method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer program instructions, which, when executed by a computer, cause the computer to execute the 3D object detection method according to the first aspect.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a 3D target detection method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a 3D object detection model provided in an embodiment of the present application;
fig. 3 is a block diagram of an internal reference correction network according to an embodiment of the present disclosure;
fig. 4 is a block diagram of a 3D object detection apparatus according to an embodiment of the present disclosure;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has been actively developed. Artificial Intelligence (AI) is an emerging scientific technology for studying and developing theories, methods, techniques and application systems for simulating and extending human Intelligence. The artificial intelligence subject is a comprehensive subject and relates to various technical categories such as chips, big data, cloud computing, internet of things, distributed storage, deep learning, machine learning and neural networks. Computer vision is an important branch of artificial intelligence, particularly a machine is used for identifying the world, and computer vision technologies generally comprise technologies such as face identification, living body detection, fingerprint identification and anti-counterfeiting verification, biological feature identification, face detection, pedestrian detection, target detection, pedestrian identification, image processing, image identification, image semantic understanding, image retrieval, character identification, video processing, video content identification, three-dimensional reconstruction, virtual reality, augmented reality, synchronous positioning and map construction (SLAM), computational photography, robot navigation and positioning and the like. 
With the research and development of artificial intelligence technology, the technology is applied to many fields, such as security protection, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, intelligent medical treatment, face payment, face unlocking, fingerprint unlocking, human evidence verification, smart screens, smart televisions, cameras, mobile internet, network, beauty, makeup, medical beauty, intelligent temperature measurement and the like.
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a 3D object detection method according to an embodiment of the present disclosure, where the 3D object detection method may be, but is not limited to be, executed by the electronic device shown in fig. 5. For example, the 3D target detection method provided by the embodiment of the present application may be executed by an image acquisition device, a vehicle-mounted controller, and the like; or, the 3D target detection method provided by the embodiment of the present application may also be executed by a cloud server, and the like.
In the embodiment of the present application, the 3D object detection method in fig. 1 may be performed by using a 3D object detection model. Referring to fig. 2, fig. 2 is a schematic diagram of a 3D object detection model according to an embodiment of the present disclosure. The 3D target detection model comprises a feature extraction network, an internal reference correction network and a target detection network, wherein the feature extraction network is connected with the internal reference correction network and the target detection network, and the internal reference correction network is connected with the target detection network.
The feature extraction network is used to extract image features from the image to be detected, acquired by the image acquisition device and corresponding to the target to be detected; the internal reference correction network is used to correct the original internal reference information of the image acquisition device based on the image features to obtain corrected internal reference information; and the target detection network is used to perform 3D target detection based on the corrected internal reference information and the image features.
The embodiments of the present application do not specifically limit the structures of the feature extraction network and the target detection network; those skilled in the art can choose them according to the actual situation. For example, the feature extraction network may be implemented with networks such as ResNet-50, ResNet-101 or VoVNet, and the target detection network may be implemented with a multi-view 3D detection framework such as PETR (Position Embedding Transformation for multi-view 3D object detection).
The following describes a specific structure of an internal reference correction network provided in an embodiment of the present application in detail.
As an embodiment, the internal reference correction network may include multiple 1 × 1 convolutional layers. In this way, the network can extract richer features without changing the spatial dimensions of the image features; in addition, stacking multiple 1 × 1 convolutional layers deepens the network, which reduces the mutual influence between the correction branch (i.e., the branch corresponding to the internal reference correction network) and the detection branch (i.e., the branch corresponding to the target detection network) and improves the accuracy of 3D target detection.
As another embodiment, please refer to fig. 3, which is a block diagram of the internal reference correction network provided in this embodiment. The internal reference correction network may include the following structures connected in sequence: a max pooling layer (Max Pooling), a region-of-interest alignment layer (ROI Align), multiple 1 × 1 convolutional layers (Conv 1 × 1) and multiple fully-connected layers (FC).
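The key property exploited in the structures above — a 1 × 1 convolution mixes channels without touching the spatial layout — can be seen in a few lines. This is a generic illustration of 1 × 1 convolution on nested lists, not code from the patent; real implementations would use a deep learning framework:

```python
def conv1x1(x, w):
    """Apply a 1x1 convolution (no bias) to a feature map.

    x: nested list of shape (C_in, H, W); w: weights of shape
    (C_out, C_in).  Returns a map of shape (C_out, H, W): the
    spatial size H x W is preserved, only channels are recombined.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(wd)]
             for i in range(h)]
            for o in range(len(w))]

# A tiny 2-channel, 2x3 feature map and a (3-out, 2-in) weight matrix.
x = [[[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]],
     [[0.5, 0.5, 0.5],
      [1.0, 1.0, 1.0]]]
w = [[1.0, 0.0],   # copies channel 0
     [0.0, 2.0],   # doubles channel 1
     [1.0, 1.0]]   # sums both channels
y = conv1x1(x, w)  # shape (3, 2, 3): H and W are unchanged
```

Because each output pixel depends only on the channels at the same location, stacking such layers deepens the network without altering the feature map's spatial resolution.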
The following describes in detail the 3D object detection method provided by the embodiments of the present application, taking its application in the field of autonomous driving as an example. Referring to fig. 1, the 3D target detection method may include the following steps:
step S101: and extracting the characteristics of the image to be detected corresponding to the target to be detected acquired by the image acquisition device to obtain the corresponding image characteristics.
Step S102: and determining internal reference disturbance transformation information according to the image characteristics, and correcting the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain corrected internal reference information.
Step S103: and 3D target detection is carried out according to the corrected internal reference information and the image characteristics.
Specifically, the image acquisition device mounted on the vehicle may capture images of the surroundings at intervals. The interval may take various values, for example 5 seconds or 10 seconds.
It is to be understood that the embodiments of the present application do not limit the specific implementation of the image acquisition device. For example, the image acquisition device may be a still camera or a video camera; for another example, multiple image acquisition devices may be arranged at different positions on the vehicle, so that they capture images from multiple viewing angles and together form a 360-degree surround perception of the scene.
During vehicle travel, the image acquisition device acquires, in real time, images to be detected corresponding to the target to be detected. When the vehicle runs smoothly, the internal reference information of the image acquisition device is unaffected or only slightly affected and does not deviate significantly from the original internal reference information built into the device, so 3D target detection can be performed directly based on the original internal reference information. When the vehicle travels over bumps, however, the internal reference information is affected and deviates significantly from the built-in original internal reference information, so if 3D target detection is still performed based on the original internal reference information, the accuracy of the detection result is low.
Therefore, as an implementation manner, the 3D object detection method provided by the embodiment of the present application may be executed only when the vehicle bumps, for example: the electronic equipment can judge that the vehicle bumps in a certain time period, and the 3D target detection method can be adopted for the image to be detected acquired in the time period.
It is understood that, in this embodiment, the electronic device may determine whether the vehicle is bumping based on information collected by other sensors; for example, it may judge bumping based on the vehicle's trajectory, or based on the vehicle's running speed, and so on. The embodiments of the present application are not particularly limited in this respect.
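As one possible realization of the bump check described above — entirely hypothetical, since the patent only says other sensors, trajectory or speed may be used — vertical-acceleration samples from an on-board IMU could be thresholded over a time window:

```python
def vehicle_is_bumping(vertical_accels, threshold=2.0):
    """Crude bump heuristic: flag the window as bumpy if the spread of
    vertical acceleration samples (in m/s^2) exceeds a threshold.

    The sensor source, units and threshold value are illustrative
    assumptions, not taken from the patent.
    """
    if not vertical_accels:
        return False
    return (max(vertical_accels) - min(vertical_accels)) > threshold

smooth_window = [9.7, 9.8, 9.9, 9.8]   # near-constant gravity reading
bumpy_window = [9.8, 12.5, 6.9, 11.2]  # large jolts
```

Images acquired during windows flagged as bumpy would then be routed through the internal reference correction path, while smooth windows could skip it.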
As another implementation manner, the 3D object detection method provided in the embodiment of the present application may also be performed during the whole vehicle traveling process, that is, the above 3D object detection method is adopted for all the acquired images to be detected.
It can be understood that, in this embodiment, for an image to be detected that is acquired when no bump occurs, the corrected internal reference information obtained after correction may be the same as the original internal reference information before correction; for the image to be detected collected during bumping, the corrected internal reference information obtained after correction may be different from the original internal reference information before correction.
Before the above step S101 is performed, the image to be detected may be obtained. There are multiple ways to obtain it, for example: capturing the image to be detected directly, or receiving an image to be detected sent by another device. Those skilled in the art can choose appropriately according to the actual situation.
In the step S101, feature extraction may be performed on the image to be detected to obtain corresponding image features. It can be understood that, the embodiment of the present application does not specifically limit the specific implementation manner of extracting the image features of the image to be detected, and those skilled in the art can make appropriate adjustments according to actual situations. For example, the extraction of the image features may be performed by using a Histogram of Oriented Gradient (HOG) feature extraction algorithm; alternatively, extraction of image features and the like may be performed using a neural network.
As an implementation, the image features may be extracted by using a feature extraction network in the 3D object detection model in the above embodiment, that is, the above step S101 may be performed by using a feature extraction network in the 3D object detection model. That is to say, the image to be detected may be input into the feature extraction network, and the feature extraction network performs feature extraction on the image to be detected and outputs corresponding image features.
It can be understood that, when multiple images to be detected are input into the feature extraction network, the batch dimension of the image features output by the network is greater than 1, with one batch index corresponding to each image. For example, when N images to be detected are input, the feature dimension of the network is C, and the feature-map height and width are H_F and W_F respectively, the image features output by the feature extraction network have dimensions N × C × H_F × W_F.
In step S102, the internal reference disturbance transformation information may be determined based on the image features obtained in step S101. The embodiments of the present application do not limit the specific way this is done, and those skilled in the art can adjust it according to the actual situation: for example, an image processing algorithm or a neural network may be used to determine the internal reference disturbance transformation information.
In addition, in step S102, the original internal reference information of the image acquisition device may be corrected according to the determined internal reference disturbance transformation information to obtain corrected internal reference information. The manner of correction may vary with the form of the disturbance transformation information: for example, the disturbance transformation information may be added to the original internal reference information, or multiplied with it, to obtain the corrected internal reference information.
As an implementation manner, the internal reference correction network in the 3D object detection model in the above embodiment may be used to determine the internal reference disturbance transformation information and correct the original internal reference information; that is, the above step S102 may be performed by the internal reference correction network in the 3D object detection model. In other words, the image features may be input into the internal reference correction network, which determines the internal reference disturbance transformation information from the image features, corrects the original internal reference information of the image acquisition device based on that information, and outputs the corresponding corrected internal reference information.
In step S103, 3D object detection is performed based on the corrected internal reference information obtained in step S102 and the image features obtained in step S101. It can be understood that the embodiments of the present application do not specifically limit the implementation of 3D object detection, and those skilled in the art may make appropriate adjustments according to the actual situation. For example, 3D target detection may be achieved by generating a 3D position code; alternatively, a graph neural network (e.g., Point-GNN) may be employed to implement 3D object detection, and the like.
As an implementation, the 3D object detection may be performed by using an object detection network in the 3D object detection model in the above embodiment, that is, the above step S103 may be performed by an object detection network in the 3D object detection model. That is to say, the corrected internal reference information and the image features may be input into the target detection network, and the target detection network performs 3D target detection based on the corrected internal reference and the image features to obtain a corresponding 3D target detection result.
It should be noted that, in the above embodiment, the feature extraction network, the internal reference correction network, and the target detection network are all networks that have been trained in advance, and a specific training manner thereof will be described in the following embodiments, which will not be described here.
In the above scheme, after the image features are obtained, the original internal reference information of the image acquisition device can be corrected based on the image features, so that the obtained corrected internal reference information is more suitable for a scene when the image to be detected is acquired. Therefore, even if the original internal reference information of the image acquisition device is deviated due to the problem of camera disturbance, the original internal reference information can be corrected by the 3D target detection method provided by the application, and the corrected internal reference information is utilized to carry out 3D target detection, so that the accuracy of 3D target detection can be improved.
Further, on the basis of the foregoing embodiment, the step of determining the internal reference disturbance transformation information according to the image feature in step S102 may specifically include the following steps:
step 1), carrying out dimensionality reduction processing on the image features to obtain first intermediate features after dimensionality reduction.
And 2) carrying out feature extraction on the first intermediate features to obtain second intermediate features.
And 3) unfolding the second intermediate features and performing regression processing to obtain the internal reference disturbance transformation information.
Specifically, taking performing the above step S102 through the internal reference correction network shown in fig. 3 as an example, the image features of dimension N×C×H_F×W_F may be input into the internal reference correction network, and dimensionality reduction may be performed on them through two pooling operations, a maximum pooling layer (Max Pooling) and a region-of-interest alignment layer (ROI Align), to obtain first intermediate features of dimension N×C×7×7; then, feature extraction may be performed on the first intermediate features through multiple 1×1 convolutional layers (Conv 1×1) to obtain second intermediate features; finally, the second intermediate features are flattened and subjected to regression processing through several fully connected layers (FC), yielding the internal reference disturbance transformation information.
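The branch described above can be sketched in PyTorch as follows. This is an illustrative approximation, not the patented implementation: an adaptive max pooling layer stands in for the Max Pooling + ROI Align pair (it likewise yields a fixed N×C×7×7 map), and all layer counts and widths are assumed.

```python
import torch
import torch.nn as nn

class IntrinsicCorrectionNet(nn.Module):
    """Sketch of the correction branch: pool to a fixed 7x7 map,
    apply 1x1 convolutions, then regress the perturbation parameters
    with fully connected layers. Layer sizes are illustrative."""
    def __init__(self, channels=256, num_params=9):
        super().__init__()
        # Stand-in for the Max Pooling + ROI Align pair in the text.
        self.pool = nn.AdaptiveMaxPool2d((7, 7))
        self.convs = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=1), nn.ReLU(),
        )
        self.fcs = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_params),   # entries of the 3x3 perturbation
        )

    def forward(self, feat):               # feat: N x C x H_F x W_F
        x = self.convs(self.pool(feat))    # N x C x 7 x 7
        return self.fcs(x).view(-1, 3, 3)  # one perturbation matrix per image

net = IntrinsicCorrectionNet(channels=64)
delta = net(torch.randn(2, 64, 32, 88))    # e.g. 2 images, 64 channels
```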
In the above scheme, the image features are processed to obtain the corresponding internal reference disturbance transformation information, so that the original internal reference information of the image acquisition device can be corrected according to the internal reference disturbance transformation information, and the corrected internal reference information can then be used for 3D target detection, improving the accuracy of 3D target detection.
Further, on the basis of the above embodiment, the internal reference disturbance transformation information may be characterized by an internal reference disturbance matrix, the original internal reference information may be characterized by an original internal reference matrix, and the modified internal reference information may be characterized by a modified internal reference matrix. At this time, the step of modifying the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information in step S102 to obtain modified internal reference information may specifically include the following steps:
and calculating the product of the internal reference disturbance matrix and the original internal reference matrix to obtain a corrected internal reference matrix.
In the above scheme, the internal reference disturbance transformation information may be represented by an internal reference disturbance matrix, the original internal reference information may be represented by an original internal reference matrix, and the modified internal reference information may also be represented by a modified internal reference matrix. Therefore, the original internal reference information can be corrected through matrix multiplication, and the accuracy of 3D target detection is improved.
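With 3×3 matrices, the correction above is a single matrix product. The concrete numbers below are assumed purely for illustration (a perturbation near identity that slightly rescales the focal lengths and shifts the principal point):

```python
import numpy as np

# Assumed original pinhole intrinsics (focal lengths fx, fy; principal point cx, cy).
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

# A perturbation matrix near identity, as the correction network might regress.
delta_K = np.array([[1.01, 0.0,   2.0],
                    [0.0,  0.99, -1.5],
                    [0.0,  0.0,   1.0]])

# Corrected intrinsic matrix = perturbation matrix x original matrix.
K_corrected = delta_K @ K
```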
Further, on the basis of the above embodiments, a specific implementation of 3D object detection by generating a 3D position code is described below. In this case, the step S103 may specifically include the following steps:
and step 1), projecting the image characteristics to a 3D space according to the corrected internal reference information to obtain a corresponding 3D position code.
And 2) carrying out 3D target detection according to the 3D position code and the image characteristics.
Specifically, in step 1), the 3D position code may be generated in the manner of PETR: each point in the image features is projected into 3D space according to the corrected internal reference information of the image acquisition device, obtaining the 3D position code corresponding to the image feature of each pixel point in the image to be detected.
Then, in step 2), the image features combined with the 3D position code are taken as the input of the detection head, and a plurality of 3D detection attributes are predicted through a Transformer structure and multi-layer Feed-Forward Networks (FFN), thereby realizing 3D object detection.
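Step 1) amounts to back-projecting every feature-map pixel through the corrected intrinsics. A simplified numpy sketch is given below; the depth bins and camera-to-ego transform used by full PETR-style position encoding are omitted, so it only shows the role of the corrected matrix:

```python
import numpy as np

def pixel_to_camera_rays(K_corrected, h, w):
    """Back-project every pixel of an h x w feature map through the
    (corrected) intrinsic matrix, giving one 3D direction per pixel."""
    us, vs = np.meshgrid(np.arange(w), np.arange(h))      # pixel grid
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1)   # homogeneous coords
    pix = pix.reshape(-1, 3).astype(float).T              # 3 x (h*w)
    rays = np.linalg.inv(K_corrected) @ pix               # camera-frame directions
    return rays.T.reshape(h, w, 3)

# With identity intrinsics, pixel (u, v) maps to direction (u, v, 1).
rays = pixel_to_camera_rays(np.eye(3), h=2, w=3)
```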
In the above scheme, after the original internal reference information of the image acquisition device is corrected to obtain the corrected internal reference information, 3D target detection can be performed according to the corrected internal reference information and the image characteristics, so that the accuracy of 3D target detection can be improved.
Further, on the basis of the above embodiment, before the 3D object detection method is executed by using the 3D object detection model, an initial 3D object detection model may be trained to obtain a trained 3D object detection model. The feature extraction network, the internal reference correction network and the target detection network in the 3D target detection model are coupled with each other, so that the three networks are trained cooperatively in the process of training the 3D target detection model.
Two training modes of the 3D object detection model provided in the embodiments of the present application are described below by way of example.
First, the 3D object detection model can be trained by the following process:
step 1), obtaining a sample image and a corresponding training annotation result.
Step 2), updating parameters of the feature extraction network, the internal reference correction network and the target detection network according to the sample image and the training annotation result; and the internal parameters input into the 3D target detection network in the training process are prediction internal parameters output by the internal parameter correction network.
Specifically, before the step 2) is executed, the sample image and the training annotation result corresponding to the sample image may be obtained. There are various ways to obtain the sample image, for example: taking the collected image as a sample image; or, a sample image or the like sent by other devices or stored in the cloud is received, and those skilled in the art can make a suitable selection according to actual situations. Similarly, there are various ways to obtain the training annotation result corresponding to the sample image, for example: labeling the sample image; or, receiving the annotation data and the like sent by other devices, and those skilled in the art can also make a suitable selection according to the actual situation.
In the step 2), the sample image may be input to a feature extraction network in the 3D target detection model, and corresponding image features are output; inputting the image characteristics into an internal reference correction network in the 3D target detection model, and outputting corresponding prediction internal references (the prediction internal references are corrected internal references); inputting the predicted internal parameters and the image characteristics into a target detection network in the 3D target detection model, and outputting a corresponding predicted detection result; and finally, updating parameters of a feature extraction network, an internal reference correction network and a target detection network in the 3D target detection model based on the prediction detection result and the training annotation result corresponding to the sample image.
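A toy sketch of one joint update in this first training mode follows. All module shapes, the pooling, and the MSE loss are placeholder assumptions standing in for the backbone, correction branch, and detection head described above; the point is only that a single backward pass updates all three networks together.

```python
import torch

# Hypothetical stand-ins for the three coupled networks.
backbone = torch.nn.Conv2d(3, 8, 3, padding=1)   # feature extraction network
correct  = torch.nn.Linear(8, 9)                 # regresses predicted intrinsics
detect   = torch.nn.Linear(8 + 9, 4)             # consumes features + intrinsics

params = [*backbone.parameters(), *correct.parameters(), *detect.parameters()]
opt = torch.optim.SGD(params, lr=0.01)

image  = torch.randn(2, 3, 16, 16)   # sample images
target = torch.randn(2, 4)           # training annotation results (placeholder)

feat = backbone(image).mean(dim=(2, 3))            # pooled image features
pred_intrinsics = correct(feat)                    # predicted (corrected) intrinsics
pred = detect(torch.cat([feat, pred_intrinsics], dim=1))
loss = torch.nn.functional.mse_loss(pred, target)

opt.zero_grad()
loss.backward()
opt.step()   # one joint update of all three networks
```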
It can be understood that, in one embodiment, when the internal reference correction network contains more convolutional layers, this branch is deeper, so that the parameters of the feature extraction network receive smaller updates through it during training, which reduces its influence on the 3D target detection network in the training process.
In the above scheme, the sample image and the corresponding training labeling result are utilized to train the feature extraction network, the internal reference correction network and the target detection network in the 3D target detection model, so that the trained 3D target detection model can be utilized to perform 3D target detection, and a 3D target detection result with high accuracy is obtained.
Second, the 3D target detection model may be trained through the following process:
and step 1), obtaining a sample image, corresponding real internal reference and a corresponding training annotation result.
Step 2), updating parameters of the feature extraction network, the internal reference correction network and the target detection network according to the sample image, the real internal reference and the training annotation result; and the internal parameters input into the target detection network in the training process are real internal parameters.
Specifically, similar to the above embodiment, before the step 2) is executed, the sample image and the training annotation result corresponding to the sample image may be obtained. There are various ways to obtain the sample image, for example: taking the collected image as a sample image; or, a sample image or the like sent by other devices or stored in a cloud is received, and those skilled in the art can make a suitable selection according to actual situations. Similarly, there are various ways to obtain the training annotation result corresponding to the sample image, for example: labeling the sample image; or receiving the annotation data and the like sent by other devices, and those skilled in the art can also make an appropriate selection according to the actual situation.
Unlike the foregoing embodiment, in the foregoing step 1), a true internal reference corresponding to the sample image may also be obtained (the true internal reference is a modified internal reference), and the embodiment of the present application is not limited to the specific implementation of obtaining the true internal reference.
In the step 2), the sample image may be input to a feature extraction network in the 3D target detection model, and corresponding image features are output; inputting the image characteristics into an internal reference correction network in the 3D target detection model, and outputting corresponding prediction internal references; inputting the real internal reference and the image characteristics into a target detection network in the 3D target detection model, and outputting a corresponding prediction detection result; and finally, updating parameters of a feature extraction network, an internal reference correction network and a target detection network in the 3D target detection model based on the prediction detection result and the training annotation result corresponding to the sample image.
It can be understood that, in the embodiment of the present application, in the training process, since the predicted internal reference output by the internal reference correction network may not reach the effect of the real internal reference initially, the real internal reference may be input into the subsequent 3D target detection network, thereby improving the stability of the training.
In addition, in one embodiment, when the internal reference correction network contains more convolutional layers, this branch is deeper, so that the parameters of the feature extraction network receive smaller updates through it during training, which reduces its influence on the 3D target detection network in the training process.
In the above scheme, the sample image, the corresponding real internal parameters and the corresponding training labeling results are used to train the feature extraction network, the internal parameter correction network and the target detection network in the 3D target detection model, so that the trained 3D target detection model can be used to perform 3D target detection, and a target detection result with high accuracy is obtained.
Further, on the basis of the above embodiment, after the 3D object detection model is trained, the trained 3D object detection model may also be tested. Wherein, for the two training modes, the same testing mode can be adopted.
A testing method of a 3D object detection model provided in the embodiments of the present application is described below by way of example. The trained 3D target detection model can be tested by the following processes:
step 1), obtaining a test image and a corresponding test labeling result.
Step 2), testing the trained feature extraction network, the trained internal reference correction network and the trained target detection network according to the test image and the test labeling result; and the internal parameters of the trained target detection network input in the test process are the predicted internal parameters output by the trained internal parameter correction network.
Specifically, similar to the above embodiments, before step 2) is executed, the test image and the test annotation result corresponding to the test image may be obtained. There are various ways to obtain the test image, for example: using a collected image as the test image; or receiving a test image sent by other devices or stored in the cloud, and the like; those skilled in the art can make a suitable selection according to the actual situation. Similarly, there are various ways to obtain the test annotation result corresponding to the test image, for example: annotating the test image; or receiving annotation data sent by other devices, and the like; those skilled in the art can likewise make a suitable selection according to the actual situation.
In the step 2), the test image may be input to a feature extraction network in the trained 3D target detection model, and corresponding image features are output; inputting the image characteristics into an internal reference correction network in the trained 3D target detection model, and outputting corresponding prediction internal references; inputting the predicted internal parameters and the image characteristics into a target detection network in the trained 3D target detection model, and outputting a corresponding predicted detection result; and finally, detecting the performance of the trained 3D target detection model based on the prediction detection result and the test labeling result corresponding to the test image.
In the above scheme, after the training of the 3D target detection model is completed, the trained 3D target detection model can be further tested to test whether the 3D target training model meets the requirement of 3D target detection, so that the accuracy of 3D target detection is ensured.
Referring to fig. 4, fig. 4 is a block diagram of a 3D object detection apparatus according to an embodiment of the present disclosure, where the 3D object detection apparatus 400 includes: the feature extraction module 401 is configured to perform feature extraction on an image to be detected corresponding to the target to be detected, which is acquired by the image acquisition device, to obtain corresponding image features; a correction module 402, configured to determine internal reference disturbance transformation information according to the image feature, and correct the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain corrected internal reference information; and an object detection module 403, configured to perform 3D object detection according to the modified internal reference information and the image feature.
In the embodiment of the application, after the image features are obtained, the original internal reference information of the image acquisition device can be corrected based on the image features, so that the obtained corrected internal reference information is more suitable for a scene when the image to be detected is acquired. Therefore, even if the original internal reference information of the image acquisition device deviates due to the problem of camera disturbance, the original internal reference information can be corrected by the 3D target detection device 400 provided by the application, and 3D target detection is performed by using the corrected internal reference information, so that the accuracy of 3D target detection can be improved.
Further, the modification module 402 is specifically configured to: performing dimensionality reduction processing on the image features to obtain first intermediate features after dimensionality reduction; performing feature extraction on the first intermediate features to obtain second intermediate features; and expanding the second intermediate features and performing regression processing to obtain the internal reference disturbance transformation information.
In the embodiment of the application, the image features are processed to obtain the corresponding internal reference disturbance transformation information, so that the original internal reference information of the image acquisition device can be corrected according to the internal reference disturbance transformation information, and the corrected internal reference information can then be used for 3D target detection, improving the accuracy of 3D target detection.
Further, the internal reference disturbance transformation information is represented by an internal reference disturbance matrix, the original internal reference information is represented by an original internal reference matrix, and the modified internal reference information is represented by a modified internal reference matrix; the modification module 402 is specifically configured to: and calculating the product of the internal parameter disturbance matrix and the original internal parameter matrix to obtain the corrected internal parameter matrix.
In the embodiment of the application, the internal reference disturbance transformation information can be represented by an internal reference disturbance matrix, the original internal reference information can be represented by an original internal reference matrix, and the corrected internal reference information can also be represented by a corrected internal reference matrix. Therefore, the original internal reference information can be corrected through matrix multiplication, and the accuracy of 3D target detection is improved.
Further, the target detection module 403 is specifically configured to: projecting the image features to a 3D space according to the corrected internal reference information to obtain a corresponding 3D position code; the image characteristics of a pixel point in the image to be detected correspond to a 3D position code; and carrying out 3D target detection according to the 3D position code and the image characteristics.
In the embodiment of the application, after the original internal reference information of the image acquisition device is corrected to obtain the corrected internal reference information, 3D target detection can be performed according to the corrected internal reference information and the image characteristics, so that the accuracy of 3D target detection can be improved.
Further, the step of extracting the characteristics of the image to be detected corresponding to the target to be detected, which is acquired by the image acquisition device, to obtain the corresponding image characteristics is executed through a characteristic extraction network in the 3D target detection model; the step of determining internal reference disturbance transformation information according to the image characteristics, and modifying the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain modified internal reference information is executed through an internal reference modification network in the 3D target detection model; and the step of carrying out 3D target detection according to the corrected internal reference information and the image characteristics is executed through a target detection network in the 3D target detection model.
In this embodiment of the present application, the 3D object detection method provided in this embodiment of the present application may be executed by a 3D object detection model, where the 3D object detection model may include a feature extraction network, an internal reference correction network, and an object detection network. Based on the 3D target detection model, the original internal reference information of the image acquisition device can be corrected, and 3D target detection is performed by using the corrected internal reference information, so that the accuracy of 3D target detection can be improved.
Further, the internal reference estimation model comprises: multilayer 1 × 1 convolutional layers.
In the embodiment of the application, the 1×1 convolutional layers in the internal reference estimation model can extract richer features without changing the spatial dimensions of the image features; stacking multiple 1×1 convolutional layers deepens the network, thereby reducing the mutual influence between the correction branch and the detection branch and improving the accuracy of 3D target detection.
Further, the 3D object detection apparatus 400 further includes a training module, and the training module is configured to train the 3D object detection model through the following processes: acquiring a sample image, corresponding real internal parameters and corresponding training annotation results; updating parameters of the feature extraction network, the internal reference correction network and the target detection network according to the sample image, the real internal reference and the training annotation result; and inputting the internal parameters in the target detection network in the training process as the real internal parameters.
In the embodiment of the application, the sample image, the corresponding real internal reference and the corresponding training labeling result are utilized to train the feature extraction network, the internal reference correction network and the target detection network in the 3D target detection model, so that the trained 3D target detection model can be utilized to carry out 3D target detection, and a target detection result with higher accuracy is obtained. The internal parameters input into the target detection network in the training process can be real internal parameters, so that the training stability is improved.
Further, the 3D target detection apparatus 400 further includes a testing module, and the testing module is configured to test the trained 3D target detection model through the following processes: acquiring a test image and a corresponding test labeling result; testing the trained feature extraction network, the trained internal reference correction network and the trained target detection network according to the test image and the test labeling result; and the internal parameters of the trained target detection network input in the test process are the predicted internal parameters output by the trained internal parameter correction network.
In the embodiment of the application, after the training of the 3D target detection model is completed, the trained 3D target detection model can be further tested to test whether the 3D target training model meets the requirement of 3D target detection or not, so that the accuracy of 3D target detection is ensured.
Referring to fig. 5, fig. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device 500 includes: at least one processor 501, at least one communication interface 502, at least one memory 503, and at least one communication bus 504. Wherein the communication bus 504 is used for implementing direct connection communication of these components, the communication interface 502 is used for communicating signaling or data with other node devices, and the memory 503 stores machine readable instructions executable by the processor 501. When the electronic device 500 is in operation, the processor 501 communicates with the memory 503 via the communication bus 504, and the machine-readable instructions, when called by the processor 501, perform the 3D object detection method described above.
For example, the processor 501 of the embodiment of the present application may read the computer program from the memory 503 through the communication bus 504 and execute the computer program to implement the following method: step S101: and performing feature extraction on the to-be-detected image corresponding to the to-be-detected target acquired by the image acquisition device to obtain corresponding image features. Step S102: and determining internal reference disturbance transformation information according to the image characteristics, and correcting the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain corrected internal reference information. Step S103: and carrying out 3D target detection according to the corrected internal reference information and the image characteristics.
The processor 501 includes one or more processors, which may be integrated circuit chips having signal processing capability. The processor 501 may be a general-purpose processor, including a Central Processing Unit (CPU), a Micro Control Unit (MCU), a Network Processor (NP), or other conventional processors; it may also be a dedicated processor, including a Neural-Network Processing Unit (NPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Moreover, when there are a plurality of processors 501, some of them may be general-purpose processors and others may be dedicated processors.
The memory 503 includes one or more memories, which may be, but are not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
It will be appreciated that the configuration shown in FIG. 5 is merely illustrative and that electronic device 500 may include more or fewer components than shown in FIG. 5 or have a different configuration than shown in FIG. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof. In this embodiment, the electronic device 500 may be, but is not limited to, an entity device such as a desktop computer, a notebook computer, a smart phone, an intelligent wearable device, and a vehicle-mounted device, and may also be a virtual device such as a virtual machine. In addition, the electronic device 500 is not necessarily a single device, but may also be a combination of multiple devices, such as a server cluster, and the like.
Embodiments of the present application further provide a computer program product, including a computer program stored on a computer-readable storage medium, where the computer program includes computer program instructions, and when the computer program instructions are executed by a computer, the computer can execute the steps of the 3D object detection method in the foregoing embodiments, for example, including: performing feature extraction on an image to be detected corresponding to a target to be detected, which is acquired by an image acquisition device, to obtain corresponding image features; determining internal reference disturbance transformation information according to the image characteristics, and correcting the original internal reference information of the image acquisition device based on the internal reference disturbance transformation information to obtain corrected internal reference information; and carrying out 3D target detection according to the corrected internal reference information and the image characteristics.
The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer program instructions, and when the computer program instructions are executed by a computer, the computer is caused to execute the 3D object detection method described in the foregoing method embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through certain communication interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in another form.
In addition, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, or the portions thereof that contribute to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions.
The above description is merely an example of the present application and is not intended to limit the scope of protection of the present application; those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (11)

1. A 3D target detection method, comprising:
performing feature extraction on an image to be detected, which corresponds to a target to be detected and is acquired by an image acquisition device, to obtain corresponding image features;
determining intrinsic-parameter perturbation transformation information according to the image features, and correcting original intrinsic parameter information of the image acquisition device based on the intrinsic-parameter perturbation transformation information to obtain corrected intrinsic parameter information; and
performing 3D target detection according to the corrected intrinsic parameter information and the image features.
2. The 3D target detection method according to claim 1, wherein the determining intrinsic-parameter perturbation transformation information according to the image features comprises:
performing dimensionality reduction on the image features to obtain dimensionality-reduced first intermediate features;
performing feature extraction on the first intermediate features to obtain second intermediate features; and
flattening the second intermediate features and performing regression to obtain the intrinsic-parameter perturbation transformation information.
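A minimal sketch of the three-stage head described in claim 2 (using the 1 × 1 convolutional structure of claim 6) is shown below with random weights in place of trained ones; every layer size here is an assumption chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels:
    # x has shape (H, W, C_in), w has shape (C_in, C_out).
    return x @ w

feat = rng.standard_normal((16, 16, 256))  # image features (H, W, C)

# Stage 1: dimensionality reduction (256 -> 64 channels).
reduced = conv1x1(feat, rng.standard_normal((256, 64)))

# Stage 2: further feature extraction at the reduced width (ReLU nonlinearity).
extracted = np.maximum(conv1x1(reduced, rng.standard_normal((64, 64))), 0.0)

# Stage 3: flatten, then regress the 9 entries of a 3x3 perturbation matrix.
flat = extracted.reshape(-1)
w_reg = rng.standard_normal((flat.size, 9)) * 1e-4
perturbation = np.eye(3) + (flat @ w_reg).reshape(3, 3)
```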
3. The 3D target detection method according to claim 1 or 2, wherein the intrinsic-parameter perturbation transformation information is represented by an intrinsic-parameter perturbation matrix, the original intrinsic parameter information is represented by an original intrinsic parameter matrix, and the corrected intrinsic parameter information is represented by a corrected intrinsic parameter matrix; and
the correcting the original intrinsic parameter information of the image acquisition device based on the intrinsic-parameter perturbation transformation information to obtain corrected intrinsic parameter information comprises:
computing the product of the intrinsic-parameter perturbation matrix and the original intrinsic parameter matrix to obtain the corrected intrinsic parameter matrix.
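As a concrete instance of the matrix product in claim 3 (the numeric values below are illustrative only), a perturbation matrix close to the identity left-multiplies the original pinhole intrinsic matrix:

```python
import numpy as np

# Original pinhole intrinsic matrix (illustrative values):
# focal lengths fx, fy and principal point (cx, cy).
K_orig = np.array([[800.0,   0.0, 640.0],
                   [  0.0, 800.0, 360.0],
                   [  0.0,   0.0,   1.0]])

# Predicted perturbation matrix, close to the identity:
# small focal-length scaling and principal-point shift.
P = np.array([[1.02, 0.0,  5.0],
              [0.0,  0.98, -3.0],
              [0.0,  0.0,   1.0]])

K_corr = P @ K_orig   # corrected intrinsics = perturbation * original
# fx: 800 -> 816, fy: 800 -> 784, cx: 640 -> 657.8, cy: 360 -> 349.8
```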
4. The 3D target detection method according to any one of claims 1 to 3, wherein the performing 3D target detection according to the corrected intrinsic parameter information and the image features comprises:
projecting the image features into 3D space according to the corrected intrinsic parameter information to obtain corresponding 3D position encodings, wherein the image feature of each pixel in the image to be detected corresponds to one 3D position encoding; and
performing 3D target detection on the target to be detected according to the 3D position encodings and the image features.
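One common way to realize the projection in claim 4 (used here purely as an illustration; the patent text does not fix a formula) is to back-project each pixel through the inverse intrinsic matrix at a set of candidate depths, yielding the 3D camera-frame points that serve as that pixel's position encoding:

```python
import numpy as np

K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])
K_inv = np.linalg.inv(K)

def position_encoding(u, v, depths):
    # Back-project pixel (u, v) at each candidate depth d:
    # X_cam = d * K^{-1} @ [u, v, 1]^T
    pix = np.array([u, v, 1.0])
    return np.stack([d * (K_inv @ pix) for d in depths])

enc = position_encoding(640.0, 360.0, depths=[1.0, 2.0, 4.0])
# The principal point back-projects onto the optical axis (x = y = 0).
```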
5. The 3D target detection method according to any one of claims 1 to 4, wherein the step of performing feature extraction on the image to be detected, which corresponds to the target to be detected and is acquired by the image acquisition device, to obtain the corresponding image features is executed by a feature extraction network in a 3D target detection model;
the step of determining intrinsic-parameter perturbation transformation information according to the image features, and correcting the original intrinsic parameter information of the image acquisition device based on the intrinsic-parameter perturbation transformation information to obtain corrected intrinsic parameter information, is executed by an intrinsic-parameter correction network in the 3D target detection model; and
the step of performing 3D target detection according to the corrected intrinsic parameter information and the image features is executed by a target detection network in the 3D target detection model.
6. The 3D target detection method according to claim 5, wherein the intrinsic-parameter correction network comprises multiple 1 × 1 convolutional layers.
7. The 3D target detection method according to claim 5 or 6, wherein the 3D target detection model is trained by the following process:
acquiring a sample image, corresponding real intrinsic parameters, and a corresponding training annotation result; and
updating parameters of the feature extraction network, the intrinsic-parameter correction network, and the target detection network according to the sample image, the real intrinsic parameters, and the training annotation result, wherein the intrinsic parameters input to the target detection network during training are the real intrinsic parameters.
8. The 3D target detection method according to claim 7, wherein, after the parameters of the feature extraction network, the intrinsic-parameter correction network, and the target detection network are updated according to the sample image, the real intrinsic parameters, and the training annotation result, the trained 3D target detection model is tested by the following process:
acquiring a test image and a corresponding test annotation result; and
testing the trained feature extraction network, the trained intrinsic-parameter correction network, and the trained target detection network according to the test image and the test annotation result, wherein the intrinsic parameters input to the trained target detection network during testing are the predicted intrinsic parameters output by the trained intrinsic-parameter correction network.
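The train/test asymmetry of claims 7 and 8 — ground-truth intrinsics feed the detector during training, predicted intrinsics during testing — can be sketched as a simple mode switch; all networks are stubbed, and the structure is an assumption made purely for illustration.

```python
import numpy as np

def correction_net(features, K_orig):
    # Stub intrinsic-parameter correction network: predicts a perturbation
    # near the identity and applies it to the original intrinsics.
    P = np.eye(3) + 1e-3 * features[:3, :3]
    return P @ K_orig

def detector(features, K):
    # Stub target detection network: records which intrinsics it consumed.
    return {"K_used": K}

def forward(features, K_orig, K_real, training):
    K_pred = correction_net(features, K_orig)
    # Training: the detector consumes the real (ground-truth) intrinsics.
    # Testing:  the detector consumes the predicted corrected intrinsics.
    K_in = K_real if training else K_pred
    return detector(features, K_in), K_pred

feats = np.zeros((3, 3))
K_orig = np.eye(3)
K_real = np.diag([2.0, 2.0, 1.0])

train_out, _ = forward(feats, K_orig, K_real, training=True)
test_out, K_pred = forward(feats, K_orig, K_real, training=False)
```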
9. A computer program product comprising computer program instructions which, when read and executed by a processor, cause the processor to perform the method according to any one of claims 1 to 8.
10. An electronic device, comprising a processor, a memory, and a bus, wherein:
the processor and the memory communicate with each other through the bus; and
the memory stores computer program instructions executable by the processor, and the processor invokes the computer program instructions to perform the method according to any one of claims 1 to 8.
11. A computer-readable storage medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 8.
CN202211124920.5A 2022-09-15 2022-09-15 3D target detection method, computer program product and electronic equipment Pending CN115631296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211124920.5A CN115631296A (en) 2022-09-15 2022-09-15 3D target detection method, computer program product and electronic equipment


Publications (1)

Publication Number Publication Date
CN115631296A true CN115631296A (en) 2023-01-20

Family

ID=84902854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211124920.5A Pending CN115631296A (en) 2022-09-15 2022-09-15 3D target detection method, computer program product and electronic equipment

Country Status (1)

Country Link
CN (1) CN115631296A (en)

Similar Documents

Publication Publication Date Title
CN112446398B (en) Image classification method and device
CN110414432B (en) Training method of object recognition model, object recognition method and corresponding device
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
CN110135249B (en) Human behavior identification method based on time attention mechanism and LSTM (least Square TM)
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN111241989A (en) Image recognition method and device and electronic equipment
CN110222718B (en) Image processing method and device
CN111652181B (en) Target tracking method and device and electronic equipment
CN112037142B (en) Image denoising method, device, computer and readable storage medium
CN114049512A (en) Model distillation method, target detection method and device and electronic equipment
Muthalagu et al. Vehicle lane markings segmentation and keypoint determination using deep convolutional neural networks
CN113065575A (en) Image processing method and related device
WO2022179599A1 (en) Perceptual network and data processing method
CN115577768A (en) Semi-supervised model training method and device
CN111353429A (en) Interest degree method and system based on eyeball turning
CN113673308B (en) Object identification method, device and electronic system
CN114972182A (en) Object detection method and device
CN114387496A (en) Target detection method and electronic equipment
CN114792401A (en) Training method, device and equipment of behavior recognition model and storage medium
CN116883961A (en) Target perception method and device
CN116309050A (en) Image super-resolution method, program product, storage medium and electronic device
CN114648604A (en) Image rendering method, electronic device, storage medium and program product
CN115631296A (en) 3D target detection method, computer program product and electronic equipment
CN114387465A (en) Image recognition method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination