WO2023201723A1 - Object detection model training method, and object detection method and apparatus - Google Patents

Object detection model training method, and object detection method and apparatus Download PDF

Info

Publication number
WO2023201723A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
training
internal parameters
parameters
image
Prior art date
Application number
PCT/CN2022/088566
Other languages
French (fr)
Chinese (zh)
Inventor
毕舒展
张洁
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2022/088566 (WO2023201723A1)
Priority to CN202280005788.8A (CN117280385A)
Publication of WO2023201723A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Definitions

  • the present application relates to the field of computer vision technology, and in particular to a training method for a target detection model, a target detection method and a device.
  • Target detection is a traditional task in the field of computer vision. Different from image recognition, target detection requires the position of the target object to be given in the form of a minimum bounding box. In 3D (three-dimensional) target detection, the 3D bounding box of the target object needs to be given. Taking the field of autonomous driving as an example, 3D target detection obtains the 3D coordinates of the target object, then obtains a 3D frame based on the 3D coordinates, and finally visualizes the 3D frame on the image and aerial view as shown in Figure 1.
  • Some 3D target detection methods based on monocular cameras have been proposed in related technologies.
  • This method uses a target detection model to process images collected by a monocular camera, and can obtain the 3D vertices of the target object, and then obtain the 3D frame.
  • However, this method cannot perform data amplification through image-based geometric transformation, so the generalization ability of the model is poor.
  • This application provides a training method for a target detection model, a target detection method and a device, which are used to improve the generalization ability of a 3D target detection model based on a monocular camera.
  • this application provides a training method for a target detection model, including:
  • the target detection model is trained N times, where the first camera is any one of the at least one monocular camera, and N is an integer greater than 1;
  • the N times of training include at least one first training, and the first training includes the following steps:
  • the three-dimensional position information is used as the annotated position information of the target object in the sample image
  • parameters of the target detection model are adjusted.
  • the internal parameters of the monocular camera are transformed into the internal parameters of another monocular camera (i.e., the extended internal parameters), the image collected by the monocular camera is mapped to an image collected by the extended camera, and the target detection model is then trained.
  • in the training phase, when generating 3D coordinates, the intrinsic parameters of the extended camera are used for the images of the extended camera, so that the camera's intrinsic parameters match the image; the model can thus be applied to the extended camera, improving its generalization ability.
  • the same target detection model can be applied to cameras with different internal parameters.
  • the N times of training include at least one second training, and the second training includes the following steps:
  • parameters of the target detection model are adjusted.
  • the target detection model is trained based on real cameras in the camera set, and the target detection model can be well applied to any camera in the camera set.
  • if S of the N trainings are first trainings that transform the internal parameters of the first camera, the extended internal parameters obtained by the transformation in each of the S first trainings are all different, where S is an integer less than or equal to N.
  • the target detection model can use more samples for training, thereby further improving the generalization ability of the target detection model.
  • the transformation of the internal parameters of the first camera to obtain the extended internal parameters used in the first training includes:
  • Random perturbation makes the internal parameters of the first camera fluctuate within a range, increasing the amount of data and improving the robustness of the target detection model.
  • the random perturbation of the internal parameters of the first camera includes:
  • the sub-parameters in the intrinsic parameters of the first camera are replaced with extended sub-parameters of the sub-parameters.
  • performing a geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain the sample image used for the first training includes:
  • the dedistorted image is processed based on the extended internal parameters to obtain the sample image.
  • dedistortion can ensure that the image collected by the first camera is accurately mapped to the image space of the extended camera, thereby improving the accuracy of the model inference results.
  • this application provides a target detection method, which is applied to the process of detecting a target object using a target detection model trained by the method described in any implementation of the first aspect.
  • the method includes:
  • Coordinate transformation is performed on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera to obtain the three-dimensional position information of the target object in the coordinate system of the monocular camera.
  • this application also provides a training device for a target detection model, including:
  • An information acquisition module configured to acquire internal parameters of at least one monocular camera and images collected by the at least one monocular camera
  • a training module configured to train the target detection model N times based on the internal parameters of the first camera and the images collected by the first camera, where the first camera is any one of the at least one monocular camera, and the N is an integer greater than 1;
  • the N times of training include at least one first training, and the first training includes the following steps:
  • the three-dimensional position information is used as the annotated position information of the target object in the sample image
  • parameters of the target detection model are adjusted.
  • the training module is also used to perform at least one second training in the N trainings, and the second training includes the following steps:
  • parameters of the target detection model are adjusted.
  • if S of the N trainings are first trainings that transform the internal parameters of the first camera, the extended internal parameters obtained by the transformation in each of the S first trainings are all different, where S is an integer less than or equal to N.
  • the training module is used to randomly perturb the internal parameters of the first camera to obtain extended internal parameters used in this training.
  • when performing the random perturbation of the internal parameters of the first camera, the training module is specifically configured to:
  • the sub-parameters in the intrinsic parameters of the first camera are replaced with extended sub-parameters of the sub-parameters.
  • the training module is used to:
  • the dedistorted image is processed based on the extended internal parameters to obtain the sample image.
  • this application also provides a target detection device, which is applied to the process of detecting a target object using the target detection model obtained by the device described in any implementation of the third aspect.
  • the device includes:
  • a to-be-detected image acquisition module, used to acquire the image to be detected collected by the monocular camera and the internal parameters of the monocular camera;
  • a two-dimensional information acquisition module used to detect the image to be detected and obtain the two-dimensional position information and depth information of the target object in the coordinate system of the image to be detected;
  • a three-dimensional information determination module is configured to perform coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera to obtain the three-dimensional position information of the target object in the coordinate system of the monocular camera.
  • this application provides a chip system, including: a memory for storing a computer program; and a processor; when the processor calls and runs the computer program from the memory, the electronic device installed with the chip system executes the method described in any one of the first aspect and the second aspect.
  • the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method described in any one of the first and second aspects.
  • the present application provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to perform the method described in any one of the first aspect and the second aspect.
  • this application also provides an electronic device, including:
  • A memory used to store a readable program;
  • At least one processor configured to call and run the readable program from the memory, so that the electronic device implements the method described in any one of the first aspect and the second aspect.
  • Figure 1 is a schematic diagram of visualizing 3D boxes on images and aerial views
  • Figure 2 is a schematic diagram of an application scenario provided by the embodiment of the present application.
  • Figure 3 is a schematic diagram of another application scenario provided by the embodiment of the present application.
  • Figure 4 is a schematic diagram of a monocular camera installed in an autonomous vehicle
  • Figure 5 is a schematic flowchart of the training method of the target detection model provided by the embodiment of the present application.
  • Figure 6 is another schematic flowchart of the training method of the target detection model provided by the embodiment of the present application.
  • Figure 7 is a schematic flow chart of the target detection method provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of the same target detection model used by multiple monocular cameras according to an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a training device for a target detection model provided by an embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a target detection device provided by an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • When performing 3D target detection based on images collected by a monocular camera, the target detection model cannot use image-based geometric transformation, a commonly used data amplification method, for data amplification, and thus cannot be trained with amplified data, resulting in poor generalization ability of the model. This is because image-based geometric transformation affects the mapping relationship from 2D to 3D, which is explained below.
  • Using the target detection model to obtain the 3D box includes the following steps:
  • the image collected by the monocular camera is input into the target detection model.
  • the target detection model can be a depth prediction model based on two-dimensional images. This model infers on the images collected by the monocular camera, detecting the 2D coordinates (u, v) of the target object in the image and predicting the depth Z corresponding to those 2D coordinates. Usually, the model infers the 2D coordinates and depth of the center point of the 3D box in the image; it may also infer the 2D coordinates and depths of several 3D box vertices in the image.
  • the imaging principle of the monocular camera is used to analyze the detection results of the target detection model to obtain the 3D coordinates of the target object in the camera coordinate system.
  • Equation (1) describes the mapping relationship from 2D to 3D:

    Z · (u, v, 1)ᵀ = K · (X, Y, Z)ᵀ, i.e. (X, Y, Z)ᵀ = Z · K⁻¹ · (u, v, 1)ᵀ    (1)

    where K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
  • (u, v) represents the 2D coordinates of the key point; Z represents the depth of the key point; K represents the intrinsic parameter matrix of the monocular camera, which can be obtained through camera calibration; fx, fy, cx and cy are all sub-parameters of the internal parameters; K⁻¹ is the inverse matrix of K; and (X, Y, Z) are the 3D coordinates of the pixel in the camera coordinate system of the monocular camera.
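  • As a concrete illustration of Equation (1), here is a minimal numpy sketch that unprojects a predicted key point; the intrinsic values are illustrative assumptions, not values from this application.

```python
import numpy as np

# Illustrative pinhole intrinsics: fx, fy on the diagonal, (cx, cy) in the last column.
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

def unproject(u, v, Z, K):
    """Recover (X, Y, Z) in the camera coordinate system from pixel
    coordinates (u, v) and predicted depth Z, per Equation (1):
    (X, Y, Z)^T = Z * K^-1 * (u, v, 1)^T."""
    return Z * np.linalg.inv(K) @ np.array([u, v, 1.0])

point_3d = unproject(u=800.0, v=400.0, Z=12.5, K=K)  # -> array([2.0, 0.5, 12.5])
```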
  • the 3D coordinates of the target object can be obtained, and then the length, width, height and orientation angle of the target object can be inferred.
  • the target detection model of a monocular camera is bound to specific internal parameters, and one model cannot be applied to multiple cameras with different internal parameters. That is, each monocular camera requires its own target detection model, and a target detection model is only applicable to the monocular camera to which it is bound. Therefore, simple image-based data amplification of sample images is not suitable for the target detection model of a monocular camera, and the generalization ability of the target detection model cannot be improved in this way.
  • embodiments of the present application provide a feasible data amplification method to train the target detection model and improve the generalization ability of the model.
  • based on the internal camera parameter K of the monocular camera, geometric transformation can be performed on the images collected by the monocular camera to achieve data amplification.
  • the internal parameter K′ (hereinafter also referred to as the extended internal parameter) can be constructed, and the original image is projected into an image captured by an extended camera with the internal parameter K′.
  • If K′ differs from K only in cx, the above transformation is equivalent to performing a left-right translation operation on the original image; if only cy is different, it is equivalent to an up-down translation operation; if only fx is different, it is equivalent to a horizontal scaling operation; if only fy is different, it is equivalent to a vertical scaling operation.
  • for the transformed image, the correct 3D coordinates can be obtained by left-multiplying by the inverse matrix K′⁻¹ of K′.
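  • The following numpy sketch illustrates this property: mapping a pixel into the extended camera's image space via K′K⁻¹ and then unprojecting with K′⁻¹ recovers exactly the same 3D point as unprojecting the original pixel with K⁻¹. All numeric values are illustrative assumptions.

```python
import numpy as np

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
K_ext = np.array([[1100.0, 0.0, 600.0],   # extended intrinsics: fx and cx changed
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])

# For undistorted pinhole images, mapping into the extended camera's image
# space is the planar transformation H = K' K^-1.
H = K_ext @ np.linalg.inv(K)

p = np.array([800.0, 400.0, 1.0])          # a pixel in the original image
p_ext = H @ p                              # the same point in the extended image
Z = 12.5                                   # depth is unchanged by this mapping

P_from_K = Z * np.linalg.inv(K) @ p
P_from_K_ext = Z * np.linalg.inv(K_ext) @ p_ext
assert np.allclose(P_from_K, P_from_K_ext)  # K'^-1 yields the correct 3D point
```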
  • for any monocular camera, the internal parameters of the monocular camera are transformed into the internal parameters of another monocular camera (that is, the extended internal parameters of the extended camera), the images collected by the monocular camera are mapped to images collected by the extended camera, and the target detection model is then trained.
  • in the training phase, when generating 3D coordinates, the intrinsic parameters of the extended camera are used for the images of the extended camera, so that the camera's intrinsic parameters match the image; the model can thus be applied to the extended camera, improving its generalization ability.
  • the same target detection model can be applied to cameras with different internal parameters.
  • the target detection model can be trained on the server side or the terminal side. After training, the target detection model can be applied in the terminal.
  • the terminal can be, for example, a car, a mobile phone, a robot, or other equipment equipped with a monocular camera.
  • Figure 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • the application scenario includes a car 101 and a target object 102.
  • the car 101 collects images containing the target object through the monocular camera installed on the car 101, and the image can be input to the target detection model 103 in the car 101 to detect the 3D position of the target object 102.
  • the target detection model 103 can also be applied in the server.
  • the application scenario includes a car 101, a target object 102 and a server 104.
  • the monocular camera of the car 101 collects an image containing the target object, and then the car 101 sends the image to the server 104.
  • the server 104 uses its built-in target detection model 103 to infer the image to obtain the 3D position of the target object.
  • without the solution of this application, each monocular camera needs to train a target detection model separately; for example, for 16 monocular cameras, a total of 16 target detection models need to be trained.
  • with the solution of this application, one target detection model can be trained for each type of camera; if these cameras fall into four types, ultimately only four target detection models need to be trained in total.
  • Each target detection model is suitable for all monocular cameras of the same type, so the generalization ability of the target detection model is improved.
  • FIG. 5 is a schematic flow chart of the training method of the target detection model in the embodiment of the present application, including:
  • Step 501 Obtain internal parameters of at least one monocular camera and images collected by at least one monocular camera.
  • monocular cameras of the same camera type may constitute a camera set.
  • the cameras of a camera set are used to jointly train the same object detection model.
  • however, a camera set is not limited to the same camera type; monocular cameras of different types can also form a camera set for training the same target detection model.
  • any monocular camera in the camera set can be used as the first camera to train the target detection model.
  • Step 502 Train the target detection model N times based on the internal parameters of the first camera and the images collected by the first camera, where N is an integer greater than 1.
  • the N times of training include at least one first training, and the first training includes the following steps:
  • Step 5021 Transform the internal parameters of the first camera to obtain extended internal parameters used in the first training.
  • the expanded internal parameters obtained by each transformation in the S times of first training are all different, where S is an integer less than or equal to N.
  • the internal parameter K of the first camera is transformed to obtain K′₁, K′₂, K′₃, …, K′ₘ, a total of m different extended internal parameters.
  • for any first camera, assuming that it collects a total of p images and m extended internal parameters are generated, this is equivalent to adding p×m sample images, thus increasing the sample images for training the target detection model and further improving its generalization ability.
  • the internal parameters of the first camera can be transformed in a variety of ways; for example, any sub-parameter in the internal parameters can be shifted with equal or unequal steps, or adjusted according to the expected direction of change.
  • if K′ differs from K only in cx, it is equivalent to performing a left-right translation operation on the original image; if only cy is different, it is equivalent to an up-down translation operation; if only fx is different, it is equivalent to a horizontal scaling operation; if only fy is different, it is equivalent to a vertical scaling operation.
  • if the extended internal parameters deviate too far from the real ones, the reasoning ability of the target detection model may be limited, causing the target detection model to fail to converge. Therefore, during implementation, the extended internal parameters should be distributed around the internal parameters of the first camera as much as possible, for example, within a threshold distance of the internal parameters of the first camera, so as to ensure that the accuracy of the inference results of the target detection model meets the expected requirements and that training converges as soon as possible.
  • in one implementation, the extended internal parameters used in this training can be obtained by randomly perturbing the internal parameters of the first camera within a threshold. The random perturbation makes the internal parameters of the first camera fluctuate within a range, increasing the amount of data and improving the robustness of the target detection model.
  • the embodiment of the present application adopts a data amplification method with variable internal parameters, so that the model adapts to different internal parameters during training and can therefore adapt to multiple cameras with different internal parameters during use. The generalization ability of the model is thus improved: the same target detection model can be adapted to multiple cameras with different internal parameters, and one model can be shared by multiple monocular cameras, reducing development costs.
  • the embodiment of the present application uses the original value (i.e., a sub-parameter) of the internal parameter K of the first camera as the center point and, based on a preset standard deviation, uses a normal distribution to generate random values to replace the original values in K.
  • for example, to obtain an extended internal parameter of the first camera, construct a normal distribution curve 1 for fx, and then obtain a point fx′ from the normal distribution curve 1 within a specified range centered on fx; the extended internal parameter is obtained by replacing fx with fx′.
  • the extended internal parameters are not limited to differing from the internal parameters of the first camera in only one sub-parameter; multiple sub-parameters may differ. For example, not only can the normal distribution curve 1 of fx be constructed, but the normal distribution curve 2 of cx can also be constructed.
  • the extended sub-parameters of one or more of the sub-parameters can be selected to construct the extended internal parameters.
  • the parameter differences between different extended internal parameters can be differences in the same sub-parameters or differences in different sub-parameters.
  • from the normal distribution curve 1 for fx, multiple (possibly thousands of) different fx′ can be obtained, yielding different extended internal parameters.
  • if not only fx′ is different but also other sub-parameters such as cx′ are different, different extended internal parameters will likewise be obtained.
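  • A minimal sketch of this sampling scheme follows; the standard deviation and clipping range are illustrative assumptions rather than values specified by this application.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_intrinsics(K, rel_sigma=0.03, rel_range=0.1):
    """Draw each sub-parameter (fx, fy, cx, cy) from a normal distribution
    centered on its original value, clipped to a specified range around
    that value, and substitute it into a copy of K."""
    K_ext = K.copy()
    for i, j in [(0, 0), (1, 1), (0, 2), (1, 2)]:   # fx, fy, cx, cy
        center = K[i, j]
        sample = rng.normal(loc=center, scale=rel_sigma * center)
        K_ext[i, j] = np.clip(sample, (1 - rel_range) * center,
                              (1 + rel_range) * center)
    return K_ext

K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
extended = [perturb_intrinsics(K) for _ in range(8)]   # m = 8 different K'
```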
  • the number of model training cycles can be as high as hundreds of thousands, each cycle using a different internal parameter K′, where K′ is either derived from the internal parameters of a real camera or generated by randomly perturbing the internal parameters of a real camera.
  • the model uses a large amount of training samples for training, which can improve the generalization ability of the model.
  • Step 5022 Perform geometric transformation on the image collected by the first camera according to the internal parameters and extended internal parameters of the first camera to obtain the sample image used for the first training, and use the three-dimensional position information of the target object in the image collected by the first camera as the annotated position information of the target object in the sample image.
  • Image geometric transformation, also known as image space transformation, maps the coordinate position in one image to a new coordinate position in another image without changing the pixel values of the image.
  • Geometric transformations of images can include translation, rotation, scaling, orthographic (parallel) projection, etc.
  • the geometric transformation of the image can be achieved through spatial transformation and interpolation algorithms.
  • the key to geometric transformation is the transformation parameter in the mapping process, which can be one or more of translation components, scaling factors, rotation angles, etc.
  • generally, camera internal parameters do not need to be considered when performing geometric transformation on images collected by a camera. In the embodiment of the present application, however, the extended internal parameters of the extended camera are used as the transformation parameters to realize operations such as translation and scaling of the images collected by the first camera.
  • for example, changing cx is equivalent to performing a left-right translation operation on the original image.
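  • The following sketch verifies these translation and scaling equivalences numerically under the pinhole model; all values are illustrative.

```python
import numpy as np

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

def remap_pixel(u, v, K, K_ext):
    """Where pixel (u, v) of the original (undistorted) image lands in the
    extended camera's image: p' = K' K^-1 p."""
    p = K_ext @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    return p[:2] / p[2]

K_cx = K.copy(); K_cx[0, 2] += 20.0   # cx' = cx + 20 -> shift right by 20 px
assert np.allclose(remap_pixel(800, 400, K, K_cx), [820.0, 400.0])

K_fx = K.copy(); K_fx[0, 0] *= 1.1    # fx' = 1.1 fx -> horizontal scaling about cx
assert np.allclose(remap_pixel(800, 400, K, K_fx), [816.0, 400.0])
```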
  • Image distortion is caused by deviations in lens manufacturing precision and assembly processes, which lead to distortion of the original image.
  • Lens distortion is divided into two categories: radial distortion and tangential distortion.
  • Radial distortion is caused by the inherent properties of the convex lens itself: light rays bend more the farther they are from the center of the lens. The distortion is distributed along the radius of the lens and mainly includes barrel distortion and pincushion distortion.
  • Tangential distortion is caused by the lens itself not being parallel to the camera sensor plane (imaging plane), mostly as a result of installation deviations when the lens is attached to the lens module.
  • therefore, in the embodiment of the present application, the image captured by the first camera is first dedistorted based on the internal parameters of the first camera and the distortion coefficient of the first camera, and the dedistorted image is then processed based on the extended internal parameters to obtain the sample image.
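  • A minimal OpenCV sketch of this two-step mapping, under the assumptions above (pinhole model, calibrated distortion coefficients D); the function and variable names are illustrative.

```python
import cv2
import numpy as np

def make_sample_image(image, K, D, K_ext):
    """Dedistort with the real intrinsics K and distortion coefficients D,
    then map the undistorted image into the image space of the extended
    intrinsics K_ext to obtain the sample image."""
    undistorted = cv2.undistort(image, K, D)            # remove lens distortion
    H = K_ext @ np.linalg.inv(K)                        # pinhole-to-pinhole mapping
    h, w = undistorted.shape[:2]
    return cv2.warpPerspective(undistorted, H, (w, h))

# Equivalently, OpenCV can fold both steps into a single call:
#   sample = cv2.undistort(image, K, D, None, K_ext)
```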
  • Step 5023 Use the target detection model to detect the sample image, and obtain the first two-dimensional position information and the first depth information of the target object in the sample image coordinate system.
  • the first two-dimensional position information is, for example, the position coordinates of each key point of the target object in the sample image, such as the 2D coordinates (u, v) mentioned above.
  • the first depth information is the depth corresponding to the 2D coordinates (u, v), such as Z in Equation (1).
  • Step 5024 Perform coordinate transformation on the first two-dimensional position information and the first depth information according to the extended internal parameters to obtain the first three-dimensional position information of the target object in the camera coordinate system corresponding to the extended internal parameters.
  • the first three-dimensional position information is the 3D coordinates of the key points of the target object calculated through formula (1).
  • Step 5025 Adjust parameters of the target detection model based on the difference between the first three-dimensional position information and the annotated position information.
  • the target detection model can be regarded as a combination of functions, each function having its own parameters; together these define the functionality of the model. The purpose of training is to estimate and adjust the parameters of these functions based on the training data set, so that the model learns the mapping from input images to expected results.
  • the overall training process with extended internal parameters can be summarized as shown in Figure 6: after collecting the image, the 3D coordinates of the key points of the obstacles in the image are annotated to obtain the annotated position of the target object.
  • the annotated position is saved in an annotation file, so the annotation file contains the real 3D coordinates (x, y, z) of the key points of the obstacle (that is, the annotated position of the target object).
  • step 601 the image A, internal parameter K and distortion coefficient D collected by the monocular camera are obtained.
  • step 602 the internal parameter K is transformed into the internal parameter K'.
  • step 603 image A is dedistorted using intrinsic parameter K and distortion coefficient D to obtain image A', and then in step 604, image A' is geometrically transformed using extended intrinsic parameter K' to obtain image B.
  • the execution order of step 602 and step 603 is not limited.
  • step 605 image B is input to the target detection model, and the 2D coordinates of the key points of the obstacle and their depths (u_p, v_p, Z_p) are output.
  • step 606 the predicted 3D coordinates are obtained from (u_p, v_p, Z_p) by left-multiplying with the inverse matrix K′⁻¹ of the extended internal parameter K′, per Equation (1).
  • step 607 the difference between the predicted 3D coordinates and the real 3D coordinates (x, y, z) is determined, and parameters of the target detection model can be adjusted based on the difference.
  • the training samples used in the same training batch belong to the same extended camera.
  • Each training batch includes multiple training samples, and the total difference between the predicted 3D coordinates and the real 3D coordinates of all training samples in the same training batch is calculated to adjust the parameters of the target detection model.
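  • The following condensed sketch strings steps 601 through 607 (Figure 6) together for a single image; `model`, its `update()` method, and the squared-error loss are illustrative placeholders rather than anything specified by this application.

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

def first_training_step(model, image_A, K, D, gt_points_3d):
    # Step 602: perturb the real intrinsics K to obtain extended intrinsics K'
    # (here only fx is perturbed, for brevity).
    K_ext = K.copy()
    K_ext[0, 0] = rng.normal(K[0, 0], 0.03 * K[0, 0])

    # Steps 603-604: dedistort with (K, D) and remap into the K' image space.
    image_B = cv2.undistort(image_A, K, D, None, K_ext)

    # Step 605: the model predicts (u_p, v_p, Z_p) per key point, shape (n, 3).
    preds = model(image_B)

    # Step 606: unproject with the *extended* intrinsics, per Equation (1).
    uv1 = np.column_stack([preds[:, :2], np.ones(len(preds))])
    pred_3d = preds[:, 2:3] * (np.linalg.inv(K_ext) @ uv1.T).T

    # Step 607: the 3D difference drives the parameter adjustment.
    loss = float(np.mean(np.sum((pred_3d - gt_points_3d) ** 2, axis=1)))
    model.update(loss)   # placeholder for the backward pass / optimizer step
    return loss
```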
  • the aforementioned N times of training also include at least one second training.
  • the second training will use the internal parameters of the first camera to train the target detection model, which can be implemented as:
  • the parameters of the target detection model are adjusted.
  • embodiments of the present application also provide a method for target detection using the above target detection model, as shown in Figure 7, including the following steps:
  • Step 701 Obtain the image to be detected collected by the monocular camera and the internal parameters of the monocular camera.
  • Step 702 Detect the image to be detected and obtain the two-dimensional position information and depth information of the target object in the coordinate system of the image to be detected.
  • Step 703 Perform coordinate transformation on the two-dimensional position information and depth information according to the internal parameters of the monocular camera to obtain the three-dimensional position information of the target object in the monocular camera coordinate system.
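  • A minimal sketch of the unprojection in steps 702 and 703; `preds` is an assumed (n, 3) array of per-key-point model outputs (u, v, Z).

```python
import numpy as np

def detect_3d(preds, K):
    """Unproject the model's 2D detections with the *real* intrinsics K of
    whichever monocular camera captured the image, per Equation (1)."""
    uv1 = np.column_stack([preds[:, :2], np.ones(len(preds))])
    return preds[:, 2:3] * (np.linalg.inv(K) @ uv1.T).T   # rows of (X, Y, Z)
```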
  • the camera set includes the internal parameters C of multiple monocular cameras.
  • multiple extended internal parameters E can be generated from the internal parameters C of the multiple monocular cameras.
  • the internal parameter set F used in the training phase includes the internal parameters C and the extended internal parameters E.
  • the complete 3D box information of the obstacle is inferred based on geometric relationships.
  • suppose the same camera type includes four monocular cameras, namely camera 1, camera 2, camera 3 and camera 4; then these four cameras share the same target detection model.
  • the internal parameter K1 and distortion coefficient D1 of camera 1 are used to dedistort the images it collects, which are then input into the target detection model.
  • the internal parameter K1 of camera 1 is then obtained, and the 3D coordinates of the obstacle are obtained using the inverse matrix of K1.
  • similarly, the internal parameter K2 and distortion coefficient D2 of camera 2 are used to dedistort the images it collects, which are then input into the target detection model.
  • the internal parameter K2 of camera 2 is then obtained, and the 3D coordinates of the obstacle are obtained using the inverse matrix of K2.
  • the processing methods of camera 3 and camera 4 are similar and are not repeated here.
  • this application provides a target detection model that is suitable for one or even multiple camera models. Due to the manufacturing process, even cameras of the same brand and model will have differences in their internal parameters. Training a corresponding model for each camera would obviously be too expensive. With the solution of this application, one model can be trained to suit a certain model, or even several models, of cameras.
  • this application also provides a training device 900 for a target detection model.
  • the training device 900 includes:
  • Information acquisition module 901 used to acquire internal parameters of at least one monocular camera and images collected by the at least one monocular camera;
  • the training module 902 is configured to train the target detection model N times according to the internal parameters of the first camera and the images collected by the first camera, where the first camera is any one of the at least one monocular camera, and N is an integer greater than 1;
  • the N times of training include at least one first training, and the first training includes the following steps:
  • the three-dimensional position information is used as the annotated position information of the target object in the sample image
  • parameters of the target detection model are adjusted.
  • the training module is also used to perform at least one second training in the N trainings, and the second training includes the following steps:
  • parameters of the target detection model are adjusted.
  • if S of the N trainings are first trainings that transform the internal parameters of the first camera, the extended internal parameters obtained by the transformation in each of the S first trainings are all different, where S is an integer less than or equal to N.
  • the training module is used to perform random perturbations on the internal parameters of the first camera to obtain expanded internal parameters used in the first training.
  • when performing the random perturbation of the internal parameters of the first camera, the training module is specifically configured to:
  • the sub-parameters in the intrinsic parameters of the first camera are replaced with extended sub-parameters of the sub-parameters.
  • the training module is used to:
  • the dedistorted image is processed based on the extended internal parameters to obtain the sample image.
  • this application also provides a target detection device 1000, which is used in the process of detecting target objects using the target detection model obtained by the training device 900.
  • the target detection device 1000 includes:
  • the to-be-detected image acquisition module 1001 is used to acquire the image to be detected collected by a monocular camera and the internal parameters of the monocular camera;
  • the two-dimensional information acquisition module 1002 is used to detect the image to be detected and obtain the two-dimensional position information and depth information of the target object in the coordinate system of the image to be detected;
  • the three-dimensional information determination module 1003 is used to perform coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera to obtain the three-dimensional position information of the target object in the monocular camera coordinate system.
  • this application provides a chip system, including: a memory for storing a computer program; and a processor; when the processor calls and runs the computer program from the memory, it causes the electronic device installed with the chip system to execute any of the target detection model training methods or target detection methods described in this application.
  • this application provides a computer program product containing instructions that, when run on a computer, causes the computer to execute any of the target detection model training methods or target detection methods described in this application.
  • this application provides a computer-readable storage medium including instructions that, when run on a computer, cause the computer to execute any of the target detection model training methods or target detection methods described in this application.
  • embodiments of the present application also provide an electronic device.
  • the electronic device may have a structure as shown in Figure 11.
  • the electronic device may be a computer device, or a chip or chip system that can support the computer device to implement the above methods.
  • the electronic device 1100 shown in Figure 11 may include at least one processor 1101, which is configured to be coupled with a memory, and to read and execute instructions in the memory to implement the steps of the target detection model training method or the target detection method of the embodiments of the present application.
  • the electronic device may also include a communication interface 1102 for supporting the electronic device to receive or send signaling or data.
  • the communication interface 1102 in the electronic device can be used to interact with other electronic devices.
  • the processor 1101 may be used to implement the electronic device to perform the steps in the method shown in any one of Figures 5-7.
  • the electronic device may also include a memory 1103 in which computer instructions are stored.
  • the memory 1103 may be coupled with the processor 1101 and/or the communication interface 1102 to support the processor 1101 in calling the computer instructions in the memory 1103 to implement the steps in the method shown in any one of Figures 5-7;
  • the memory 1103 can also be used to store data involved in the method embodiments of the present application, for example, to store data necessary to support the communication interface 1102 in implementing interaction.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • Computer instructions are stored on the computer-readable storage medium. When these computer instructions are called and executed by a computer, they can cause the computer to perform the methods involved in any one of the above method embodiments or the possible designs of the method embodiments.
  • the computer-readable storage medium is not limited. For example, it may be RAM (random-access memory), ROM (read-only memory), etc.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by software, they may be implemented in whole or in part in the form of computer instructions.
  • when the computer instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center through wired (such as coaxial cable or optical fiber) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrated with one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), etc.
  • the steps of the method or algorithm described in the embodiments of this application can be directly embedded in hardware, a software unit executed by a processor, or a combination of the two.
  • the software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, register, hard disk, removable disk, CD-ROM or any other form of storage medium in the art.
  • the storage medium can be connected to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be integrated into the processor.
  • the processor and the storage medium can be installed in the ASIC, and the ASIC can be installed in the terminal device.
  • the processor and the storage medium may also be provided in different components in the terminal device.
  • These computer instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of a flowchart and/or one or more blocks of a block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An object detection model training method. For any monocular camera, intrinsic parameters of the monocular camera are transformed into extension intrinsic parameters of another monocular camera, and an image acquired by the monocular camera is mapped into an image acquired by an extension camera; an object detection model is then trained. In the training stage, when 3D coordinates are generated, the intrinsic parameters of the extension camera are used for the image of the extension camera so that the intrinsic parameters match the image; the model can thus be applied to the extension camera, and the generalization capability of the model is improved. Further disclosed are a training apparatus for an object detection model, an object detection method and apparatus, a chip system, a program product, a storage medium and an electronic device.

Description

Training method of target detection model, target detection method and device

Technical field

The present application relates to the field of computer vision technology, and in particular to a training method for a target detection model, a target detection method and a device.

Background art

Target detection is a traditional task in the field of computer vision. Different from image recognition, target detection requires the position of the target object to be given in the form of a minimum bounding box. In 3D (three-dimensional) target detection, the 3D bounding box of the target object needs to be given. Taking the field of autonomous driving as an example, 3D target detection obtains the 3D coordinates of the target object, then obtains a 3D frame based on the 3D coordinates, and finally visualizes the 3D frame on the image and aerial view as shown in Figure 1.

Some 3D target detection methods based on monocular cameras have been proposed in related technologies. These methods use a target detection model to process images collected by a monocular camera, obtaining the 3D vertices of the target object and then the 3D frame. However, such methods cannot perform data amplification through image-based geometric transformation, so the generalization ability of the model is poor.

Summary of the invention

This application provides a training method for a target detection model, a target detection method and a device, which are used to improve the generalization ability of a 3D target detection model based on a monocular camera.
In the first aspect, this application provides a training method for a target detection model, including:

obtaining internal parameters of at least one monocular camera and images collected by the at least one monocular camera;

training the target detection model N times according to the internal parameters of a first camera and the images collected by the first camera, where the first camera is any one of the at least one monocular camera, and N is an integer greater than 1;

wherein the N times of training include at least one first training, and the first training includes the following steps:

transforming the internal parameters of the first camera to obtain the extended internal parameters used in the first training;

performing geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain the sample image used in the first training, and using the three-dimensional position information of the target object in the image collected by the first camera as the annotated position information of the target object in the sample image;

using the target detection model to detect the sample image to obtain the first two-dimensional position information and the first depth information of the target object in the sample image coordinate system;

performing coordinate transformation on the first two-dimensional position information and the first depth information according to the extended internal parameters to obtain the first three-dimensional position information of the target object in the camera coordinate system corresponding to the extended internal parameters;

adjusting the parameters of the target detection model according to the difference between the first three-dimensional position information and the annotated position information.

In this embodiment, for any monocular camera, the internal parameters of the monocular camera are transformed into the internal parameters of another monocular camera (i.e., the extended internal parameters), the image collected by the monocular camera is mapped to an image collected by the extended camera, and the target detection model is then trained. In the training phase, when generating 3D coordinates, the intrinsic parameters of the extended camera are used for the images of the extended camera, so that the camera's intrinsic parameters match the image; the model can thus be applied to the extended camera, improving its generalization ability. In other words, the same target detection model can be applied to cameras with different internal parameters.
In some embodiments, the N times of training include at least one second training, and the second training includes the following steps:

using the target detection model to detect the image collected by the first camera to obtain the second two-dimensional position information and the second depth information of the target object in the image coordinate system of the image collected by the first camera;

performing coordinate transformation on the second two-dimensional position information and the second depth information according to the internal parameters of the first camera to obtain the second three-dimensional position information of the target object in the first camera coordinate system;

adjusting the parameters of the target detection model according to the difference between the second three-dimensional position information and the annotated position information.

In this embodiment, it is ensured that the target detection model is trained based on real cameras in the camera set, so the target detection model can be well applied to any camera in the camera set.

In some embodiments, if S of the N trainings are first trainings that transform the internal parameters of the first camera, the extended internal parameters obtained by the transformation in each of the S first trainings are all different, where S is an integer less than or equal to N.

In this way, the target detection model can be trained with more samples, further improving its generalization ability.
In some embodiments, transforming the internal parameters of the first camera to obtain the extended internal parameters used in the first training includes:

randomly perturbing the internal parameters of the first camera to obtain the extended internal parameters used in the first training.

In this way, the extended internal parameters are kept around the internal parameters of the first camera as much as possible, so as to ensure that the accuracy of the inference results of the target detection model meets the expected requirements and that training converges as soon as possible. The random perturbation makes the internal parameters of the first camera fluctuate within a range, increasing the amount of data and improving the robustness of the target detection model.

In some embodiments, randomly perturbing the internal parameters of the first camera includes:

for a sub-parameter in the internal parameters of the first camera, constructing a normal distribution curve centered on the sub-parameter;

obtaining a point from the normal distribution curve within a specified range centered on the sub-parameter, and using the obtained point as an extended sub-parameter of the sub-parameter;

replacing the sub-parameter in the internal parameters of the first camera with the extended sub-parameter.

In this way, a large number of extended internal parameters can be generated, increasing the training samples and improving the generalization ability of the model.

In some embodiments, performing geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain the sample image used in the first training includes:

dedistorting the image collected by the first camera based on the internal parameters of the first camera and the distortion coefficient of the first camera to obtain a dedistorted image;

processing the dedistorted image based on the extended internal parameters to obtain the sample image.

In the embodiment of the present application, dedistortion ensures that the image collected by the first camera is accurately mapped to the image space of the extended camera, improving the accuracy of the model's inference results.
In the second aspect, this application provides a target detection method, applied to the process of detecting a target object with a target detection model trained by the method of any implementation of the first aspect. The method includes:

obtaining the image to be detected collected by a monocular camera and the internal parameters of the monocular camera;

detecting the image to be detected to obtain the two-dimensional position information and depth information of the target object in the coordinate system of the image to be detected;

performing coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera to obtain the three-dimensional position information of the target object in the coordinate system of the monocular camera.

In this embodiment, for each of multiple cameras with different internal parameters, because its internal parameters were used during model training and the model has adapted to them, the inverse matrix of those internal parameters can be used during inference to left-multiply (u, v, Z′) and obtain the correct 3D coordinates.
In a third aspect, the present application further provides a training apparatus for a target detection model, including:
an information acquisition module, configured to acquire internal parameters of at least one monocular camera and images collected by the at least one monocular camera;
a training module, configured to train the target detection model N times according to the internal parameters of a first camera and the images collected by the first camera, where the first camera is any one of the at least one monocular camera, and N is an integer greater than 1;
where the N trainings include at least one first training, and the first training includes the following steps:
transforming the internal parameters of the first camera to obtain extended internal parameters used in the first training;
performing geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain a sample image used in the first training, and using the three-dimensional position information of a target object in the image collected by the first camera as the annotated position information of the target object in the sample image;
detecting the sample image using the target detection model to obtain first two-dimensional position information and first depth information of the target object in the sample image coordinate system;
performing coordinate transformation on the first two-dimensional position information and the first depth information according to the extended internal parameters to obtain first three-dimensional position information of the target object in the camera coordinate system corresponding to the extended internal parameters;
adjusting the parameters of the target detection model according to the difference between the first three-dimensional position information and the annotated position information.
In some implementations, the training module is further configured to perform at least one second training among the N trainings, the second training including the following steps:
detecting the image collected by the first camera using the target detection model, to obtain second two-dimensional position information and second depth information of the target object in the image coordinate system of the image collected by the first camera;
performing coordinate transformation on the second two-dimensional position information and the second depth information according to the internal parameters of the first camera, to obtain second three-dimensional position information of the target object in the first camera coordinate system;
adjusting the parameters of the target detection model according to the difference between the second three-dimensional position information and the annotated position information.
In some implementations, if S first trainings among the N trainings transform the internal parameters of the first camera, the extended internal parameters obtained in each of the S first trainings are all different, where S is an integer less than or equal to N.
In some implementations, the training module is configured to randomly perturb the internal parameters of the first camera to obtain the extended internal parameters used in the current training.
In some implementations, when performing the random perturbation of the internal parameters of the first camera, the training module is specifically configured to:
for a sub-parameter of the internal parameters of the first camera, construct a normal distribution curve centered on the sub-parameter;
obtain a point from the normal distribution curve within a specified range centered on the sub-parameter, and use the obtained point as an extended sub-parameter of the sub-parameter;
replace the sub-parameter in the internal parameters of the first camera with the extended sub-parameter.
In some implementations, the training module is configured to:
de-distort the image collected by the first camera based on the internal parameters of the first camera and the distortion coefficients of the first camera, to obtain a de-distorted image;
process the de-distorted image based on the extended internal parameters to obtain the sample image.
In a fourth aspect, the present application further provides a target detection apparatus, applied to a process of detecting a target object with a target detection model obtained by the apparatus according to any one of the third aspect. The apparatus includes:
an image acquisition module, configured to acquire an image to be detected collected by a monocular camera, and the internal parameters of the monocular camera;
a two-dimensional information acquisition module, configured to detect the image to be detected to obtain two-dimensional position information and depth information of a target object in the coordinate system of the image to be detected;
a three-dimensional information determination module, configured to perform coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera, to obtain three-dimensional position information of the target object in the coordinate system of the monocular camera.
In a fifth aspect, the present application provides a chip system, including: a memory configured to store a computer program; and a processor. When the processor calls and runs the computer program from the memory, an electronic device equipped with the chip system is caused to perform the method according to any one of the first aspect and the second aspect.
In a sixth aspect, the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspect and the second aspect.
In a seventh aspect, the present application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform the method according to any one of the first aspect and the second aspect.
In an eighth aspect, the present application further provides an electronic device, including:
a memory, configured to store a readable program;
at least one processor, configured to call and run the readable program from the memory, so that the electronic device implements the method according to any one of the first aspect and the second aspect.
Description of the Drawings
Figure 1 is a schematic diagram of visualizing 3D boxes on an image and a bird's-eye view;
Figure 2 is a schematic diagram of an application scenario provided by an embodiment of the present application;
Figure 3 is a schematic diagram of another application scenario provided by an embodiment of the present application;
Figure 4 is a schematic diagram of monocular cameras mounted on an autonomous vehicle;
Figure 5 is a schematic flowchart of the training method for a target detection model provided by an embodiment of the present application;
Figure 6 is another schematic flowchart of the training method for a target detection model provided by an embodiment of the present application;
Figure 7 is a schematic flowchart of the target detection method provided by an embodiment of the present application;
Figure 8 is a schematic diagram of the same target detection model being used by multiple monocular cameras according to an embodiment of the present application;
Figure 9 is a schematic structural diagram of the training apparatus for a target detection model provided by an embodiment of the present application;
Figure 10 is a schematic structural diagram of the target detection apparatus provided by an embodiment of the present application;
Figure 11 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application are described in detail below with reference to the accompanying drawings.
As stated in the background, when 3D target detection is performed on images collected by a monocular camera, the target detection model cannot use image-based geometric transformation, a commonly used data augmentation technique, and therefore cannot be trained with augmented data, resulting in poor generalization ability. This is because image-based geometric transformation breaks the 2D-to-3D mapping relationship. This is explained below. Obtaining a 3D box with the target detection model includes the following steps:
In the first step, the image collected by the monocular camera is input into the target detection model. The target detection model may be a depth prediction model based on two-dimensional images: it performs inference on the image, detects the 2D coordinates (u, v) of the target object in the image, and predicts the depth Z corresponding to those 2D coordinates. Typically the model infers the 2D coordinates and depth of the center point of the 3D box in the image, and may also infer the 2D coordinates and depths of a few of the 3D box vertices; the 2D coordinates and depths of the remaining, uninferred vertices can then be computed from the already inferred vertices and the geometric shape of the obstacle. Since a 3D box has 8 vertices plus the center point, the 2D coordinates and depths of 9 key points are obtained.
In the second step, the imaging principle of the monocular camera is used to parse the detection results of the target detection model and obtain the 3D coordinates of the target object in the camera coordinate system. Formula (1) describes the 2D-to-3D mapping relationship:
Z · (u, v, 1)ᵀ = K · (X, Y, Z)ᵀ, i.e. (X, Y, Z)ᵀ = K⁻¹ · (u·Z, v·Z, Z)ᵀ    (1)
In formula (1), (u, v) denotes the 2D coordinates of a key point, Z denotes the depth of the key point, and K denotes the internal parameters of the monocular camera, which can be obtained through camera calibration:
K = [fx, 0, cx; 0, fy, cy; 0, 0, 1]
where fx, fy, cx and cy are the sub-parameters of the internal parameters, K⁻¹ is the inverse matrix of K, and (X, Y, Z) are the 3D coordinates of the pixel in the camera coordinate system of the monocular camera.
Through the above two steps, the 3D coordinates of the target object are obtained, from which the length, width, height, and heading angle of the target object can be inferred.
When a geometric transformation such as translation, scaling, or rotation is applied to the image, the (u, v, Z) values no longer match the K⁻¹ used in the second step, so incorrect 3D coordinates are computed.
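To make the two-step decoding concrete, the following is a minimal Python sketch (an illustration only, not part of the patent text; the intrinsic values and key-point coordinates are made-up examples) of formula (1) and of how a plain image resize breaks the mapping:

```python
import numpy as np

# Hypothetical intrinsics of a monocular camera (example values only).
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

def back_project(u, v, Z, K):
    # Formula (1): recover camera-frame 3D coordinates from a detected
    # 2D key point (u, v) and its predicted depth Z via K^-1.
    return np.linalg.inv(K) @ np.array([u * Z, v * Z, Z])

X, Y, Z = back_project(800.0, 420.0, 12.0, K)    # correct decoding

# After a 0.5x image resize the same key point is detected at (400, 210);
# decoding with the ORIGINAL K^-1 now yields wrong 3D coordinates:
X2, Y2, Z2 = back_project(400.0, 210.0, 12.0, K)
assert not np.allclose([X, Y], [X2, Y2])         # the 2D-to-3D mapping broke
```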
Therefore, the target detection model of a monocular camera is bound to specific internal parameters, and one model cannot be applied to multiple cameras with different internal parameters. In other words, each monocular camera requires its own target detection model, and a target detection model is applicable only to the single monocular camera it is bound to. Consequently, simple image-based data augmentation of sample images is not suitable for the target detection model of a monocular camera, and the model's generalization ability cannot be improved this way.
In view of this, in order to improve the generalization ability of the target detection model, embodiments of the present application provide a feasible data augmentation method for training the target detection model.
In the embodiments of the present application, with the help of the camera's internal parameters K, geometric transformations can be applied to the images collected by the monocular camera to achieve data augmentation. To maintain the correct 2D-to-3D mapping relationship, internal parameters K′ (hereinafter also called the extended internal parameters) are constructed based on K, and the original image is projected into the image that would be captured by an extended camera with internal parameters K′. If K′ differs from K only in cx, the transformation is equivalent to translating the original image left or right; if only cy differs, it is equivalent to translating the original image up or down; if only fx differs, it is equivalent to scaling the original image horizontally; if only fy differs, it is equivalent to scaling it vertically. Moreover, for an image geometrically transformed using K′, left-multiplying any (u, v, Z) point on its plane by the inverse matrix K′⁻¹ yields the correct 3D coordinates.
Based on this, in the embodiments of the present application, for any monocular camera, the internal parameters of that camera are transformed into the internal parameters of another camera (i.e., the extended internal parameters of an extended camera), the image collected by the camera is mapped into an image as if collected by the extended camera, and the target detection model is then trained on it. During training, the extended camera's internal parameters are used when generating 3D coordinates from the extended camera's images, so that the internal parameters match the images. This makes the model applicable to the extended camera and improves its generalization ability; that is, the same target detection model can be applied to cameras with different internal parameters.
In the embodiments of the present application, the target detection model can be trained on the server side or on the terminal side; after training, the model can be deployed in a terminal. The terminal may be, for example, a vehicle, a mobile phone, a robot, or another device equipped with a monocular camera. Taking a vehicle as an example, Figure 2 is a schematic diagram of an application scenario provided by an embodiment of the present application. The scenario includes a vehicle 101 and a target object 102. The vehicle 101 collects an image containing the target object through its mounted monocular camera, and the image can be input into the target detection model 103 in the vehicle 101 to detect the 3D position of the target object 102.
In another application scenario, the target detection model 103 may also be deployed in a server. As shown in Figure 3, this scenario includes a vehicle 101, a target object 102, and a server 104. The monocular camera of the vehicle 101 collects an image containing the target object, the vehicle 101 sends the image to the server 104, and the server 104 uses its built-in target detection model 103 to perform inference on the image and obtain the 3D position of the target object.
To give an intuitive sense of the generalization ability of the target detection model in the embodiments of the present application, this is explained below with reference to Figure 4. As shown in Figure 4, four types of monocular cameras are mounted on the vehicle, 16 cameras in total: 1 long-range camera, 4 mid-range cameras, 7 short-range cameras, and 4 fisheye cameras. In the related art, each monocular camera needs its own target detection model, so 16 models would have to be trained. With the method provided by the embodiments of the present application, since cameras of the same type have only small differences in their internal parameters, one target detection model can be trained per camera type, so only 4 models need to be trained in total. Each target detection model is applicable to all monocular cameras of the same type, and the generalization ability of the target detection model is thus improved.
As shown in Figure 5, the flow of the training method for the target detection model in the embodiments of the present application includes:
Step 501: obtain the internal parameters of at least one monocular camera and the images collected by the at least one monocular camera.
For example, as shown in Figure 4, monocular cameras of the same camera type may form a camera set, and the cameras in the set jointly train the same target detection model. In implementation, the set is not limited to a single camera type: as long as the gap between their internal parameters is smaller than a gap threshold, monocular cameras of different types can also form a camera set for training the same target detection model. During training, any monocular camera in the set can serve as the first camera for training the target detection model, as in the grouping sketch below.
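As one possible illustration (the gap measure and the threshold below are assumptions; the patent does not prescribe them), the following Python sketch groups cameras into sets whose internal-parameter gap to a set representative stays below a threshold:

```python
import numpy as np

def intrinsics_gap(K_a, K_b):
    # A simple gap measure: the largest absolute difference between
    # corresponding sub-parameters (fx, fy, cx, cy).
    return float(np.max(np.abs(K_a - K_b)))

def build_camera_sets(cameras, gap_threshold):
    # Greedily assign each camera to the first set whose representative
    # (the set's first member) is within the gap threshold; each resulting
    # set then shares a single target detection model.
    sets = []
    for name, K in cameras:
        for group in sets:
            if intrinsics_gap(group[0][1], K) < gap_threshold:
                group.append((name, K))
                break
        else:
            sets.append([(name, K)])
    return sets
```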
Step 502: train the target detection model N times according to the internal parameters of the first camera and the images collected by the first camera, where N is an integer greater than 1.
The N trainings include at least one first training, and the first training includes the following steps:
Step 5021: transform the internal parameters of the first camera to obtain the extended internal parameters used in the first training.
In some implementations, if S of the N trainings are first trainings that transform the internal parameters of the first camera, the extended internal parameters obtained in each of the S first trainings are all different, where S is an integer less than or equal to N. For example, the internal parameters K of the first camera are transformed into m mutually different extended internal parameters K′1, K′2, K′3, …, K′m. Then, for any first camera that has collected p images, the m extended internal parameters effectively add p·m sample images, increasing the training data for the target detection model and further improving its generalization ability.
In some implementations, the internal parameters of the first camera can be transformed in multiple ways, for example by translating any sub-parameter of the internal parameters with an equal or unequal step size. For example, given the internal parameters of the first camera
K = [fx, 0, cx; 0, fy, cy; 0, 0, 1],
translating fx by a step size d yields fx′, giving the extended internal parameters
K′ = [fx′, 0, cx; 0, fy, cy; 0, 0, 1].
The translation may be, for example, fx′ = fx + d, or alternatively fx′ = fx − d. The adjustment can also follow the direction of a desired change: if K′ differs from K only in cx, the transformation is equivalent to translating the original image left or right; if only cy differs, it is equivalent to translating the original image up or down; if only fx differs, it is equivalent to scaling the original image horizontally; if only fy differs, it is equivalent to scaling it vertically.
In the embodiments of the present application, if the gap between the extended internal parameters and the first camera's internal parameters is too large, the inference ability of the target detection model may be limited and the model may fail to converge. Therefore, in implementation, the extended internal parameters should as far as possible be distributed around the first camera's internal parameters, e.g., within a threshold distance of them, so that the accuracy of the model's inference results meets expectations and training converges as quickly as possible. In one possible implementation, the extended internal parameters used in a given training are obtained by randomly perturbing the first camera's internal parameters within the threshold. The random perturbation lets the first camera's internal parameters fluctuate within a range, increasing the amount of data and improving the robustness of the target detection model.
Compared with the original practice of binding one target detection model to fixed internal parameters, the embodiments of the present application adopt a data augmentation method with variable internal parameters, so that the model adapts to different internal parameters during training and, in use, can accommodate multiple cameras with different internal parameters. This improves the generalization ability of the model, allows the same target detection model to fit multiple cameras with different internal parameters, enables one model to serve multiple monocular cameras, and reduces development cost.
In some possible implementations, to obtain the extended internal parameters by random perturbation, the embodiments of the present application take an original value (i.e., a sub-parameter) of the first camera's internal parameters K as the center point and, with a preset standard deviation, draw a random value from a normal distribution to replace the original value in K. This can be implemented as: for a sub-parameter of the first camera's internal parameters, construct a normal distribution curve centered on that sub-parameter; then obtain a point from the normal distribution curve within a specified range centered on the sub-parameter, and use the obtained point as the extended sub-parameter; finally, replace the sub-parameter in the first camera's internal parameters with the extended sub-parameter to obtain the extended internal parameters.
As an example, given the internal parameters of the first camera
K = [fx, 0, cx; 0, fy, cy; 0, 0, 1],
a normal distribution curve 1 is constructed for fx, and a point fx′ is obtained from curve 1 within a specified range centered on fx, giving the extended internal parameters
K′ = [fx′, 0, cx; 0, fy, cy; 0, 0, 1].
Of course, in implementation, the extended internal parameters are not limited to differing from the first camera's internal parameters in a single sub-parameter; several sub-parameters may differ. For example, in addition to normal distribution curve 1 for fx, a normal distribution curve 2 can be constructed for cx, and a point cx′ close to cx obtained from curve 2 to replace cx, giving the extended internal parameters
K′ = [fx′, 0, cx′; 0, fy, cy; 0, 0, 1].
Since the internal parameters of the first camera contain four sub-parameters, the extended sub-parameters of any one or more of them can be selected to construct the extended internal parameters.
It should also be noted that the parameter differences between different extended internal parameters may lie in the same sub-parameter or in different sub-parameters. For example, from normal distribution curve 1 for fx, many (possibly thousands of) different values fx′ can be drawn, each giving a different set of extended internal parameters. As another example, if not only fx′ but also other sub-parameters such as cx′ differ, further distinct extended internal parameters are obtained.
The training loop of the model may run hundreds of thousands of iterations, each using different internal parameters K′, where K′ either comes from the internal parameters of a real camera or is generated by randomly perturbing the internal parameters of a real camera. The model is thus trained on a large volume of training samples, which improves its generalization ability.
In the present application, different ways of generating internal parameters produce different effects. For example, an autonomous vehicle carries multiple cameras; these cameras can be calibrated to obtain a list of their internal parameters, and the entries of this list can be cycled through during model training, so that one model fits all the cameras on the vehicle. As another example, a camera model ships with default internal parameter values, but due to manufacturing tolerances the actual internal parameters of each unit differ from the defaults while remaining roughly distributed around them. Perturbed internal parameters can be generated from a normal distribution centered on the defaults and used in model training, so that the model adapts to all cases near the default internal parameters and one model fits that camera model.
Step 5022: perform geometric transformation on the image collected by the first camera according to the first camera's internal parameters and the extended internal parameters, to obtain the sample image used in the first training, and use the three-dimensional position information of the target object in the image collected by the first camera as the annotated position information of the target object in the sample image.
Image geometric transformation, also called image space transformation, maps coordinate positions in one image to new coordinate positions in another image without changing the pixel values. In image processing, geometric transformation is usually applied to remove, as far as possible, geometric distortion caused by imaging angle, perspective, and the like. Geometric transformations of an image include translation, rotation, scaling, orthographic parallel projection, and so on, and can be implemented through spatial transformation and interpolation algorithms. The key to a geometric transformation is the transformation parameters of the mapping, for example one or more of the translation components, scaling factors, and rotation angle. In general, camera internal parameters need not be considered when geometrically transforming an image collected by a camera; in the embodiments of the present application, however, the extended internal parameters of the extended camera are used as the transformation parameters to translate, scale, and otherwise transform the image collected by the first camera. For example, if the extended internal parameters K′ differ from the first camera's internal parameters K only in cx, the transformation is equivalent to translating the original image left or right.
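As a sketch of this step (assuming an already undistorted input image and the OpenCV library; this is one possible realization, not the patent's mandated implementation), the warp from the first camera's image space into the extended camera's image space can be written as the homography H = K′·K⁻¹, which for intrinsics-only changes reduces to a per-axis scale plus shift:

```python
import cv2
import numpy as np

def project_to_extended_camera(image, K, K_ext):
    # Remap an undistorted image taken with intrinsics K into the image
    # space of an extended camera with intrinsics K_ext. With no rotation
    # between the two (virtual) cameras, the mapping is H = K_ext @ K^-1.
    H = K_ext @ np.linalg.inv(K)
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))
```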
In addition, images collected by a monocular camera exhibit image distortion. Image distortion arises because deviations in lens manufacturing precision and assembly processes introduce distortion that warps the original image. Lens distortion falls into two categories: radial and tangential. Radial distortion is caused by the inherent characteristics of the lens's convex elements: light rays bend more at the edge of the lens than near its center. It is distributed along the lens radius and mainly comprises barrel distortion and pincushion distortion. Tangential distortion arises when the lens is not parallel to the camera sensor plane (the imaging plane), mostly due to mounting deviations when the lens is attached to the lens module. Therefore, to ensure that the image collected by the first camera is accurately mapped into the image space of the extended camera, in the embodiments of the present application the image collected by the first camera is first de-distorted based on the first camera's internal parameters and distortion coefficients to obtain a de-distorted image; the de-distorted image is then processed based on the extended internal parameters to obtain the sample image.
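Putting the two operations together, a hedged sketch of sample generation (using OpenCV's undistortion and the warp helper sketched above) might look like this:

```python
import cv2

def make_sample_image(image, K, D, K_ext):
    # De-distort with the calibrated intrinsics K and distortion
    # coefficients D, then project into the extended camera's image space.
    undistorted = cv2.undistort(image, K, D)
    return project_to_extended_camera(undistorted, K, K_ext)
```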
Step 5023: detect the sample image using the target detection model, to obtain the first two-dimensional position information and first depth information of the target object in the sample image coordinate system. In the embodiments of the present application, the first two-dimensional position information is, for example, the position coordinates of each key point of the target object in the sample image, i.e., the 2D coordinates (u, v) described above; the first depth information is the depth corresponding to the 2D coordinates (u, v), i.e., Z in formula (1).
Step 5024: perform coordinate transformation on the first two-dimensional position information and the first depth information according to the extended internal parameters, to obtain the first three-dimensional position information of the target object in the camera coordinate system corresponding to the extended internal parameters. The first three-dimensional position information is the 3D coordinates of the target object's key points computed through formula (1).
Step 5025: adjust the parameters of the target detection model according to the difference between the first three-dimensional position information and the annotated position information.
The target detection model contains multiple functions, each with its own parameters, which together define the model's behavior. The purpose of training is to estimate and adjust the parameters of these functions on the training data set, so that the model learns the mapping from input images to the expected results.
Taking an autonomous vehicle as an example, the overall training flow with extended internal parameters can be summarized as shown in Figure 6. After images are collected, the 3D coordinates of the key points of the obstacles in each image are annotated to obtain the annotated positions of the target objects, which are saved in an annotation file; the annotation file therefore contains the true 3D coordinates (x, y, z) of the obstacle key points (i.e., the annotated positions of the target objects). During training:
First, in step 601, the image A collected by the monocular camera, the internal parameters K, and the distortion coefficients D are obtained.
In step 602, the internal parameters K are transformed into internal parameters K′.
In step 603, image A is de-distorted using the internal parameters K and the distortion coefficients D to obtain image A′; then, in step 604, image A′ is geometrically transformed using the extended internal parameters K′ to obtain image B.
It should be noted that the execution order of step 602 and step 603 is not restricted.
In step 605, image B is input into the target detection model, which outputs the 2D coordinates and depths (u_p, v_p, Z_p) of the obstacle key points.
In step 606, the predicted 3D coordinates are obtained by left-multiplying (u_p, v_p, Z_p) by the inverse matrix K′⁻¹ of the extended internal parameters K′.
In step 607, the difference between the predicted 3D coordinates and the true 3D coordinates (x, y, z) is determined, and the parameters of the target detection model are adjusted according to this difference.
During training, the training samples used in one training batch all belong to the same extended camera. Each training batch includes multiple training samples, and the total difference between the predicted and true 3D coordinates over all samples in the batch is computed to perform one parameter update of the target detection model. A minimal sketch of one such iteration is given below.
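The following Python sketch strings steps 601 to 607 together, reusing the helpers sketched above; the model and optimizer objects are hypothetical placeholders, and the mean-squared loss is an assumption (the patent only requires adjusting parameters according to the difference):

```python
import numpy as np

def first_training_step(model, optimizer, image, K, D, gt_xyz):
    # One iteration: build a sample for a randomly perturbed extended
    # camera, decode predictions with K'^-1, and update the model from
    # the difference to the annotated 3D coordinates.
    K_ext = perturb_intrinsics(K)                    # step 602
    sample = make_sample_image(image, K, D, K_ext)   # steps 603-604
    uvz = model(sample)                              # step 605: (u, v, Z) rows
    K_inv = np.linalg.inv(K_ext)
    pred_xyz = np.stack([K_inv @ np.array([u * z, v * z, z])
                         for u, v, z in uvz])        # step 606
    loss = np.mean((pred_xyz - gt_xyz) ** 2)         # step 607: difference
    optimizer.step(loss)                             # adjust model parameters
    return loss
```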
Besides training the target detection model with extended cameras, in the embodiments of the present application the aforementioned N trainings also include at least one second training, which trains the target detection model with the first camera's own internal parameters and can be implemented as:
detecting the image collected by the first camera using the target detection model, to obtain the second two-dimensional position information and second depth information of the target object in the image coordinate system of the image collected by the first camera;
performing coordinate transformation on the second two-dimensional position information and the second depth information according to the first camera's internal parameters, to obtain the second three-dimensional position information of the target object in the first camera coordinate system;
adjusting the parameters of the target detection model according to the difference between the second three-dimensional position information and the annotated position information.
This ensures that the target detection model is also trained on the real cameras in the camera set, so that it applies well to every camera in the set.
Based on the same inventive concept, embodiments of the present application further provide a target detection method using the above target detection model, as shown in Figure 7, including the following steps:
Step 701: obtain the image to be detected collected by a monocular camera and the internal parameters of the monocular camera.
Step 702: detect the image to be detected, to obtain the two-dimensional position information and depth information of the target object in the coordinate system of the image to be detected.
Step 703: perform coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera, to obtain the three-dimensional position information of the target object in the monocular camera coordinate system.
Taking an autonomous vehicle as an example, the camera set includes the internal parameters C of multiple monocular cameras, from which multiple extended internal parameters E can be derived. The internal parameter set F used in the training phase thus includes both C and E. When deploying the target detection model on an actual vehicle, the internal parameters K″ and distortion coefficients D of the monocular camera in use are first calibrated. Because variable internal parameters were used during model training, the real internal parameters K″ used at inference time are simply a subset of the internal parameter set F used in training.
The parameters K″ and D are used to de-distort the collected image A, yielding image B. Image B is fed into the target detection model for inference to obtain the 2D coordinates and depths (u, v, Z′) of the obstacle key points, and (u, v, Z′) is then left-multiplied by the inverse matrix of K″ to obtain the 3D coordinates (X′, Y′, Z′) of the obstacle key points. After the 3D coordinates of multiple obstacle key points have been predicted, the complete 3D box information of the obstacle is inferred from their geometric relationships.
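A deployment-time sketch of this inference flow (again assuming OpenCV and a hypothetical model object):

```python
import cv2
import numpy as np

def detect_3d_keypoints(model, image_a, K_real, D_real):
    # De-distort image A with the calibrated K'' and D, run the detector,
    # then decode each (u, v, Z') key point with the real camera's K''^-1.
    # The complete 3D box is afterwards fit from these key points.
    image_b = cv2.undistort(image_a, K_real, D_real)
    K_inv = np.linalg.inv(K_real)
    points = [K_inv @ np.array([u * z, v * z, z])
              for u, v, z in model(image_b)]
    return np.stack(points)
```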
In summary, because the internal parameters of multiple different cameras were all used during model training and the model has adapted to them, the inverse matrix of a given camera's internal parameters can be used at inference time to left-multiply (u, v, Z′) and obtain the correct 3D coordinates.
As shown in Figure 8, one camera type includes four monocular cameras, say camera 1, camera 2, camera 3, and camera 4. These four cameras share the same target detection model. When camera 1 collects an image, the image is de-distorted using camera 1's internal parameters K1 and distortion coefficients D1 and then input into the target detection model; the model obtains camera 1's internal parameters K1 and uses the inverse matrix of K1 to compute the 3D coordinates of the obstacle. Likewise, an image collected by camera 2 is de-distorted using camera 2's internal parameters K2 and distortion coefficients D2 and input into the target detection model; the model obtains K2 and uses its inverse matrix to compute the obstacle's 3D coordinates. Cameras 3 and 4 are handled similarly and are not described again here.
In summary, the present application releases one target detection model that applies to one or even several camera models. Due to manufacturing tolerances, even cameras of the same brand and model differ in their internal parameters; training a separate model for every camera would clearly be too costly. With the solution of the present application, one model can be trained to suit a single camera model or even several camera models.
Based on the same inventive concept, the present application further provides a training apparatus 900 for a target detection model. As shown in Figure 9, the training apparatus 900 includes:
an information acquisition module 901, configured to acquire internal parameters of at least one monocular camera and images collected by the at least one monocular camera;
a training module 902, configured to train the target detection model N times according to the internal parameters of a first camera and the images collected by the first camera, where the first camera is any one of the at least one monocular camera, and N is an integer greater than 1;
where the N trainings include at least one first training, and the first training includes the following steps:
transforming the internal parameters of the first camera to obtain extended internal parameters used in the first training;
performing geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain a sample image used in the first training, and using the three-dimensional position information of a target object in the image collected by the first camera as the annotated position information of the target object in the sample image;
detecting the sample image using the target detection model to obtain first two-dimensional position information and first depth information of the target object in the sample image coordinate system;
performing coordinate transformation on the first two-dimensional position information and the first depth information according to the extended internal parameters to obtain first three-dimensional position information of the target object in the camera coordinate system corresponding to the extended internal parameters;
adjusting the parameters of the target detection model according to the difference between the first three-dimensional position information and the annotated position information.
In some implementations, the training module is further configured to perform at least one second training among the N trainings, the second training including the following steps:
detecting the image collected by the first camera using the target detection model, to obtain second two-dimensional position information and second depth information of the target object in the image coordinate system of the image collected by the first camera;
performing coordinate transformation on the second two-dimensional position information and the second depth information according to the internal parameters of the first camera, to obtain second three-dimensional position information of the target object in the first camera coordinate system;
adjusting the parameters of the target detection model according to the difference between the second three-dimensional position information and the annotated position information.
In some implementations, if S first trainings among the N trainings transform the internal parameters of the first camera, the extended internal parameters obtained in each of the S first trainings are all different, where S is an integer less than or equal to N.
In some implementations, the training module is configured to randomly perturb the internal parameters of the first camera to obtain the extended internal parameters used in the first training.
In some implementations, when performing the random perturbation of the internal parameters of the first camera, the training module is specifically configured to:
for a sub-parameter of the internal parameters of the first camera, construct a normal distribution curve centered on the sub-parameter;
obtain a point from the normal distribution curve within a specified range centered on the sub-parameter, and use the obtained point as an extended sub-parameter of the sub-parameter;
replace the sub-parameter in the internal parameters of the first camera with the extended sub-parameter.
In some implementations, the training module is configured to:
de-distort the image collected by the first camera based on the internal parameters of the first camera and the distortion coefficients of the first camera, to obtain a de-distorted image;
process the de-distorted image based on the extended internal parameters to obtain the sample image.
Based on the same inventive concept, the present application further provides a target detection apparatus 1000, applied to the process of detecting a target object with the target detection model obtained by the training apparatus 900. As shown in Figure 10, the target detection apparatus 1000 includes:
an image acquisition module 1001, configured to acquire an image to be detected collected by a monocular camera, and the internal parameters of the monocular camera;
a two-dimensional information acquisition module 1002, configured to detect the image to be detected to obtain two-dimensional position information and depth information of a target object in the coordinate system of the image to be detected;
a three-dimensional information determination module 1003, configured to perform coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera, to obtain three-dimensional position information of the target object in the monocular camera coordinate system.
Based on the same inventive concept, the present application provides a chip system, including: a memory configured to store a computer program; and a processor. When the processor calls and runs the computer program from the memory, an electronic device equipped with the chip system is caused to perform any of the target detection model training methods or target detection methods described in the present application.
Based on the same inventive concept, the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the target detection model training methods or target detection methods described in the present application.
Based on the same inventive concept, the present application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform any of the target detection model training methods or target detection methods described in the present application.
Based on the same inventive concept, embodiments of the present application further provide an electronic device, which may have the structure shown in Figure 11. The electronic device may be a computer device, or a chip or chip system capable of supporting a computer device in implementing the above methods.
The electronic device 1100 shown in Figure 11 may include at least one processor 1101, the at least one processor 1101 being configured to couple with a memory, and to read and execute instructions in the memory to implement the steps of the target detection model training method or the target detection method of the embodiments of the present application. Optionally, the electronic device may further include a communication interface 1102, configured to support the electronic device in receiving or sending signaling or data; the communication interface 1102 in the electronic device may be used to interact with other electronic devices. The processor 1101 may be configured to enable the electronic device to perform the steps of the method shown in any of Figures 5-7. Optionally, the electronic device may further include a memory 1103 storing computer instructions; the memory 1103 may be coupled with the processor 1101 and/or the communication interface 1102 to support the processor 1101 in calling the computer instructions in the memory 1103 to implement the steps of the method shown in any of Figures 5-7. In addition, the memory 1103 may also be used to store data involved in the method embodiments of the present application, for example, data and instructions necessary to support the communication interface 1102 in implementing interaction, and/or configuration information (such as WORM attributes) necessary for the electronic device to perform the methods described in the embodiments of the present application.
Embodiments of the present application further provide a computer-readable storage medium on which computer instructions are stored; when these computer instructions are called and executed by a computer, the computer can be caused to complete the methods involved in the above method embodiments or in any possible design of the method embodiments. In the embodiments of the present application, the computer-readable storage medium is not limited; for example, it may be a RAM (random-access memory), a ROM (read-only memory), or the like.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机指令的形式实现。在计算机上加载和执行所述计算机指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据电子设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, it may be implemented in whole or in part in the form of computer instructions. When the computer instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, e.g., the computer instructions may be transferred from a website, computer, server, or data center Transmission to another website, computer, server or data center through wired (such as coaxial cable, optical fiber) or wireless (such as infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer or a data electronic device such as a server or data center integrated with one or more available media. The available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), etc.
The steps of the methods or algorithms described in the embodiments of this application can be directly embedded in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium in the art. For example, the storage medium may be connected to the processor so that the processor can read information from the storage medium and write information to the storage medium. Optionally, the storage medium may also be integrated into the processor. The processor and the storage medium may be arranged in an ASIC, and the ASIC may be arranged in a terminal device. Optionally, the processor and the storage medium may also be arranged in different components of the terminal device.
These computer instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the present application has been described in conjunction with specific features and embodiments thereof, it is apparent that various modifications and combinations may be made without departing from the scope of the present application. Accordingly, the specification and drawings are merely illustrative of the application as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of the application. Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.

Claims (18)

  1. A training method for a target detection model, characterized by comprising:
    obtaining internal parameters of at least one monocular camera and images collected by the at least one monocular camera;
    performing N times of training on a target detection model according to the internal parameters of a first camera and images collected by the first camera, wherein the first camera is any one of the at least one monocular camera, and N is an integer greater than 1;
    wherein the N times of training include at least one first training, and the first training includes the following steps:
    transforming the internal parameters of the first camera to obtain extended internal parameters used in the first training;
    performing geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain a sample image used in the first training, and using three-dimensional position information of a target object in the image collected by the first camera as annotated position information of the target object in the sample image;
    using the target detection model to detect the sample image to obtain first two-dimensional position information and first depth information of the target object in the sample image coordinate system;
    performing coordinate transformation on the first two-dimensional position information and the first depth information according to the extended internal parameters to obtain first three-dimensional position information of the target object in the camera coordinate system corresponding to the extended internal parameters;
    adjusting parameters of the target detection model according to the difference between the first three-dimensional position information and the annotated position information.
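(For orientation only: the first training can be read as one supervised step in which detection happens under a synthetic camera. A compressed numpy sketch follows; the detector fake_detect, its return values, and the elided warping step are hypothetical placeholders rather than part of the claimed method — the perturbation of claim 5 and the geometric warp of claim 6 are sketched in more detail after those claims.)

    import numpy as np

    def fake_detect(sample_image):
        # Hypothetical detector: returns one object's pixel (u, v) and depth z.
        return np.array([700.0, 400.0]), 14.8

    def first_training_loss(image, K_ext, label_3d):
        # The geometric warp of the captured image into the extended camera
        # model (claim 6) is elided here; see the sketch after claim 6.
        sample = image
        (u, v), z = fake_detect(sample)                  # 2D position + depth
        # Back-project with the *extended* intrinsics, not the original ones:
        pred_3d = np.array([(u - K_ext[0, 2]) * z / K_ext[0, 0],
                            (v - K_ext[1, 2]) * z / K_ext[1, 1],
                            z])
        # The difference between prediction and annotation drives the
        # parameter adjustment of the model (back-propagation elided).
        return np.linalg.norm(pred_3d - label_3d)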
  2. The method according to claim 1, characterized in that the N times of training include at least one second training, and the second training includes the following steps:
    using the target detection model to detect the image collected by the first camera to obtain second two-dimensional position information and second depth information of the target object in the image coordinate system of the image collected by the first camera;
    performing coordinate transformation on the second two-dimensional position information and the second depth information according to the internal parameters of the first camera to obtain second three-dimensional position information of the target object in the first camera coordinate system;
    adjusting parameters of the target detection model according to the difference between the second three-dimensional position information and the annotated position information.
  3. The method according to claim 1 or 2, characterized in that if S first trainings among the N times of training transform the internal parameters of the first camera, the extended internal parameters obtained by transformation in each of the S first trainings are different from one another, S being an integer less than or equal to N.
  4. The method according to any one of claims 1-3, characterized in that transforming the internal parameters of the first camera to obtain the extended internal parameters used in the first training comprises:
    randomly perturbing the internal parameters of the first camera to obtain the extended internal parameters used in the first training.
  5. The method according to claim 4, characterized in that randomly perturbing the internal parameters of the first camera comprises:
    for a sub-parameter in the internal parameters of the first camera, constructing a normal distribution curve centered on the sub-parameter;
    obtaining a point from the normal distribution curve within a specified range centered on the sub-parameter, and using the obtained point as an extended sub-parameter of the sub-parameter;
    replacing the sub-parameter in the internal parameters of the first camera with the extended sub-parameter of the sub-parameter.
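(A minimal numpy sketch of this perturbation, under two assumptions not taken from the claims: the sub-parameters are fx, fy, cx, cy of a 3x3 intrinsic matrix, and the "specified range" is ±5% of each sub-parameter.)

    import numpy as np

    def perturb_subparameter(p, sigma_ratio=0.02, range_ratio=0.05):
        # Sample from a normal distribution centered on p, re-drawing until
        # the sample falls inside the specified range around p (here +/-5%).
        lo, hi = p * (1 - range_ratio), p * (1 + range_ratio)
        while True:
            sample = np.random.normal(loc=p, scale=p * sigma_ratio)
            if lo <= sample <= hi:
                return sample

    def perturb_intrinsics(K):
        # Replace fx, fy, cx, cy in K with their extended counterparts.
        K_ext = K.copy()
        for i, j in [(0, 0), (1, 1), (0, 2), (1, 2)]:
            K_ext[i, j] = perturb_subparameter(K[i, j])
        return K_ext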
  6. The method according to any one of claims 1-5, characterized in that performing geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain the sample image used in the first training comprises:
    de-distorting the image collected by the first camera based on the internal parameters of the first camera and the distortion coefficients of the first camera to obtain a de-distorted image;
    processing the de-distorted image based on the extended internal parameters to obtain the sample image.
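(One way to realize these two steps with OpenCV — a sketch assuming the extended intrinsics differ from the originals only in fx, fy, cx, cy, with no rotation between the two camera models; dist holds the first camera's distortion coefficients.)

    import cv2
    import numpy as np

    def make_sample_image(image, K, dist, K_ext):
        h, w = image.shape[:2]
        # Step 1: remove lens distortion using the original intrinsics.
        undistorted = cv2.undistort(image, K, dist)
        # Step 2: re-render the undistorted image as if it had been captured
        # by a camera with the extended intrinsics: x' = K_ext @ inv(K) @ x.
        H = K_ext @ np.linalg.inv(K)
        return cv2.warpPerspective(undistorted, H, (w, h))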
  7. A target detection method, characterized in that it is applied to a process of detecting a target object with a target detection model trained by the method according to any one of claims 1-6, the method comprising:
    obtaining an image to be detected collected by a monocular camera and internal parameters of the monocular camera;
    detecting the image to be detected to obtain two-dimensional position information and depth information of a target object in the coordinate system of the image to be detected;
    performing coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera to obtain three-dimensional position information of the target object in the monocular camera coordinate system.
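(The coordinate transformation here is the standard pinhole back-projection. A sketch, assuming intrinsics of the form [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] and metric depth z; the numbers in the usage lines are made up.)

    import numpy as np

    def to_camera_frame(u, v, z, K):
        # Map a detected pixel (u, v) with depth z to 3D camera coordinates.
        fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
        return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

    # Usage: a detection at pixel (700, 400) with depth 15 m
    K = np.array([[1200.0, 0, 640], [0, 1200.0, 360], [0, 0, 1]])
    print(to_camera_frame(700, 400, 15.0, K))   # -> [0.75, 0.5, 15.0]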
  8. A training apparatus for a target detection model, characterized by comprising:
    an information acquisition module, configured to acquire internal parameters of at least one monocular camera and images collected by the at least one monocular camera;
    a training module, configured to train the target detection model N times according to the internal parameters of a first camera and images collected by the first camera, wherein the first camera is any one of the at least one monocular camera, and N is an integer greater than 1;
    wherein the N times of training include at least one first training, and the first training includes the following steps:
    transforming the internal parameters of the first camera to obtain extended internal parameters used in the first training;
    performing geometric transformation on the image collected by the first camera according to the internal parameters of the first camera and the extended internal parameters to obtain a sample image used in the first training, and using three-dimensional position information of a target object in the image collected by the first camera as annotated position information of the target object in the sample image;
    using the target detection model to detect the sample image to obtain first two-dimensional position information and first depth information of the target object in the sample image coordinate system;
    performing coordinate transformation on the first two-dimensional position information and the first depth information according to the extended internal parameters to obtain first three-dimensional position information of the target object in the camera coordinate system corresponding to the extended internal parameters;
    adjusting parameters of the target detection model according to the difference between the first three-dimensional position information and the annotated position information.
  9. The apparatus according to claim 8, characterized in that the training module is further configured to perform at least one second training among the N times of training, the second training including the following steps:
    using the target detection model to detect the image collected by the first camera to obtain second two-dimensional position information and second depth information of the target object in the image coordinate system of the image collected by the first camera;
    performing coordinate transformation on the second two-dimensional position information and the second depth information according to the internal parameters of the first camera to obtain second three-dimensional position information of the target object in the first camera coordinate system;
    adjusting parameters of the target detection model according to the difference between the second three-dimensional position information and the annotated position information.
  10. The apparatus according to claim 8 or 9, characterized in that if S first trainings among the N times of training transform the internal parameters of the first camera, the extended internal parameters obtained by transformation in each of the S first trainings are different from one another, S being an integer less than or equal to N.
  11. The apparatus according to any one of claims 8-10, characterized in that the training module is configured to randomly perturb the internal parameters of the first camera to obtain the extended internal parameters used in the first training.
  12. The apparatus according to claim 11, characterized in that, in performing the random perturbation of the internal parameters of the first camera, the training module is specifically configured to:
    for a sub-parameter in the internal parameters of the first camera, construct a normal distribution curve centered on the sub-parameter;
    obtain a point from the normal distribution curve within a specified range centered on the sub-parameter, and use the obtained point as an extended sub-parameter of the sub-parameter;
    replace the sub-parameter in the internal parameters of the first camera with the extended sub-parameter of the sub-parameter.
  13. The apparatus according to any one of claims 8-12, characterized in that the training module is configured to:
    de-distort the image collected by the first camera based on the internal parameters of the first camera and the distortion coefficients of the first camera to obtain a de-distorted image;
    process the de-distorted image based on the extended internal parameters to obtain the sample image.
  14. A target detection apparatus, characterized in that it is applied to a process of detecting a target object with a target detection model obtained by the apparatus according to any one of claims 8-13, the apparatus comprising:
    a to-be-detected image acquisition module, configured to acquire an image to be detected collected by a monocular camera and internal parameters of the monocular camera;
    a two-dimensional information acquisition module, configured to detect the image to be detected to obtain two-dimensional position information and depth information of a target object in the coordinate system of the image to be detected;
    a three-dimensional information determination module, configured to perform coordinate transformation on the two-dimensional position information and the depth information according to the internal parameters of the monocular camera to obtain three-dimensional position information of the target object in the monocular camera coordinate system.
  15. A chip system, characterized by comprising: a memory for storing a computer program; and a processor, wherein, when the processor calls and runs the computer program from the memory, an electronic device equipped with the chip system is caused to execute the method according to any one of claims 1-7.
  16. A computer program product containing instructions, characterized in that, when run on a computer, it causes the computer to execute the method according to any one of claims 1-7.
  17. A computer-readable storage medium, characterized by comprising instructions that, when run on a computer, cause the computer to execute the method according to any one of claims 1-7.
  18. An electronic device, characterized by comprising:
    a memory for storing a readable program; and
    at least one processor, configured to call and run the readable program from the memory, so that the electronic device implements the method according to any one of claims 1-7.
PCT/CN2022/088566 2022-04-22 2022-04-22 Object detection model training method, and object detection method and apparatus WO2023201723A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/088566 WO2023201723A1 (en) 2022-04-22 2022-04-22 Object detection model training method, and object detection method and apparatus
CN202280005788.8A CN117280385A (en) 2022-04-22 2022-04-22 Training method of target detection model, target detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/088566 WO2023201723A1 (en) 2022-04-22 2022-04-22 Object detection model training method, and object detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2023201723A1

Family

ID=88418973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088566 WO2023201723A1 (en) 2022-04-22 2022-04-22 Object detection model training method, and object detection method and apparatus

Country Status (2)

Country Link
CN (1) CN117280385A (en)
WO (1) WO2023201723A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014178969A (en) * 2013-03-15 2014-09-25 Nec Solution Innovators Ltd Information processor and determination method
CN112668460A (en) * 2020-12-25 2021-04-16 北京百度网讯科技有限公司 Target detection method, electronic equipment, road side equipment and cloud control platform
CN113128434A (en) * 2021-04-27 2021-07-16 南京大学 Method for carrying out 3D target detection on monocular RGB image
CN113947768A (en) * 2021-10-15 2022-01-18 京东鲲鹏(江苏)科技有限公司 Monocular 3D target detection-based data enhancement method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUANG GUO-SHING, YU-YONG TSENG: "Application of Stereo Vision 3D Target Recognition Using Camera Calibration Algorithm", PROCEEDINGS OF THE 2015 AASRI INTERNATIONAL CONFERENCE ON CIRCUITS AND SYSTEMS, 31 August 2015 (2015-08-31), pages 381 - 385, XP093102170 *
R. TSAI: "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses", IEEE JOURNAL ON ROBOTICS AND AUTOMATION, IEEE, USA, vol. 3, no. 4, 1 August 1987 (1987-08-01), USA , pages 323 - 344, XP011217413, ISSN: 0882-4967, DOI: 10.1109/JRA.1987.1087109 *

Also Published As

Publication number Publication date
CN117280385A (en) 2023-12-22

Similar Documents

Publication Publication Date Title
KR102126724B1 (en) Method and apparatus for restoring point cloud data
JP6830139B2 (en) 3D data generation method, 3D data generation device, computer equipment and computer readable storage medium
CN107230225B (en) Method and apparatus for three-dimensional reconstruction
WO2020206708A1 (en) Obstacle recognition method and apparatus, computer device, and storage medium
CN110348454B (en) Matching local image feature descriptors
JP6902122B2 (en) Double viewing angle Image calibration and image processing methods, equipment, storage media and electronics
US20120177284A1 (en) Forming 3d models using multiple images
US20120177283A1 (en) Forming 3d models using two images
CN104616278B (en) Three-dimensional point cloud interest point detection method and system
EP3182369B1 (en) Stereo matching method, controller and system
US20190080464A1 (en) Stereo matching method and apparatus
JP7173285B2 (en) Camera calibration device, camera calibration method, and program
US20210144357A1 (en) Method and apparatus with depth image generation
WO2020215257A1 (en) Image stereo matching method and assisted driving apparatus
WO2022110862A1 (en) Method and apparatus for constructing road direction arrow, electronic device, and storage medium
CN113447923A (en) Target detection method, device, system, electronic equipment and storage medium
WO2022205663A1 (en) Neural network training method and apparatus, target object detecting method and apparatus, and driving control method and apparatus
JP2020042503A (en) Three-dimensional symbol generation system
GB2567245A (en) Methods and apparatuses for depth rectification processing
Ahmadabadian et al. Stereo‐imaging network design for precise and dense 3D reconstruction
TWI571099B (en) Device and method for depth estimation
WO2023201723A1 (en) Object detection model training method, and object detection method and apparatus
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
EP4187483A1 (en) Apparatus and method with image processing
Lin et al. Real-time low-cost omni-directional stereo vision via bi-polar spherical cameras

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 17759278

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 202280005788.8

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22937959

Country of ref document: EP

Kind code of ref document: A1