CN116434016B - Image information enhancement method, model training method, device, equipment and medium - Google Patents

Image information enhancement method, model training method, device, equipment and medium

Info

Publication number
CN116434016B
Authority
CN
China
Prior art keywords
image information
information
target sample
module
current time
Prior art date
Legal status
Active
Application number
CN202310692621.XA
Other languages
Chinese (zh)
Other versions
CN116434016A (en)
Inventor
赵云
龚湛
李军
朱红
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310692621.XA
Publication of CN116434016A
Application granted
Publication of CN116434016B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention provides an image information enhancement method. The method comprises the following steps: acquiring image information sequences and point cloud information acquired by a plurality of cameras; selecting a candidate target sample from a target sample set; adding the candidate target sample into the current-time image information to obtain enhanced current-time image information; and adding the candidate target sample and its corresponding point cloud information into the other-time image information to obtain enhanced other-time image information, such that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time sequence training requirement. Through the extraction, screening and mapping of target samples, the diversity of target samples in the image information is greatly increased, overfitting caused by an excessive number of model parameters is avoided, and the number distribution of target samples becomes more balanced, thereby improving the generalization performance and detection precision of the model, including the detection performance on target sample types with few samples.

Description

Image information enhancement method, model training method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an image information enhancement method, an image information enhancement-based object detection model training method, an image information enhancement device, an image information enhancement-based object detection model training device, an electronic device, and a readable storage medium.
Background
The rapid development of artificial intelligence technology has brought a leap to automatic driving technology. An automatic driving system comprises three parts: sensing, planning and control, in which the sensing part acts as the "eyes" of the automatic driving automobile and plays a key role. The quality of an autopilot system often depends largely on the quality of its perception system. The automatic driving sensing system senses the surrounding environment of an automatic driving vehicle through various sensors such as cameras and radars; it must not only accurately recognize vehicles, pedestrians, obstacles, traffic signs and the like in the surrounding environment, but also accurately locate them and predict their speeds. Considering the cost factors of mass-produced vehicles, cameras are inexpensive, their technology is mature, and images provide rich target appearance information, so cameras are widely applied in automatic driving vehicles, and three-dimensional target detection based on camera image data has always been a focus of research.
The sensing system reads the camera image information and outputs information such as the position, length, width, height and speed of each detected target in three-dimensional space, on which the vehicle then bases its decision planning. At present, a typical three-dimensional object detection technology based on image data first detects objects such as vehicles, pedestrians and bicycles in the image through existing image object detection techniques, obtaining information such as the position and size of the objects in the image coordinate system, and then maps the detected objects into the world coordinate system through the relation between the camera and the world coordinate system, obtaining information such as the position, length, width, height and speed of each object in three-dimensional space. This method benefits from mature existing two-dimensional target detection systems and can output fairly accurate image target detection results. However, such a two-stage detection framework cannot be trained end-to-end and is difficult to jointly tune with the subsequent modules of an automatic driving system; meanwhile, its adjustment capability in the target result mapping process is weak, so the accuracy of the three-dimensional results is poor.
In the training process of a deep neural network, an excessive number of model parameters often leads to overfitting, which affects the generalization performance of the model. Meanwhile, since the numbers of samples of the target types in the data set are unevenly distributed, the detection performance of the detector on types with fewer samples is also affected.
Disclosure of Invention
The technical problem to be solved by the embodiments of the invention is to provide a training method and device for a target detection model, an electronic device and a readable storage medium, so as to solve the problem that an excessive number of model parameters often causes overfitting and affects the generalization performance of the model, and that the uneven number distribution of target sample types in the data set affects the detection performance of the detector on types with fewer samples.
In order to solve the above problems, the present invention provides an image information enhancement method, the method comprising:
acquiring image information sequences and corresponding point cloud information acquired by a plurality of cameras; the image information sequence comprises current moment image information and other moment image information;
selecting an alternative target sample from the target sample set;
adding the alternative target sample into the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the alternative target sample into point cloud information corresponding to the current time;
and adding the candidate target sample into the other-time image information according to the coordinate system related information of the current time and the other time to obtain enhanced other-time image information, and adding the point cloud information corresponding to the candidate target sample into the point cloud information corresponding to the other time, so that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time sequence training requirement.
Optionally, the adding the candidate target sample to the current time image information according to the current time image information, and obtaining the enhanced current time image information includes:
and adding the target samples in the candidate target samples and the current time image information into the current time image information according to the sequence from far to near of depth according to the current time image information in the image information sequence, the three-dimensional label of the candidate target samples and the coordinate system related information, and obtaining the enhanced current time image information.
Optionally, the adding the candidate target sample to the image information of the other time according to the coordinate system related information of the current time and the other time, and obtaining the image information of the enhanced other time includes:
according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, adding the candidate target sample and the target sample in the image information of other moments into the image information of other moments according to the sequence from far to near of depth to obtain the image information of enhanced other moments; wherein the three-dimensional tag includes speed information.
Optionally, before the adding the candidate target sample to the current time image information according to the current time image information to obtain enhanced current time image information, and adding the point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the current time, the method further includes:
determining the current time image information to which the candidate target sample is added according to the current time image information, the three-dimensional label of the candidate target sample and the coordinate system related information, and a camera for collecting the current time image information;
determining the image information of other moments to which the candidate target sample is added according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, and acquiring cameras of the image information of other moments;
and if the camera for acquiring the image information of the other moment and the camera for acquiring the image information of the current moment are the same camera, reserving the alternative target sample.
The invention also provides a target detection model training method based on image information enhancement, which comprises the following steps:
Acquiring image information sequences and corresponding point cloud information acquired by a plurality of cameras; the image information sequence comprises current moment image information and other moment image information;
selecting an alternative target sample from the target sample set;
adding the alternative target sample into the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the alternative target sample into point cloud information corresponding to the current time;
according to the coordinate system related information of the current time and other time, adding the alternative target sample into the other time image information to obtain enhanced other time image information, and adding the point cloud information corresponding to the alternative target sample into the point cloud information corresponding to the other time, so that the position of the alternative target sample in the other time image information accords with a time sequence training requirement relative to the position in the current time image information;
training the target detection model by adopting the enhanced current moment image information, the enhanced other moment image information, the target sample marking data and the point cloud information; the target detection result output by the target detection model comprises the position information, the length, the width, the height and the speed of the target.
Optionally, the target detection model includes an image feature extraction module, a feature map depth estimation module, an image feature mapping module, and a target detection module, and training the target detection model by using the enhanced current time image information and enhanced other time image information, the target sample mark data, and the point cloud information includes:
inputting the enhanced current-time image information and the enhanced other-time image information into the image feature extraction module for feature extraction to obtain corresponding image features, and outputting the image features to the feature map depth estimation module;
and inputting the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain corresponding aerial view features, and outputting the aerial view features to the target detection module.
Optionally, the feature map depth estimation module includes a first layer neural network and a second layer neural network, and before the inputting the image features and the depth maps output by the feature map depth estimation module into the image feature mapping module, obtaining corresponding aerial view features, and outputting the aerial view features to the target detection module, the method further includes:
Outputting the image characteristics to the first layer neural network to obtain a first depth map output by the first layer neural network;
and concatenating the first depth map and the feature data output by the first layer neural network along the feature channels, inputting the result into the second layer neural network to obtain a second depth map output by the second layer neural network, and taking the second depth map as the depth map output by the feature map depth estimation module.
Optionally, the target detection model further comprises a time sequence information fusion module, and the target detection result comprises the speed of the target; the inputting the image feature and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain a corresponding aerial view feature, and outputting the aerial view feature to the target detection module includes:
for the image information at each moment, respectively inputting the image features, the depth maps output by the feature map depth estimation module, and the coordinate system related information at the different moments into the image feature mapping module to obtain corresponding aerial view features;
inputting the bird's-eye view features corresponding to each moment into the time sequence information fusion module for fusion to obtain fused bird's-eye view features, and outputting the fused bird's-eye view features to the target detection module.
Optionally, the object detection model further includes a feature decoding module, and the outputting the aerial view feature to the object detection module includes:
inputting the aerial view features into the feature decoding module for decoding to obtain decoded aerial view features;
and outputting the decoded aerial view characteristic to the target detection module.
The invention also provides an image information enhancement device, which comprises:
the information acquisition module is used for acquiring image information sequences acquired by a plurality of cameras and corresponding point cloud information; the image information sequence comprises current moment image information and other moment image information;
the sample selection module is used for selecting alternative target samples from the target sample set;
the first information enhancement module is used for adding the alternative target sample into the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the alternative target sample into point cloud information corresponding to the current time;
the second information enhancement module is used for adding the candidate target sample to the other time image information according to the coordinate system related information of the current time and other times to obtain enhanced other time image information, and adding the point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the other times to enable the position of the candidate target sample in the other time image information to accord with the time sequence training requirement relative to the position in the current time image information.
Optionally, the first information enhancement module includes:
and the first sample adding sub-module is used for adding the candidate target sample and the target sample in the current time image information into the current time image information according to the sequence from far to near of depth according to the current time image information in the image information sequence, the three-dimensional label of the candidate target sample and the coordinate system related information, and obtaining the enhanced current time image information.
Optionally, the second information enhancement module includes:
the second sample adding sub-module is used for adding the candidate target sample and the target samples in the other-time image information into the other-time image information in order of depth from far to near, according to the coordinate system related information of the current time and the other time and the three-dimensional label of the candidate target sample, to obtain enhanced other-time image information; wherein the three-dimensional tag includes speed information.
Optionally, the apparatus further comprises:
a first camera determining module, configured to determine, according to the current time image information, a three-dimensional tag of the candidate target sample, and coordinate system related information, the current time image information to which the candidate target sample should be added, and a camera that collects the current time image information, before the candidate target sample is added to the current time image information according to the current time image information, to obtain enhanced current time image information, and before point cloud information corresponding to the candidate target sample is added to point cloud information corresponding to the current time;
The second camera determining module is used for determining the other time image information to which the alternative target sample is added according to the coordinate system related information of the current time and other time and the three-dimensional label of the alternative target sample, and a camera for collecting the other time image information;
and the sample reservation module is used for reserving the alternative target sample if the camera for acquiring the image information of other moments and the camera for acquiring the image information of the current moment are the same camera.
The invention also provides a target detection model training device based on image information enhancement, which comprises:
the information acquisition module is used for acquiring image information sequences acquired by a plurality of cameras and corresponding point cloud information; the image information sequence comprises current moment image information and other moment image information;
the sample selection module is used for selecting alternative target samples from the target sample set;
the first information enhancement module is used for adding the alternative target sample into the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the alternative target sample into point cloud information corresponding to the current time;
The second information enhancement module is used for adding the candidate target sample to the other time image information according to the coordinate system related information of the current time and other times to obtain enhanced other time image information, and adding the point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the other times to enable the position of the candidate target sample in the other time image information to accord with the time sequence training requirement relative to the position in the current time image information.
Optionally, the first information enhancement module includes:
and the first sample adding sub-module is used for adding the candidate target sample and the target sample in the current time image information into the current time image information according to the sequence from far to near of depth according to the current time image information in the image information sequence, the three-dimensional label of the candidate target sample and the coordinate system related information, and obtaining the enhanced current time image information.
Optionally, the second information enhancement module includes:
the second sample adding sub-module is used for adding the candidate target sample and the target samples in the other-time image information into the other-time image information in order of depth from far to near, according to the coordinate system related information of the current time and the other time and the three-dimensional label of the candidate target sample, to obtain enhanced other-time image information; wherein the three-dimensional tag includes speed information.
Optionally, the apparatus further comprises:
a first camera determining module, configured to determine, according to the current time image information, a three-dimensional tag of the candidate target sample, and coordinate system related information, the current time image information to which the candidate target sample should be added, and a camera that collects the current time image information, before the candidate target sample is added to the current time image information according to the current time image information, to obtain enhanced current time image information, and before point cloud information corresponding to the candidate target sample is added to point cloud information corresponding to the current time;
the second camera determining module is used for determining the other time image information to which the alternative target sample is added according to the coordinate system related information of the current time and other time and the three-dimensional label of the alternative target sample, and a camera for collecting the other time image information;
and the sample reservation module is used for reserving the alternative target sample if the camera for acquiring the image information of other moments and the camera for acquiring the image information of the current moment are the same camera.
The embodiment of the invention also discloses an electronic device which is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
A memory for storing a computer program;
and a processor for performing the method steps described above when executing the program stored on the memory.
The embodiment of the invention also discloses a readable storage medium, which enables the electronic device to execute one or more of the methods in the embodiment of the invention when the instructions in the storage medium are executed by the processor of the electronic device.
According to the embodiment of the invention, image information sequences and corresponding point cloud information acquired by a plurality of cameras are acquired, wherein the image information sequence comprises current-time image information and other-time image information; a candidate target sample is selected from a target sample set; the candidate target sample is added to the current-time image information to obtain enhanced current-time image information, and the point cloud information corresponding to the candidate target sample is added to the point cloud information corresponding to the current time; the candidate target sample is then added to the other-time image information according to the coordinate system related information of the current time and the other time to obtain enhanced other-time image information, and the point cloud information corresponding to the candidate target sample is added to the point cloud information corresponding to the other time, so that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time sequence training requirement. Through target sample extraction, screening and mapping, the diversity of target samples in the image information is greatly increased, overfitting caused by an excessive number of model parameters is avoided, the number distribution of target samples becomes more balanced, and the generalization performance and detection precision of the model, including the detection performance on target types with few samples, are improved.
Drawings
FIG. 1 is a flow chart showing the steps of a method for enhancing image information according to one embodiment of the present invention;
FIG. 2 shows a schematic flow diagram of a target detection model;
FIG. 3 is a flowchart showing the steps of a method for training an object detection model based on image information enhancement according to an embodiment of the present invention;
FIG. 4 shows a schematic flow diagram of a depth estimation cascade;
fig. 5 is a block diagram showing an embodiment of an image information enhancement apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram illustrating an embodiment of an object detection model training device based on image information enhancement according to another embodiment of the present invention;
fig. 7 shows a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to fig. 1, a step flowchart of an image information enhancement method provided by an embodiment of the present invention may specifically include the following steps:
step 101, acquiring image information sequences and corresponding point cloud information acquired by a plurality of cameras; wherein the image information sequence comprises current time image information and other time image information.
In the embodiment of the invention, a plurality of cameras respectively acquire images to obtain image information. For example, image information acquired by a plurality of cameras on a vehicle. The point cloud information refers to an information set of points in a certain coordinate system, and the points contain rich information, including three-dimensional coordinates X, Y, Z, colors, classification values, intensity values, time and the like. For example, point cloud information obtained by a lidar, to which embodiments of the present invention are not limited.
In an embodiment of the invention, the image information sequence comprises current-time image information and other-time image information. There may be a plurality of pieces of current-time image information, and a plurality of pieces of other-time image information. For example, as shown in fig. 2, the current-time image information is the image information acquired at time t, and the other-time image information is the image information acquired at time t-1.
Step 102, selecting an alternative target sample from the target sample set.
In embodiments of the present invention, the target may be a vehicle, person, animal, railing, pole, etc. on the image, or any other suitable target, and embodiments of the present invention are not limited in this respect. The target used to train the model is noted as a target sample. The target sample marking data includes the depth of the target, which refers to the distance to the target. The target sample marking data may also include classification, length, width, height, speed, three-dimensional position, etc. of the target, or any other suitable data, which is not limited in this embodiment of the present invention.
In the embodiment of the invention, the target sample marking data can be used as the true value of the target detection result in training, and the point cloud information can be used for determining the true value of the depth map in training.
In an embodiment of the invention, the target sample set is a set of targets and related data. The target sample set may be generated by combining the point cloud information and the image information of the target in advance.
For example, the training data set includes, but is not limited to, image information at different moments, point cloud information, coordinate system related information, and three-dimensional labels of the targets. Based on the three-dimensional label of each target sample s in the entire training data set, the corresponding point cloud information and image block are extracted. Then, the data of all the extracted target samples (three-dimensional labels, point cloud information, image blocks) and the corresponding coordinate system related information are collated into a database, namely the target sample set.
The three-dimensional label refers to a three-dimensional information label of a target, and specifically includes, but is not limited to, a three-dimensional position, a length, a width, a height and a speed of the target. The coordinate system related information includes, but is not limited to, camera parameters, radar (point cloud information) parameters, vehicle pose, and the like corresponding to the moment when the target sample is located.
In the embodiment of the invention, in the model training process, an alternative target sample is randomly extracted from a target sample set, and sample screening and mapping are performed according to three-dimensional space coordinate information of the alternative target sample, so that the enhancement of image data is completed.
For example, during training, an alternative target sample is selected from the target sample set, resulting in an alternative target sample set. The alternative target sample set may be obtained in a random sampling manner, or any other applicable manner, which is not limited by the embodiment of the present invention.
Step 103, adding the candidate target sample to the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the current time.
In the embodiment of the invention, in order to solve the problems of overlarge model parameters and unbalanced number distribution of target samples in a data set, the generalization performance and detection precision of the model are improved, and the image data enhancement technology based on the target sample set is provided except for adopting conventional image data enhancement methods such as scaling, cutting, rotation and overturning.
In the embodiment of the invention, a three-dimensional sample data enhancement algorithm is developed, and sample extraction, screening and mapping are performed in a data preprocessing stage, so that the diversity of targets is greatly increased, and the target detection precision is improved.
In the embodiment of the invention, the candidate target samples are added into the current time image information according to the current time image information, so that the number of the target samples in the current time image information is increased, and the obtained new current time image information is recorded as enhanced current time image information. Any suitable adding manner may be specifically adopted, and the embodiment of the present invention is not limited thereto.
In the embodiment of the invention, as the candidate target sample is added in the image information at the current moment, the point cloud information corresponding to the candidate target sample is also added in the point cloud information corresponding to the current moment, so that the point cloud information can correspond to the image information at the enhanced current moment.
In an optional embodiment of the present invention, according to the current time image information, adding the candidate target sample to the current time image information to obtain a specific implementation manner of enhancing the current time image information may include: and adding the target samples in the candidate target samples and the current time image information into the current time image information according to the sequence from far to near of depth according to the current time image information in the image information sequence, the three-dimensional label of the candidate target samples and the coordinate system related information, and obtaining the enhanced current time image information.
The coordinate system related information refers to information describing a mutual conversion relationship of a pose of the vehicle, a vehicle coordinate system, a camera coordinate system, a radar (i.e., point cloud information) coordinate system, and the like.
The original target samples in the current-time image information also need to be re-added to the current-time image information. The candidate target sample and the target samples already in the current-time image information are added into the current-time image information in order of depth from far to near, so that near targets occlude far targets.
For example, first, a camera index, an image truth box, and a three-dimensional truth box of the candidate target sample attached to the image information at the current time are obtained according to the camera internal parameters at the current time, the external parameter relation between the camera and the radar, and the radar truth box of the candidate target sample.
The radar truth box is the position data of the candidate target sample in the radar (point cloud information) coordinate system, obtained from the three-dimensional label of the candidate target sample and the positional relation between the radar and the vehicle. The camera index identifies, according to the three-dimensional label of the candidate target sample, which camera's current-time image information the candidate target sample is pasted into, selected from the current-time image information of the plurality of cameras according to the sample's position relative to the radar coordinate system; a camera index is used to represent this camera. The image truth box is the category, length, width, height, position and other data of the candidate target sample in the current-time image information. The three-dimensional truth box is the three-dimensional label data of the candidate target sample in the current-time image information.
And then, acquiring image truth boxes, three-dimensional truth boxes, image blocks and point cloud information corresponding to all target samples in the image information at the current moment.
Then, under each camera, the candidate target sample and all target samples in the current-time image information are mapped in order of the depth value of the center point of their three-dimensional truth boxes, from far to near, so that they are added into the current-time image information, and the corresponding point cloud information is pasted. This finally yields the enhanced current-time image information and the corresponding point cloud information.
Step 104, adding the candidate target sample to the other-time image information according to the coordinate system related information of the current time and the other time to obtain enhanced other-time image information, and adding the point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the other time, so that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time sequence training requirement.
In the embodiment of the invention, the coordinate system related information of different moments comprises information of interconversion relations of the pose of the vehicle, the vehicle coordinate system, the camera coordinate system and the radar (namely point cloud information) coordinate system of different moments.
In the embodiment of the invention, since the candidate target sample is added to the image information at the current moment, it should also be added to the image information at other moments, so that the image information at different moments remains consistent. Therefore, the candidate target sample needs to be added into the other-time image information according to the coordinate system related information of the current time and the other time, and the obtained new other-time image information is recorded as enhanced other-time image information, such that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time sequence training requirement; that is, the change in position of the candidate target sample between the current-time image information and the other-time image information should be logically reasonable. For example, suppose the candidate target sample is at position A in the current-time image information and at position B in the other-time image information: the distance between position A and position B may remain constant, increase by m, or decrease by n, where m and n should be within a reasonable range.
In the embodiment of the present invention, a specific implementation manner of adding the candidate target sample to the image information of the other time according to the coordinate system related information of the current time and the other time may include various embodiments, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, as the candidate target sample is added in the image information at other moments, the point cloud information corresponding to the candidate target sample is also added in the point cloud information corresponding to other moments, so that the point cloud information can correspond to the image information at other moments.
In an optional embodiment of the present invention, according to coordinate system related information of the current time and other times, adding the candidate target sample to the other time image information to obtain a specific implementation manner of enhancing the other time image information may include: according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, adding the candidate target sample and the target sample in the image information of other moments into the image information of other moments according to the sequence from far to near of depth to obtain the image information of enhanced other moments; wherein the three-dimensional tag includes speed information.
The three-dimensional tag of the candidate target sample includes speed information. According to this speed information, the distance between the position of the candidate target sample in the current-time image information and its position in the other-time image information should be consistent with the speed; that is, after the position of the candidate target sample in the current-time image information is determined, its position in the other-time image information is determined according to the speed information and the time difference between the two moments.
For example, first, the camera index, image truth box and three-dimensional truth box for pasting the candidate target sample into the other-time image information are obtained based on the coordinate system related information of the current time and the other time and the three-dimensional label of the candidate target sample. Then, the image truth boxes, three-dimensional truth boxes, image blocks and point cloud information corresponding to all target samples in the other-time image information are acquired. Finally, under each camera, the candidate target sample and all target samples in the other-time image information are mapped in order of the depth value of the center point of their three-dimensional truth boxes, from far to near, so that they are added into the other-time image information and the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time sequence training requirement. The corresponding point cloud information is pasted as well, finally yielding the enhanced other-time image information and the corresponding point cloud information.
The time sequence training requirement is a requirement for ensuring consistency and accuracy of image information at different moments. For example, the position of the target sample in the image information at different moments in time should coincide with the velocity of the target sample.
In an optional embodiment of the present invention, before the adding the candidate target sample to the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the candidate target sample to point cloud information corresponding to the current time, the method may further include: determining the current time image information to which the candidate target sample is added according to the current time image information, the three-dimensional label of the candidate target sample and the coordinate system related information, and a camera for collecting the current time image information; determining the image information of other moments to which the candidate target sample is added according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, and acquiring cameras of the image information of other moments; and if the camera for acquiring the image information of the other moment and the camera for acquiring the image information of the current moment are the same camera, reserving the alternative target sample.
In order to ensure the consistency and accuracy of image information at different moments, the candidate target samples need to be screened. According to the RT (rotation-translation) relation between the radar coordinate systems at the current moment and the other moment and the speed of candidate target sample j, the three-dimensional truth value corresponding to the sample at the other moment (its three-dimensional tag data in the radar coordinate system) is obtained. This truth value is then projected into the corresponding camera, and if the resulting camera index is consistent with the camera index corresponding to candidate target sample j at the current moment, the sample is retained.
According to the embodiment of the invention, image information sequences and corresponding point cloud information acquired by a plurality of cameras are acquired, wherein the image information sequence comprises current-time image information and other-time image information; a candidate target sample is selected from a target sample set; the candidate target sample is added to the current-time image information to obtain enhanced current-time image information, and the point cloud information corresponding to the candidate target sample is added to the point cloud information corresponding to the current time; the candidate target sample is then added to the other-time image information according to the coordinate system related information of the current time and the other time to obtain enhanced other-time image information, and the point cloud information corresponding to the candidate target sample is added to the point cloud information corresponding to the other time, so that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time sequence training requirement. Through target sample extraction, screening and mapping, the diversity of target samples in the image information is greatly increased, overfitting caused by an excessive number of model parameters is avoided, the number distribution of target samples becomes more balanced, and the generalization performance and detection precision of the model, including the detection performance on target types with few samples, are improved.
Referring to fig. 3, a flowchart illustrating steps of a training method for an object detection model based on image information enhancement according to an embodiment of the present invention may specifically include the following steps:
step 201, acquiring image information sequences and corresponding point cloud information acquired by a plurality of cameras; wherein the image information sequence comprises current time image information and other time image information.
In the embodiments of the present invention, the specific implementation manner of this step may be referred to the description in the foregoing embodiments, which is not repeated herein.
Step 202, selecting an alternative target sample from the target sample set.
In the embodiments of the present invention, the specific implementation manner of this step may be referred to the description in the foregoing embodiments, which is not repeated herein.
Step 203, adding the candidate target sample to the current time image information according to the current time image information, obtaining enhanced current time image information, and adding point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the current time.
In the embodiments of the present invention, the specific implementation manner of this step may be referred to the description in the foregoing embodiments, which is not repeated herein.
And 204, adding the candidate target sample to the image information of other time according to the coordinate system related information of the current time and other time to obtain the image information of the enhanced other time, and adding the point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the other time to enable the position of the candidate target sample in the image information of the other time to meet the time sequence training requirement relative to the position in the image information of the current time.
In the embodiments of the present invention, the specific implementation manner of this step may be referred to the description in the foregoing embodiments, which is not repeated herein.
Step 205, training the target detection model by using the enhanced current time image information, the enhanced other time image information, the target sample mark data and the point cloud information; the target detection result output by the target detection model comprises the position information, the length, the width, the height and the speed of the target.
In the embodiment of the present invention, the object detection model may detect position information, length, width, height, classification, speed, etc. of the object on the image, or any other suitable detection, which is not limited in the embodiment of the present invention. Wherein the position information may be a three-dimensional position, that is to say, comprising a depth on the image in addition to a position on the image plane.
In the embodiment of the invention, in order to enable the target detection model to detect targets in the image information at different moments or in a target image and output target detection results, the target detection model needs to be trained using the target sample marking data that annotates the image information.
For example, the final loss function used in the training process is $L = \lambda_1 L_{depth} + \lambda_2 L_{cls} + \lambda_3 L_{box} + \lambda_4 L_{vel}$, where $L_{depth}$, $L_{cls}$, $L_{box}$ and $L_{vel}$ are respectively the depth estimation loss of the image features, the target classification loss, the loss on the target position, length, width and height, and the speed estimation loss, and $\lambda_1, \ldots, \lambda_4$ are the weights corresponding to each loss.
In the embodiment of the invention, the time sequence data is used for training the target detection model. Specifically, the target detection model is trained by enhancing the current moment image information, enhancing other moment image information, and target sample mark data and point cloud information.
In an alternative embodiment of the present invention, the object detection model includes an image feature extraction module, a feature map depth estimation module, an image feature mapping module, and an object detection module.
The training of the target detection model by using the enhanced current time image information, the enhanced other time image information, the target sample marking data and the point cloud information may include: inputting the image information of the enhanced current moment and the images of the enhanced other moments into the image feature extraction module for feature extraction to obtain corresponding image features, and outputting the image features to the feature image depth estimation module; and inputting the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain corresponding aerial view features, and outputting the aerial view features to the target detection module.
In the embodiment of the invention, the image feature extraction module is a module for feature extraction, and specifically, the image feature extraction can be performed through a deep neural network. The specific implementation manner of the image feature extraction may be any suitable implementation manner, which is not limited by the embodiment of the present invention.
In the embodiment of the invention, the image information of the enhanced current moment is input into the image feature extraction module for feature extraction, so that the image features corresponding to the image information of the enhanced current moment are obtained. And inputting the images at the other enhanced moments into an image feature extraction module for feature extraction to obtain image features corresponding to the image information at the other enhanced moments.
For example, a flow diagram of the object detection model is shown in fig. 2. The image feature extraction module acquires the image sequences of a plurality of cameras, for example the image information of 6 cameras at time t and the image information of 6 cameras at time t-1, and performs image feature extraction on each of them to obtain the corresponding image features $F \in \mathbb{R}^{N \times C \times H \times W}$, where $N$ represents the number of cameras, $H \times W$ represents the scale of the image features, and $C$ represents the number of channels of the image features.
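As a hedged sketch of such a per-camera backbone (the architecture, channel counts and input resolution below are assumptions; the embodiment only requires some deep neural network), this could look like:

```python
import torch
import torch.nn as nn

class ImageFeatureExtractor(nn.Module):
    """Illustrative stand-in for the image feature extraction module."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, images):
        # images: (N_cams, 3, H, W) -> features: (N_cams, C, H/4, W/4)
        return self.net(images)

# e.g. 6 cameras at time t; the same module is applied to the t-1 images
feats_t = ImageFeatureExtractor()(torch.randn(6, 3, 256, 704))
```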
In the embodiment of the invention, the feature map depth estimation module is a module for obtaining a depth map corresponding to an image feature map, and specifically, the depth map can be extracted through a multi-layer neural network, and a true value of the depth map output by the feature map depth estimation module is determined by point cloud information. Any suitable implementation may be specifically adopted, and embodiments of the present invention are not limited thereto.
In the embodiment of the invention, the image features corresponding to the image information of the enhanced current moment are output to the feature map depth estimation module. In addition, the image features corresponding to the image information of the enhanced other moments are output to a feature map depth estimation module.
For example, as shown in FIG. 2, a truth depth map in image space is generated based on the point cloud information, and the feature map depth estimation module is then trained using the generated truth depth map. From the image features $F$, the depth map $P \in \mathbb{R}^{N \times D \times H \times W}$ corresponding to the image feature map is obtained, where $D$ is the number of quantized depth bins (i.e. the number of depth value ranges in the depth map): the depth range is divided into $D$ units, and the value on the $i$-th unit represents the probability that the depth of the current feature point falls within the depth range of the $i$-th unit.
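A minimal sketch of such a per-pixel depth classification head follows (the channel count and the number of bins D are illustrative assumptions):

```python
import torch.nn as nn

class DepthHead(nn.Module):
    """Predicts, per feature-map pixel, a probability distribution
    over D quantized depth bins via a softmax along the depth axis."""
    def __init__(self, in_ch=64, num_bins=60):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, num_bins, 1)

    def forward(self, feats):
        # feats: (N, C, H, W) -> depth distribution: (N, D, H, W)
        return self.conv(feats).softmax(dim=1)
```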
In an alternative embodiment of the invention, the object detection model further comprises an image feature enhancement module. In one specific implementation manner of outputting the image features to the feature map depth estimation module, the method may include: inputting the image features into the image feature enhancement module for feature enhancement to obtain enhanced image features; and outputting the enhanced image features to the feature map depth estimation module.
The image feature enhancement module is a module for enhancing image features; for example, by upsampling, features of the larger image are effectively learned and enhanced image features are obtained. Any suitable implementation may specifically be adopted, which is not limited by the embodiment of the present invention.
For example, the image feature enhancement module is a multi-layer neural network that further improves the expressive power of the image features, yielding the enhanced image features $F'$.
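A hedged sketch of such an enhancement module (the upsampling factor and channel width are assumptions):

```python
import torch.nn as nn

class FeatureEnhancer(nn.Module):
    """Illustrative enhancement: upsample the feature map, then refine
    it with a convolution, so features of the larger image are learned."""
    def __init__(self, ch=64):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.refine = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                    nn.ReLU())

    def forward(self, feats):
        return self.refine(self.up(feats))
```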
In an embodiment of the invention, Bird's Eye View (BEV) features are feature data under the BEV view. A BEV view is the projection of the point cloud onto a plane perpendicular to the height direction. Typically, before obtaining the BEV view, space is divided into voxels, the point cloud is downsampled with the voxels, and each voxel is then projected as a point. A voxel is one of the fixed-size cuboids obtained when three-dimensional space is divided into cuboids. The coordinates of the pixel points of the BEV view can be obtained during voxel projection, and the feature value of each pixel point can be obtained in various ways.
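The voxel-column projection described above can be sketched as follows (the grid ranges, voxel size, function name and sum-pooling choice are illustrative assumptions; cells with no points stay zero):

```python
import torch

def points_to_bev(points, feats,
                  x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), voxel=0.5):
    # points: (P, 3) xyz in the vehicle frame; feats: (P, C) per-point features
    nx = int((x_range[1] - x_range[0]) / voxel)
    ny = int((y_range[1] - y_range[0]) / voxel)
    ix = ((points[:, 0] - x_range[0]) / voxel).long().clamp(0, nx - 1)
    iy = ((points[:, 1] - y_range[0]) / voxel).long().clamp(0, ny - 1)
    cell = iy * nx + ix                           # (P,) flat BEV cell index
    bev = torch.zeros(ny * nx, feats.shape[1])    # zeros remain in empty cells
    bev.index_add_(0, cell, feats)                # accumulate features per cell
    return bev.view(ny, nx, -1).permute(2, 0, 1)  # (C, Y, X) BEV feature map
```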
In the embodiment of the present invention, the image feature mapping module is a model configured to generate the bird's eye view features from the image features and the depth map through coordinate system mapping; any suitable implementation may specifically be adopted, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, the target detection module is a model used for detecting targets in the image information at different moments or in the target image according to the bird's eye view features and outputting a target detection result; any applicable implementation may be adopted, and the embodiment of the invention is not limited thereto.
In the embodiment of the invention, the image features corresponding to the image information of the enhanced current moment, the image features corresponding to the image information of the enhanced other moments and the depth map output by the feature map depth estimation module are input into the image feature mapping module to obtain the corresponding aerial view features, and the aerial view features are output to the target detection module.
For example, as shown in FIG. 2, three-dimensional features in the image coordinate system are generated based on the enhanced image features $F'$ and the estimated depth map $P$. Specifically, for each point, the feature vector and the depth distribution vector are combined by an outer product, i.e. the two vectors are outer-multiplied. According to the camera intrinsics and the rotation and translation relation between the camera and the autonomous vehicle, each point feature of the three-dimensional features in the image coordinate system is converted into the autonomous vehicle coordinate system. The converted feature points are then voxelized to form a three-dimensional feature representation in the autonomous vehicle coordinate system, where feature points falling into the same voxel grid are accumulated and the features of voxel grids into which no feature point falls are set to all zeros. Finally, the features corresponding to the voxel grids at all heights are accumulated along the height dimension to obtain the final bird's eye view feature $B \in \mathbb{R}^{C \times X \times Y}$, where $X$ and $Y$ represent the front-back and left-right dimensions of the bird's eye view feature.
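The outer-product "lift" of each pixel's feature vector with its depth distribution can be written in a few lines (shapes follow the notation above; the function name is an illustrative assumption):

```python
import torch

def lift_features(feats, depth_probs):
    # feats: (N, C, H, W) image features
    # depth_probs: (N, D, H, W) per-pixel depth distribution
    # result: (N, C, D, H, W), the outer product of the two vectors at
    # every pixel, i.e. a frustum of depth-weighted 3D features
    return feats.unsqueeze(2) * depth_probs.unsqueeze(1)
```

The resulting frustum points would then be converted into the autonomous vehicle coordinate system and voxelized as described above.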
In an alternative embodiment of the invention, the feature map depth estimation module includes a first layer neural network and a second layer neural network. Before the inputting of the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain corresponding bird's eye view features and outputting the bird's eye view features to the target detection module, the method may further include: outputting the image features to the first layer neural network to obtain a first depth map output by the first layer neural network; and concatenating the first depth map with the feature data output by the first layer neural network along the feature channels, inputting the result into the second layer neural network, and obtaining a second depth map output by the second layer neural network as the depth map output by the feature map depth estimation module.
The first layer neural network takes image characteristics as input and outputs a first depth map. The truth value of the first depth map output by the first layer neural network is determined by the point cloud information. Any suitable implementation may be specifically adopted, and embodiments of the present invention are not limited thereto.
The second layer neural network is a neural network that takes as input the concatenation of the first depth map with the feature data from the first layer neural network, and outputs the second depth map. The truth value of the second depth map output by the second layer neural network is determined by the point cloud information. Any suitable implementation may specifically be adopted, and embodiments of the present invention are not limited thereto.
In the cascading process, the method utilizes the point cloud information to supervise and train the feature map depth estimation module so as to improve the accuracy of mapping of the image features into the three-dimensional space.
For example, a flow diagram of depth map estimation is shown in fig. 4. To further improve the accuracy of depth estimation, a two-layer cascaded depth estimation network is adopted: the depth prediction map 1 (denoted the first depth map) estimated by the first layer neural network is concatenated along the feature channel with the features output by the first layer neural network, and the result is input into the second layer neural network, which outputs the depth prediction map 2 (denoted the second depth map) as the depth map used by subsequent modules. In this way, the accuracy of the depth estimation is improved.
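A minimal sketch of this two-layer cascade (the channel counts and bin count are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CascadeDepth(nn.Module):
    """Stage 1 predicts a first depth map from image features; the
    prediction is concatenated with the stage-1 features along the
    channel dimension and refined by stage 2 into the final depth map."""
    def __init__(self, in_ch=64, mid_ch=64, num_bins=60):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, padding=1),
                                    nn.ReLU())
        self.head1 = nn.Conv2d(mid_ch, num_bins, 1)
        self.stage2 = nn.Sequential(
            nn.Conv2d(mid_ch + num_bins, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(mid_ch, num_bins, 1),
        )

    def forward(self, feats):
        f1 = self.stage1(feats)
        d1 = self.head1(f1)                       # depth prediction map 1
        d2 = self.stage2(torch.cat([f1, d1], 1))  # depth prediction map 2
        return d2.softmax(dim=1)                  # used by subsequent modules
```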
In an alternative embodiment of the invention, the object detection model further comprises a feature decoding module. In one specific implementation manner of outputting the aerial view feature to the target detection module, the method may include: inputting the aerial view features into the feature decoding module for decoding to obtain decoded aerial view features; and outputting the decoded aerial view characteristic to the target detection module.
The feature decoding module is a module for decoding the aerial view feature to obtain a decoded aerial view feature. For example, as shown in FIG. 2, the fused BEV features are decoded by a multi-layer neural convolution network to obtain a final bird's eye view feature output. Any suitable implementation may be specifically adopted, and embodiments of the present invention are not limited thereto.
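As a hedged illustration, the decoding could be a small convolutional stack over the fused BEV features (the depth of the stack and channel width are assumptions):

```python
import torch.nn as nn

class BEVDecoder(nn.Module):
    """Illustrative multi-layer convolutional decoder for BEV features."""
    def __init__(self, ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, bev):
        return self.net(bev)
```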
In an optional embodiment of the invention, the target detection model further comprises a time sequence information fusion module, and the target detection result comprises a speed of the target. The inputting the image feature and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain a corresponding aerial view feature, and outputting the aerial view feature to the target detection module includes: inputting the image characteristics, the depth images output by the characteristic image depth estimation module and the coordinate system related information at different moments into the image characteristic mapping module respectively aiming at the image information at each moment to obtain corresponding aerial view characteristics; inputting the bird's-eye view features corresponding to each moment into the time sequence information fusion module for fusion to obtain fused bird's-eye view features, and outputting the fused bird's-eye view features to the target detection module.
Single-frame image information is limited, which affects the detection performance for three-dimensional targets. The method fuses time sequence information to form a four-dimensional space, which not only greatly improves the prediction accuracy of the speed of three-dimensional targets, but also achieves a better detection effect on temporarily occluded objects.
The time sequence information fusion module is used for fusing the aerial view features at different moments. For each time image information, a corresponding bird's eye view feature is generated. And fusing the aerial view features corresponding to the moments.
And respectively inputting the image features, the depth images output by the feature image depth estimation module and the coordinate system related information at different moments into the image feature mapping module to obtain corresponding aerial view features. For example, aiming at enhancing the image information at the current moment, the corresponding image features, the depth map output by the feature map depth estimation module and the coordinate system related information are input into the image feature mapping module to obtain the aerial view features corresponding to the image information at the current moment. And inputting the corresponding image features, the depth map output by the feature map depth estimation module and the coordinate system related information into the image feature mapping module aiming at enhancing the image information at other moments to obtain the aerial view features corresponding to the image information at other moments.
And then inputting the bird's-eye view features corresponding to each moment into a time sequence information fusion module for fusion to obtain fused bird's-eye view features.
For example, as shown in FIG. 2, the BEV features corresponding to times t and t-1 are input into the time sequence information fusion module for fusion, and the result is denoted the fused bird's eye view feature, which is output to the target detection module. In this way, the influence of the motion of the acquisition vehicle on the alignment of the bird's eye view features at different moments is further reduced.
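A minimal fusion sketch follows (simple channel concatenation followed by a convolution; the fusion operator and channel counts are assumptions, and ego-motion alignment of the t-1 features is presumed done beforehand):

```python
import torch
import torch.nn as nn

fuse = nn.Conv2d(2 * 128, 128, 3, padding=1)  # channel widths illustrative

def fuse_bev(bev_t, bev_t_minus_1):
    # Concatenate the BEV features of times t and t-1 along channels,
    # then mix them into a single fused BEV feature map.
    return fuse(torch.cat([bev_t, bev_t_minus_1], dim=1))

fused = fuse_bev(torch.randn(1, 128, 128, 128),
                 torch.randn(1, 128, 128, 128))
```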
In addition, random frame sampling during training greatly increases the diversity of the time sequence information: the image information of a previous moment is randomly selected from the image information of several adjacent moments and paired with the image information of the current moment, as sketched below.
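A sketch of this random pairing (the window of adjacent frames, max_gap, is an illustrative assumption):

```python
import random

def pick_training_pair(frames, t, max_gap=3):
    # Randomly pick one of several adjacent earlier frames to pair
    # with the current frame, diversifying the temporal training data.
    if t == 0:
        return frames[0], frames[0]   # degenerate case: no earlier frame
    t_prev = random.randint(max(0, t - max_gap), t - 1)
    return frames[t_prev], frames[t]
```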
In the embodiment of the invention, based on the target detection model obtained through training, a target image sequence acquired by a plurality of cameras is taken as input, and target detection is carried out by the target detection model, so that a target detection result is obtained. The target detection model can detect the depth of a target, and can also detect the information of the classification, length, width, height, speed, three-dimensional position and the like of the target.
According to the embodiment of the invention, the image information sequences acquired by a plurality of cameras and the corresponding point cloud information are acquired, the image information sequence comprising current time image information and other time image information; an alternative target sample is selected from a target sample set; the alternative target sample is added to the current time image information according to the current time image information to obtain enhanced current time image information, and the point cloud information corresponding to the alternative target sample is added to the point cloud information corresponding to the current time; the alternative target sample is added to the other time image information according to the coordinate system related information of the current time and other times to obtain enhanced other time image information, and the point cloud information corresponding to the alternative target sample is added to the point cloud information corresponding to the other times, so that the position of the alternative target sample in the other time image information accords with the time sequence training requirement relative to its position in the current time image information; and the target detection model is trained by using the enhanced current time image information, the enhanced other time image information, the target sample mark data and the point cloud information, the target detection result output by the target detection model comprising the position information, length, width, height and speed of the target. In this way, the extraction, screening and mapping of target samples greatly increase the diversity of target samples in the image information while avoiding an excessive parameter count in the target detection model; meanwhile, the number distribution of target samples becomes more balanced, which can improve the generalization performance and detection precision of the target detection model as well as the detection performance on target sample classes with few samples.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 5, a block diagram of an embodiment of an image information enhancement apparatus according to another embodiment of the present invention is shown, which may specifically include the following modules:
an information acquisition module 301, configured to acquire image information sequences acquired by a plurality of cameras and corresponding point cloud information; the image information sequence comprises current moment image information and other moment image information;
a sample selection module 302, configured to select an alternative target sample from the target sample set;
the first information enhancing module 303 is configured to add the candidate target sample to the current time image information according to the current time image information, obtain enhanced current time image information, and add point cloud information corresponding to the candidate target sample to point cloud information corresponding to the current time;
The second information enhancing module 304 is configured to add the candidate target sample to the other time image information according to the coordinate system related information of the current time and the other time, obtain enhanced other time image information, and add the point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the other time, so that the position of the candidate target sample in the other time image information accords with a time sequence training requirement relative to the position in the current time image information.
Optionally, the first information enhancement module includes:
and the first sample adding sub-module is used for adding the candidate target sample and the target sample in the current time image information into the current time image information according to the sequence from far to near of depth according to the current time image information in the image information sequence, the three-dimensional label of the candidate target sample and the coordinate system related information, and obtaining the enhanced current time image information.
Optionally, the second information enhancement module includes:
the second sample adding sub-module is used for adding the target samples in the candidate target samples and the image information of other moments into the image information of other moments according to the coordinate system related information of the current moment and the other moments and the three-dimensional labels of the candidate target samples and the image information of other moments from far to near according to the depth order, so as to obtain the image information of the enhanced other moments; wherein the three-dimensional tag includes speed information.
Optionally, the apparatus further comprises:
a first camera determining module, configured to determine, according to the current time image information, a three-dimensional tag of the candidate target sample, and coordinate system related information, the current time image information to which the candidate target sample should be added, and a camera that collects the current time image information, before the candidate target sample is added to the current time image information according to the current time image information, to obtain enhanced current time image information, and before point cloud information corresponding to the candidate target sample is added to point cloud information corresponding to the current time;
the second camera determining module is used for determining the other time image information to which the alternative target sample is added according to the coordinate system related information of the current time and other time and the three-dimensional label of the alternative target sample, and a camera for collecting the other time image information;
and the sample reservation module is used for reserving the alternative target sample if the camera for acquiring the image information of other moments and the camera for acquiring the image information of the current moment are the same camera.
According to the embodiment of the invention, the image information sequences acquired by a plurality of cameras and the corresponding point cloud information are acquired, the image information sequence comprising current time image information and other time image information; an alternative target sample is selected from a target sample set; the alternative target sample is added to the current time image information according to the current time image information to obtain enhanced current time image information, and the point cloud information corresponding to the alternative target sample is added to the point cloud information corresponding to the current time; and the alternative target sample is added to the other time image information according to the coordinate system related information of the current time and other times to obtain enhanced other time image information, and the point cloud information corresponding to the alternative target sample is added to the point cloud information corresponding to the other times, so that the position of the alternative target sample in the other time image information accords with the time sequence training requirement relative to its position in the current time image information. In this way, the extraction, screening and mapping of target samples greatly increase the diversity of target samples in the image information while avoiding an excessive parameter count in the target detection model; meanwhile, the number distribution of target samples becomes more balanced, which can improve the generalization performance and detection precision of the target detection model as well as the detection performance on target sample classes with few samples.
Referring to fig. 6, a block diagram of an embodiment of an object detection device based on an object detection model according to another embodiment of the present invention may specifically include the following modules:
an information acquisition module 401, configured to acquire image information sequences acquired by a plurality of cameras and corresponding point cloud information; the image information sequence comprises current moment image information and other moment image information;
a sample selection module 402, configured to select an alternative target sample from the target sample set;
a first sample adding module 403, configured to add the candidate target sample to the current time image information according to the current time image information, obtain enhanced current time image information, and add point cloud information corresponding to the candidate target sample to point cloud information corresponding to the current time;
a second sample adding module 404, configured to add the candidate target sample to the other time image information according to coordinate system related information of the current time and other times, obtain enhanced other time image information, and add point cloud information corresponding to the candidate target sample to point cloud information corresponding to the other times, so that a position of the candidate target sample in the other time image information accords with a time sequence training requirement relative to a position in the current time image information;
A model training module 405, configured to train the target detection model using the enhanced current time image information and enhanced other time image information, the target sample tag data, and the point cloud information; the target detection result output by the target detection model comprises the position information, the length, the width, the height and the speed of the target.
Optionally, the object detection model includes an image feature extraction module, a feature map depth estimation module, an image feature mapping module, and an object detection module, and the model training module includes:
the feature extraction sub-module is used for inputting the image information of the enhanced current moment and the images of the enhanced other moments into the image feature extraction module for feature extraction to obtain corresponding image features, and outputting the image features to the feature image depth estimation module;
and the feature output sub-module is used for inputting the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain corresponding aerial view features, and outputting the aerial view features to the target detection module.
Optionally, the feature map depth estimation module includes a first layer neural network and a second layer neural network, and the apparatus further includes:
The first depth map generation sub-module is used for outputting the image features to the first layer neural network before the depth maps output by the image features and the feature map depth estimation module are input to the image feature mapping module to obtain corresponding aerial view features, and the aerial view features are output to the target detection module to obtain a first depth map output by the first layer neural network;
and the second depth map generation sub-module is used for cascading the characteristic data output by the first depth map and the first layer neural network, inputting the characteristic data into the second layer neural network after cascading the characteristic channels, and obtaining a second depth map output by the second layer neural network as the depth map output by the characteristic map depth estimation module.
Optionally, the target detection model further comprises a time sequence information fusion module, and the target detection result comprises the speed of the target; the feature output submodule includes:
the information input unit is used for inputting the image characteristics, the depth images output by the characteristic image depth estimation module and the coordinate system related information at different moments into the image characteristic mapping module respectively aiming at the image information at each moment to obtain corresponding aerial view characteristics;
And the characteristic output unit is used for inputting the bird's-eye view characteristics corresponding to each moment into the time sequence information fusion module for fusion to obtain fused bird's-eye view characteristics, and outputting the fused bird's-eye view characteristics to the target detection module.
Optionally, the object detection model further includes a feature decoding module, and the feature output submodule includes:
the decoding unit is used for inputting the aerial view features into the feature decoding module for decoding to obtain decoded aerial view features;
and the output unit is used for outputting the decoded aerial view characteristic to the target detection module.
According to the embodiment of the invention, the image information sequences acquired by a plurality of cameras and the corresponding point cloud information are acquired, the image information sequence comprising current time image information and other time image information; an alternative target sample is selected from a target sample set; the alternative target sample is added to the current time image information according to the current time image information to obtain enhanced current time image information, and the point cloud information corresponding to the alternative target sample is added to the point cloud information corresponding to the current time; the alternative target sample is added to the other time image information according to the coordinate system related information of the current time and other times to obtain enhanced other time image information, and the point cloud information corresponding to the alternative target sample is added to the point cloud information corresponding to the other times, so that the position of the alternative target sample in the other time image information accords with the time sequence training requirement relative to its position in the current time image information; and the target detection model is trained by using the enhanced current time image information, the enhanced other time image information, the target sample mark data and the point cloud information, the target detection result output by the target detection model comprising the position information, length, width, height and speed of the target. In this way, the extraction, screening and mapping of target samples greatly increase the diversity of target samples in the image information while avoiding an excessive parameter count in the target detection model; meanwhile, the number distribution of target samples becomes more balanced, which can improve the generalization performance and detection precision of the target detection model as well as the detection performance on target sample classes with few samples.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Fig. 7 is a block diagram illustrating an electronic device 700, according to an example embodiment. For example, the electronic device 700 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 7, an electronic device 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls overall operation of the electronic device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing element 702 may include one or more processors 720 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 702 can include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
Memory 704 is configured to store various types of data to support operations at device 700. Examples of such data include instructions for any application or method operating on the electronic device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 706 provides power to the various components of the electronic device 700. The power component 706 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 700.
The multimedia component 708 includes a screen between the electronic device 700 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front-facing camera and/or a rear-facing camera. When the electronic device 700 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 710 is configured to output and/or input audio signals. For example, the audio component 710 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 704 or transmitted via the communication component 716. In some embodiments, the audio component 710 further includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 714 includes one or more sensors for providing status assessment of various aspects of the electronic device 700. For example, the sensor assembly 714 may detect an on/off state of the device 700, a relative positioning of the components, such as a display and keypad of the electronic device 700, a change in position of the electronic device 700 or a component of the electronic device 700, the presence or absence of a user's contact with the electronic device 700, an orientation or acceleration/deceleration of the electronic device 700, and a change in temperature of the electronic device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the electronic device 700 and other devices. The electronic device 700 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 704, including instructions executable by processor 720 of electronic device 700 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
A non-transitory computer readable storage medium, instructions in which, when executed by a processor of a terminal, cause the terminal to perform an image information enhancement method, the method comprising:
acquiring image information sequences and corresponding point cloud information acquired by a plurality of cameras; the image information sequence comprises current moment image information and other moment image information;
selecting an alternative target sample from the target sample set;
adding the alternative target sample into the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the alternative target sample into point cloud information corresponding to the current time;
and adding the alternative target sample into the image information of the other time according to the coordinate system related information of the current time and the other time to obtain enhanced other time image information, and adding the point cloud information corresponding to the alternative target sample into the point cloud information corresponding to the other time, so that the position of the alternative target sample in the other time image information accords with a time sequence training requirement relative to the position in the current time image information.
Optionally, the adding the candidate target sample to the current time image information according to the current time image information, and obtaining the enhanced current time image information includes:
and adding the target samples in the candidate target samples and the current time image information into the current time image information according to the sequence from far to near of depth according to the current time image information in the image information sequence, the three-dimensional label of the candidate target samples and the coordinate system related information, and obtaining the enhanced current time image information.
Optionally, the adding the candidate target sample to the image information of the other time according to the coordinate system related information of the current time and the other time, and obtaining the image information of the enhanced other time includes:
according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, adding the candidate target sample and the target sample in the image information of other moments into the image information of other moments according to the sequence from far to near of depth to obtain the image information of enhanced other moments; wherein the three-dimensional tag includes speed information.
Optionally, before the adding the candidate target sample to the current time image information according to the current time image information to obtain enhanced current time image information, and adding the point cloud information corresponding to the candidate target sample to the point cloud information corresponding to the current time, the method further includes:
determining the current time image information to which the candidate target sample is added according to the current time image information, the three-dimensional label of the candidate target sample and the coordinate system related information, and a camera for collecting the current time image information;
determining the image information of other moments to which the candidate target sample is added according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, and acquiring cameras of the image information of other moments;
and if the camera for acquiring the image information of the other moment and the camera for acquiring the image information of the current moment are the same camera, reserving the alternative target sample.
A non-transitory computer readable storage medium, instructions in which, when executed by a processor of a terminal, cause the terminal to perform a method of training an object detection model based on image information enhancement, the method comprising:
Acquiring image information sequences and corresponding point cloud information acquired by a plurality of cameras; the image information sequence comprises current moment image information and other moment image information;
selecting an alternative target sample from the target sample set;
adding the alternative target sample into the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the alternative target sample into point cloud information corresponding to the current time;
according to the coordinate system related information of the current time and other time, adding the alternative target sample into the other time image information to obtain enhanced other time image information, and adding the point cloud information corresponding to the alternative target sample into the point cloud information corresponding to the other time, so that the position of the alternative target sample in the other time image information accords with a time sequence training requirement relative to the position in the current time image information;
training the target detection model by adopting the enhanced current moment image information, the enhanced other moment image information, the target sample marking data and the point cloud information; the target detection result output by the target detection model comprises the position information, the length, the width, the height and the speed of the target.
Optionally, the target detection model includes an image feature extraction module, a feature map depth estimation module, an image feature mapping module, and a target detection module, and training the target detection model by using the enhanced current time image information and enhanced other time image information, the target sample mark data, and the point cloud information includes:
inputting the image information of the enhanced current moment and the images of the enhanced other moments into the image feature extraction module for feature extraction to obtain corresponding image features, and outputting the image features to the feature image depth estimation module;
and inputting the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain corresponding aerial view features, and outputting the aerial view features to the target detection module.
Optionally, the feature map depth estimation module includes a first layer neural network and a second layer neural network, and before the inputting the image features and the depth maps output by the feature map depth estimation module into the image feature mapping module, obtaining corresponding aerial view features, and outputting the aerial view features to the target detection module, the method further includes:
Outputting the image characteristics to the first layer neural network to obtain a first depth map output by the first layer neural network;
and after cascading the characteristic channels, the characteristic data output by the first depth map and the first layer neural network are input into the second layer neural network to obtain a second depth map output by the second layer neural network, and the second depth map is used as the depth map output by the characteristic map depth estimation module.
Optionally, the target detection model further comprises a time sequence information fusion module, and the target detection result comprises the speed of the target; the inputting the image feature and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain a corresponding aerial view feature, and outputting the aerial view feature to the target detection module includes:
inputting the image characteristics, the depth images output by the characteristic image depth estimation module and the coordinate system related information at different moments into the image characteristic mapping module respectively aiming at the image information at each moment to obtain corresponding aerial view characteristics;
inputting the bird's-eye view features corresponding to each moment into the time sequence information fusion module for fusion to obtain fused bird's-eye view features, and outputting the fused bird's-eye view features to the target detection module.
Optionally, the object detection model further includes a feature decoding module, and the outputting the aerial view feature to the object detection module includes:
inputting the aerial view features into the feature decoding module for decoding to obtain decoded aerial view features;
and outputting the decoded aerial view characteristic to the target detection module.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The image information enhancement method, the image information enhancement-based target detection model training method, the corresponding apparatuses, the electronic device, and the readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the above description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (20)

1. An image information enhancement method, comprising:
acquiring image information sequences and corresponding point cloud information acquired by a plurality of cameras; the image information sequence comprises current moment image information and other moment image information;
selecting an alternative target sample from the target sample set;
adding the alternative target sample into the current time image information according to the current time image information to obtain enhanced current time image information, and adding point cloud information corresponding to the alternative target sample into point cloud information corresponding to the current time;
and adding the alternative target sample into the image information of the other time according to the coordinate system related information of the current time and the other time to obtain enhanced other time image information, and adding the point cloud information corresponding to the alternative target sample into the point cloud information corresponding to the other time, so that the position of the alternative target sample in the other time image information accords with a time sequence training requirement relative to the position in the current time image information.
2. The method according to claim 1, wherein adding the candidate target samples to the current time image information according to the current time image information, obtaining enhanced current time image information includes:
And adding the target samples in the candidate target samples and the current time image information into the current time image information according to the sequence from far to near of depth according to the current time image information in the image information sequence, the three-dimensional label of the candidate target samples and the coordinate system related information, and obtaining the enhanced current time image information.
3. The method according to claim 1, wherein adding the candidate target sample to the other time image information according to the coordinate system related information of the current time and the other time, and obtaining enhanced other time image information includes:
according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, adding the candidate target sample and the target sample in the image information of other moments into the image information of other moments according to the sequence from far to near of depth to obtain the image information of enhanced other moments; wherein the three-dimensional tag includes speed information.
4. The method according to claim 1, wherein before the adding the candidate target sample to the current time image information according to the current time image information, obtaining enhanced current time image information, and adding point cloud information corresponding to the candidate target sample to point cloud information corresponding to the current time, the method further comprises:
Determining the current time image information to which the candidate target sample is added according to the current time image information, the three-dimensional label of the candidate target sample and the coordinate system related information, and a camera for collecting the current time image information;
determining the image information of other moments to which the candidate target sample is added according to the coordinate system related information of the current moment and other moments and the three-dimensional label of the candidate target sample, and acquiring cameras of the image information of other moments;
and if the camera for acquiring the image information of the other moment and the camera for acquiring the image information of the current moment are the same camera, reserving the alternative target sample.
5. A target detection model training method based on image information enhancement, comprising:
acquiring image information sequences collected by a plurality of cameras and the corresponding point cloud information, wherein each image information sequence comprises current-time image information and other-time image information;
selecting a candidate target sample from a target sample set;
adding the candidate target sample into the current-time image information according to the current-time image information to obtain enhanced current-time image information, and adding the point cloud information corresponding to the candidate target sample into the point cloud information corresponding to the current time;
adding the candidate target sample into the other-time image information according to the coordinate system related information of the current time and the other times to obtain enhanced other-time image information, and adding the point cloud information corresponding to the candidate target sample into the point cloud information corresponding to the other times, so that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time-sequence training requirement;
and training the target detection model by using the enhanced current-time image information, the enhanced other-time image information, the target sample label data and the point cloud information, wherein the target detection result output by the target detection model comprises the position information, the length, the width, the height and the speed of each target.
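As a rough picture of how the enhanced data of claim 5 could enter optimization, the sketch below runs one training step. The batch field names and the plain L1 losses are assumptions made for illustration only, since the claim states which inputs are used and that the detector outputs position, length, width, height and speed, but fixes no loss.

```python
import torch
import torch.nn.functional as F

def train_step(model, batch, optimizer):
    """One training step on an enhanced sample (illustrative, per claim 5)."""
    # The model consumes the enhanced multi-camera images plus pose information
    # and predicts boxes (position and length/width/height) and speeds.
    preds = model(batch["images"], batch["poses"])
    loss = (F.l1_loss(preds["boxes"], batch["gt_boxes"]) +
            F.l1_loss(preds["speed"], batch["gt_speed"]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```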
6. The method according to claim 5, wherein the target detection model comprises an image feature extraction module, a feature map depth estimation module, an image feature mapping module and a target detection module, and training the target detection model by using the enhanced current-time image information, the enhanced other-time image information, the target sample label data and the point cloud information comprises:
inputting the enhanced current-time image information and the enhanced other-time image information into the image feature extraction module for feature extraction to obtain the corresponding image features, and outputting the image features to the feature map depth estimation module;
and inputting the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain the corresponding bird's-eye view features, and outputting the bird's-eye view features to the target detection module.
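Claim 6 fixes only the order of the modules, not their internals. A skeleton of that data flow, with every submodule injected as a placeholder (all names here are hypothetical):

```python
import torch.nn as nn

class DetectorSkeleton(nn.Module):
    """Module order named in claim 6: image feature extraction ->
    feature-map depth estimation -> image-to-BEV mapping -> target detection."""
    def __init__(self, feat_extractor, depth_estimator, bev_mapper, det_head):
        super().__init__()
        self.feat_extractor = feat_extractor
        self.depth_estimator = depth_estimator
        self.bev_mapper = bev_mapper
        self.det_head = det_head

    def forward(self, images, geometry):
        feats = self.feat_extractor(images)            # per-camera image features
        depth = self.depth_estimator(feats)            # per-pixel depth map
        bev = self.bev_mapper(feats, depth, geometry)  # lift features into BEV
        return self.det_head(bev)                      # boxes: position, l/w/h, speed
```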
7. The method according to claim 6, wherein the feature map depth estimation module comprises a first-layer neural network and a second-layer neural network, and before inputting the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain the corresponding bird's-eye view features and outputting the bird's-eye view features to the target detection module, the method further comprises:
outputting the image features to the first-layer neural network to obtain a first depth map output by the first-layer neural network;
and concatenating, along the feature channels, the first depth map and the feature data output by the first-layer neural network, and inputting the result into the second-layer neural network to obtain a second depth map output by the second-layer neural network, the second depth map serving as the depth map output by the feature map depth estimation module.
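Claim 7's two-stage depth estimator can be sketched as follows: the first network emits an initial depth map, that map is concatenated with the first network's feature output along the channel axis, and a second network refines it. The channel counts and kernel sizes below are assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class CascadedDepth(nn.Module):
    """Two-stage depth estimation per claim 7 (layer sizes are illustrative)."""
    def __init__(self, in_ch=256, mid_ch=128, depth_bins=64):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.head1 = nn.Conv2d(mid_ch, depth_bins, 1)   # first depth map
        self.stage2 = nn.Sequential(
            nn.Conv2d(mid_ch + depth_bins, mid_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, depth_bins, 1))           # refined depth map

    def forward(self, feats):
        f1 = self.stage1(feats)
        d1 = self.head1(f1)
        # Concatenate the first depth map with the first stage's features
        # along the channel axis, then refine with the second network.
        d2 = self.stage2(torch.cat([d1, f1], dim=1))
        return d2   # used as the module's output depth map
```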
8. The method according to claim 6, wherein the target detection model further comprises a time-sequence information fusion module, and the target detection result comprises the speed of a target; inputting the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain the corresponding bird's-eye view features and outputting the bird's-eye view features to the target detection module comprises:
inputting, for the image information of each time, the image features, the depth map output by the feature map depth estimation module and the coordinate system related information of the different times into the image feature mapping module to obtain the corresponding bird's-eye view features;
and inputting the bird's-eye view features corresponding to each time into the time-sequence information fusion module for fusion to obtain fused bird's-eye view features, and outputting the fused bird's-eye view features to the target detection module.
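For claim 8, one common realization of a time-sequence fusion module is channel concatenation of the per-time bird's-eye view maps followed by a convolution. This is an assumed choice, and it presumes the mapping step has already aligned the BEV features of the different times using the pose information, as the claim's use of the coordinate system related information suggests.

```python
import torch
import torch.nn as nn

class BEVTemporalFusion(nn.Module):
    """Fuse per-time BEV feature maps into one map (a sketch for claim 8)."""
    def __init__(self, bev_ch=80, num_frames=2):
        super().__init__()
        self.fuse = nn.Conv2d(bev_ch * num_frames, bev_ch, 3, padding=1)

    def forward(self, bev_per_frame):
        # bev_per_frame: list of (B, C, H, W) maps, one per timestamp,
        # already expressed in a common BEV coordinate frame.
        return self.fuse(torch.cat(bev_per_frame, dim=1))
```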
9. The method according to claim 6, wherein the target detection model further comprises a feature decoding module, and outputting the bird's-eye view features to the target detection module comprises:
inputting the bird's-eye view features into the feature decoding module for decoding to obtain decoded bird's-eye view features;
and outputting the decoded bird's-eye view features to the target detection module.
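Claim 9's feature decoding module sits between the bird's-eye view features and the detection module; a small residual convolutional block is one plausible stand-in, since the patent does not specify the layers.

```python
import torch.nn as nn

class BEVDecoder(nn.Module):
    """Decode BEV features before detection (illustrative, per claim 9)."""
    def __init__(self, ch=80):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True), nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, bev):
        return bev + self.block(bev)   # decoded BEV features for the detector
```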
10. An image information enhancement apparatus, comprising:
an information acquisition module, configured to acquire image information sequences collected by a plurality of cameras and the corresponding point cloud information, wherein each image information sequence comprises current-time image information and other-time image information;
a sample selection module, configured to select a candidate target sample from a target sample set;
a first information enhancement module, configured to add the candidate target sample into the current-time image information according to the current-time image information to obtain enhanced current-time image information, and to add the point cloud information corresponding to the candidate target sample into the point cloud information corresponding to the current time;
and a second information enhancement module, configured to add the candidate target sample into the other-time image information according to the coordinate system related information of the current time and the other times to obtain enhanced other-time image information, and to add the point cloud information corresponding to the candidate target sample into the point cloud information corresponding to the other times, so that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time-sequence training requirement.
11. The apparatus according to claim 10, wherein the first information enhancement module comprises:
a first sample adding sub-module, configured to add, according to the current-time image information in the image information sequence, the three-dimensional label of the candidate target sample and the coordinate system related information, the candidate target sample and the target samples already present in the current-time image information into the current-time image information in order of depth from far to near, to obtain the enhanced current-time image information.
12. The apparatus according to claim 10, wherein the second information enhancement module comprises:
a second sample adding sub-module, configured to add, according to the coordinate system related information of the current time and the other times and the three-dimensional label of the candidate target sample, the candidate target sample and the target samples already present in the other-time image information into the other-time image information in order of depth from far to near, to obtain the enhanced other-time image information; wherein the three-dimensional label includes speed information.
13. The apparatus according to claim 10, further comprising:
a first camera determining module, configured to determine, according to the current-time image information, the three-dimensional label of the candidate target sample and the coordinate system related information, the current-time image information to which the candidate target sample should be added and the camera that collects that current-time image information, before the candidate target sample is added into the current-time image information according to the current-time image information to obtain the enhanced current-time image information and before the point cloud information corresponding to the candidate target sample is added into the point cloud information corresponding to the current time;
a second camera determining module, configured to determine, according to the coordinate system related information of the current time and the other times and the three-dimensional label of the candidate target sample, the other-time image information to which the candidate target sample should be added and the camera that collects that other-time image information;
and a sample retaining module, configured to retain the candidate target sample if the camera that collects the other-time image information and the camera that collects the current-time image information are the same camera.
14. A target detection model training apparatus based on image information enhancement, comprising:
an information acquisition module, configured to acquire image information sequences collected by a plurality of cameras and the corresponding point cloud information, wherein each image information sequence comprises current-time image information and other-time image information;
a sample selection module, configured to select a candidate target sample from a target sample set;
a first sample adding module, configured to add the candidate target sample into the current-time image information according to the current-time image information to obtain enhanced current-time image information, and to add the point cloud information corresponding to the candidate target sample into the point cloud information corresponding to the current time;
a second sample adding module, configured to add the candidate target sample into the other-time image information according to the coordinate system related information of the current time and the other times to obtain enhanced other-time image information, and to add the point cloud information corresponding to the candidate target sample into the point cloud information corresponding to the other times, so that the position of the candidate target sample in the other-time image information, relative to its position in the current-time image information, meets the time-sequence training requirement;
and a model training module, configured to train the target detection model by using the enhanced current-time image information, the enhanced other-time image information, the target sample label data and the point cloud information, wherein the target detection result output by the target detection model comprises the position information, the length, the width, the height and the speed of each target.
15. The apparatus according to claim 14, wherein the target detection model comprises an image feature extraction module, a feature map depth estimation module, an image feature mapping module and a target detection module, and the model training module comprises:
a feature extraction sub-module, configured to input the enhanced current-time image information and the enhanced other-time image information into the image feature extraction module for feature extraction to obtain the corresponding image features, and to output the image features to the feature map depth estimation module;
and a feature output sub-module, configured to input the image features and the depth map output by the feature map depth estimation module into the image feature mapping module to obtain the corresponding bird's-eye view features, and to output the bird's-eye view features to the target detection module.
16. The apparatus according to claim 15, wherein the feature map depth estimation module comprises a first-layer neural network and a second-layer neural network, and the apparatus further comprises:
a first depth map generation sub-module, configured to output the image features to the first-layer neural network to obtain a first depth map output by the first-layer neural network, before the image features and the depth map output by the feature map depth estimation module are input into the image feature mapping module to obtain the corresponding bird's-eye view features and the bird's-eye view features are output to the target detection module;
and a second depth map generation sub-module, configured to concatenate, along the feature channels, the first depth map and the feature data output by the first-layer neural network, and to input the result into the second-layer neural network to obtain a second depth map output by the second-layer neural network as the depth map output by the feature map depth estimation module.
17. The apparatus according to claim 15, wherein the target detection model further comprises a time-sequence information fusion module, and the target detection result comprises the speed of a target; the feature output sub-module comprises:
an information input unit, configured to input, for the image information of each time, the image features, the depth map output by the feature map depth estimation module and the coordinate system related information of the different times into the image feature mapping module to obtain the corresponding bird's-eye view features;
and a feature output unit, configured to input the bird's-eye view features corresponding to each time into the time-sequence information fusion module for fusion to obtain fused bird's-eye view features, and to output the fused bird's-eye view features to the target detection module.
18. The apparatus according to claim 15, wherein the target detection model further comprises a feature decoding module, and the feature output sub-module comprises:
a decoding unit, configured to input the bird's-eye view features into the feature decoding module for decoding to obtain decoded bird's-eye view features;
and an output unit, configured to output the decoded bird's-eye view features to the target detection module.
19. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to carry out the method steps of any one of claims 1 to 9 when executing the program stored in the memory.
20. A readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 9.
CN202310692621.XA 2023-06-13 2023-06-13 Image information enhancement method, model training method, device, equipment and medium Active CN116434016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310692621.XA CN116434016B (en) 2023-06-13 2023-06-13 Image information enhancement method, model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116434016A (en) 2023-07-14
CN116434016B (en) 2023-08-22

Family

ID=87080102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310692621.XA Active CN116434016B (en) 2023-06-13 2023-06-13 Image information enhancement method, model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116434016B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754471A (en) * 2019-01-10 2019-05-14 网易(杭州)网络有限公司 Image processing method and device, storage medium, electronic equipment in augmented reality
CN111476284A (en) * 2020-04-01 2020-07-31 网易(杭州)网络有限公司 Image recognition model training method, image recognition model training device, image recognition method, image recognition device and electronic equipment
CN114926832A (en) * 2022-05-13 2022-08-19 每平每屋(上海)科技有限公司 Feature extraction model training method, material chartlet processing method, device and electronic equipment

Also Published As

Publication number Publication date
CN116434016A (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant