CN115984795A - Image sensing method, computer device, computer-readable storage medium and vehicle - Google Patents


Info

Publication number
CN115984795A
CN115984795A (application CN202211724080.6A)
Authority
CN
China
Prior art keywords
image, perception, depth, model, network
Prior art date
Legal status
Pending
Application number
CN202211724080.6A
Other languages
Chinese (zh)
Inventor
康子健
Current Assignee
Anhui Weilai Zhijia Technology Co Ltd
Original Assignee
Anhui Weilai Zhijia Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Anhui Weilai Zhijia Technology Co Ltd filed Critical Anhui Weilai Zhijia Technology Co Ltd
Priority to CN202211724080.6A
Publication of CN115984795A
Legal status: Pending


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of autonomous driving, and in particular to an image perception method, a computer device, a computer-readable storage medium, and a vehicle, with the aim of improving the accuracy of image perception. To this end, the image perception method provided by the invention comprises: acquiring image frames captured by a vehicle, and performing image target perception on the image frames with an image perception model. The image perception model is trained as follows: obtain a teacher model capable of performing image depth estimation on image frames; use a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with the image frames; and then perform final training on image target perception of the pre-trained image perception model according to the image frames and their image target annotation information. This image perception method improves both the accuracy and the robustness of image perception.

Description

Image sensing method, computer device, computer-readable storage medium and vehicle
Technical Field
The invention relates to the technical field of autonomous driving, and in particular to an image perception method, a computer device, a computer-readable storage medium, and a vehicle.
Background
In automatic driving control of a vehicle, images of the vehicle's surroundings are usually acquired by a vision sensor and then processed by a perception model to identify surrounding target objects such as lane lines, traffic signs, pedestrians, and obstacles. However, conventional image perception methods cannot obtain sufficiently accurate depth information when identifying a target object, which reduces the accuracy of image perception.
Accordingly, there is a need in the art for a new solution to the above problem.
Disclosure of Invention
To overcome the above drawbacks, the present invention provides an image perception method, a computer device, a computer-readable storage medium, and a vehicle that solve, or at least partially solve, the technical problem of how to improve the accuracy of image perception.
In a first aspect, there is provided an image perception method, the method comprising:
acquiring image frames captured by a vehicle;
performing image target perception on the image frames with an image perception model;
wherein the image perception model is trained as follows:
obtaining a teacher model capable of performing image depth estimation on image frames;
using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames;
performing final training on image target perception of the pre-trained image perception model according to the image frames and their image target annotation information.
In one technical solution of the above image perception method, the step of "using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames" specifically comprises:
estimating, with the teacher model, the depth at each pixel position on the image frame and its confidence;
obtaining the pixel positions whose depth confidence is greater than a preset confidence threshold;
obtaining a high-confidence region on the image frame according to those pixel positions;
using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with the high-confidence region of the image frame.
In one technical solution of the above image perception method, the step of "obtaining a high-confidence region on the image frame according to the pixel positions" specifically comprises:
generating an image mask of the high-confidence region according to the pixel positions;
obtaining the high-confidence region on the image frame according to the image mask.
In one technical solution of the above image perception method, the step of "estimating, with the teacher model, the depth at each pixel position on the image frame and its confidence" specifically comprises:
if the teacher model was trained for image depth estimation on image frames with non-dense depth annotation information, estimating with the teacher model the depth at each pixel position on the image frame and the probability that each pixel on the image frame is scanned by the vehicle radar;
determining the confidence of the depth at each pixel position according to that probability.
In one technical solution of the above image perception method, the step of "estimating, with the teacher model, the depth at each pixel position on the image frame and its confidence" specifically comprises:
if the teacher model was trained for image depth estimation on image frames with dense depth annotation information, estimating with the teacher model the depth at each pixel position on the image frame;
estimating the uncertainty of the depth at each pixel position based on an uncertainty estimation method;
determining the confidence of the depth at each pixel position according to that uncertainty.
In one technical solution of the above image perception method, before the step of "using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames", the method further comprises constructing the image perception model as follows:
constructing a feature extraction network, a target perception network, and a depth estimation network to form the image perception model;
wherein the feature extraction network extracts image features from image frames, the target perception network performs image target perception according to the image features, and the depth estimation network estimates the depth at each pixel position on the image frame according to the image features.
In one technical solution of the above image perception method, the step of "using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames" specifically comprises:
using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames;
removing the depth estimation network after pre-training is completed.
In one technical solution of the above image perception method, the method further comprises constructing the feature extraction network and the target perception network as follows:
constructing a feature pyramid network to form the feature extraction network;
constructing a plurality of target perception networks, each of which perceives a different type of image target according to the image features extracted by the feature pyramid network.
In a second aspect, there is provided a computer device comprising a processor and a storage means adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the image perception method according to any of the above technical solutions.
In a third aspect, there is provided a computer-readable storage medium in which a plurality of program codes are stored, the program codes being adapted to be loaded and run by a processor to perform the image perception method according to any of the above technical solutions.
In a fourth aspect, there is provided a vehicle comprising the computer device according to the above technical solution.
Scheme 1. An image perception method, characterized in that the method comprises:
acquiring image frames captured by a vehicle;
performing image target perception on the image frames with an image perception model;
wherein the image perception model is trained as follows:
obtaining a teacher model capable of performing image depth estimation on image frames;
using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames;
performing final training on image target perception of the pre-trained image perception model according to the image frames and their image target annotation information.
Scheme 2. The image perception method according to scheme 1, wherein the step of "using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames" specifically comprises:
estimating, with the teacher model, the depth at each pixel position on the image frame and its confidence;
obtaining the pixel positions whose depth confidence is greater than a preset confidence threshold;
obtaining a high-confidence region on the image frame according to those pixel positions;
using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with the high-confidence region of the image frame.
Scheme 3. The image perception method according to scheme 2, wherein the step of "obtaining a high-confidence region on the image frame according to the pixel positions" specifically comprises:
generating an image mask of the high-confidence region according to the pixel positions;
obtaining the high-confidence region on the image frame according to the image mask.
Scheme 4. The image perception method according to scheme 2, wherein the step of "estimating, with the teacher model, the depth at each pixel position on the image frame and its confidence" specifically comprises:
if the teacher model was trained for image depth estimation on image frames with non-dense depth annotation information, estimating with the teacher model the depth at each pixel position on the image frame and the probability that each pixel on the image frame is scanned by the vehicle radar;
determining the confidence of the depth at each pixel position according to that probability.
Scheme 5. The image perception method according to scheme 2, wherein the step of "estimating, with the teacher model, the depth at each pixel position on the image frame and its confidence" specifically comprises:
if the teacher model was trained for image depth estimation on image frames with dense depth annotation information, estimating with the teacher model the depth at each pixel position on the image frame;
estimating the uncertainty of the depth at each pixel position based on an uncertainty estimation method;
determining the confidence of the depth at each pixel position according to that uncertainty.
Scheme 6. The image perception method according to scheme 1, wherein before the step of "using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames", the method further comprises constructing the image perception model as follows:
constructing a feature extraction network, a target perception network, and a depth estimation network to form the image perception model;
wherein the feature extraction network extracts image features from image frames, the target perception network performs image target perception according to the image features, and the depth estimation network estimates the depth at each pixel position on the image frame according to the image features.
Scheme 7. The image perception method according to scheme 6, wherein the step of "using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames" specifically comprises:
using a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with image frames;
removing the depth estimation network after pre-training is completed.
Scheme 8. The image perception method according to scheme 6, further comprising constructing the feature extraction network and the target perception network as follows:
constructing a feature pyramid network to form the feature extraction network;
constructing a plurality of target perception networks, each of which perceives a different type of image target according to the image features extracted by the feature pyramid network.
Scheme 9. A computer device comprising a processor and a storage means adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by the processor to perform the image perception method of any of schemes 1 to 8.
Scheme 10. A computer-readable storage medium having a plurality of program codes stored therein, characterized in that the program codes are adapted to be loaded and run by a processor to perform the image perception method of any of schemes 1 to 8.
Scheme 11. A vehicle, characterized in that the vehicle comprises the computer device of scheme 9.
One or more technical solutions of the invention have at least one or more of the following beneficial effects:
In the technical solution implementing the image perception method provided by the invention, a teacher model capable of performing image depth estimation on image frames is obtained; a knowledge distillation method is used so that the teacher model guides the image perception model through pre-training on image depth estimation with the image frames; and the pre-trained image perception model then undergoes final training on image target perception according to the image frames. When image perception is needed on image frames captured by the vehicle, the trained image perception model performs image target perception on those frames.
In this way, the teacher model improves the image depth estimation capability of the image perception model without any depth annotation of the image frames, so that after the model, now equipped with depth estimation, undergoes final image target perception training on the image frames, the trained model can accurately identify targets in image frames. Because the model has depth estimation capability, it can accurately identify both two-dimensional and three-dimensional image targets, which improves the robustness of image target perception.
Drawings
The disclosure of the present invention will become more readily understood with reference to the accompanying drawings. Those skilled in the art will readily understand that these drawings are for illustrative purposes only and do not limit the scope of the present invention. In the drawings:
FIG. 1 is a flow diagram illustrating the main steps of an image perception method according to an embodiment of the present invention;
FIG. 2 is a flow diagram illustrating the main steps of a method for training an image perception model according to an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an image perception model according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of the network architecture of a teacher model according to one embodiment of the present invention;
FIG. 5 is a diagram of the main structure of a computer device according to an embodiment of the present invention.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "processor" may include hardware, software, or a combination of both. The processor may be a central processing unit, a microprocessor, an image processor, a digital signal processor, or any other suitable processor, and has data and/or signal processing functionality. A computer-readable storage medium includes any suitable medium that can store program code, such as a magnetic disk, hard disk, optical disc, flash memory, read-only memory, or random-access memory.
The following describes an embodiment of the image perception method provided by the present invention.
Referring to FIG. 1, FIG. 1 is a flow diagram illustrating the main steps of an image perception method according to an embodiment of the invention. As shown in FIG. 1, the image perception method in this embodiment mainly comprises the following steps S101 to S102.
Step S101: acquire image frames captured by the vehicle.
In an embodiment of the invention, the image frames may be captured by an image acquisition device on the vehicle while the vehicle travels along a map acquisition path.
Step S102: perform image target perception on the image frames with an image perception model.
Before this step, a training step (S100) may be performed to train the image perception model. Specifically, in an embodiment of the invention, the image perception model may be trained through the following steps S1001 to S1003 shown in FIG. 2.
Step S1001: obtain a teacher model capable of performing image depth estimation on image frames. The teacher model is a model trained for image depth estimation on image frames and their depth annotation information, where the depth annotation information comprises depths at pixel positions on the image frames.
Step S1002: use a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with the image frames. That is, the image perception model serves as the student model, and knowledge distillation transfers the teacher model's image depth estimation capability to the image perception model, so that the image perception model also acquires that capability.
Step S1003: perform final training on image target perception of the pre-trained image perception model according to the image frames and their image target annotation information.
The image target annotation information is annotated on the image frames in advance; for example, the image targets contained in an image frame may be annotated manually, according to the types of targets to be perceived, to form the image target annotation information. The embodiment of the invention does not specifically limit how the image target annotation information is obtained.
Final training gives the image perception model the image target perception capability, and pre-training gives it the image depth estimation capability; therefore, after final training, the image perception model can perform both two-dimensional and three-dimensional image target perception, which improves the robustness of image target perception.
Thus, based on the method described in steps S101 to S102 above, both the accuracy of the image perception model and the robustness of image target perception can be improved.
Steps S1002 to S1003 are described further below.
1. Step S1002.
In practice, because the accuracy of the depth annotation information used during training is limited, the teacher model may not accurately estimate the depth at every pixel position on an image frame; for example, if the depth annotation at some pixel positions is inaccurate, the teacher model's depth estimates at those positions will be affected. To preserve the image depth estimation capability of the image perception model, when the teacher model guides the training of the image perception model, only the accurate pixel-position depths produced by the teacher model should be used for guidance. Specifically, in some embodiments the image perception model may be pre-trained for image depth estimation through the following steps 11 to 14.
Step 11: estimate, with the teacher model, the depth at each pixel position on the image frame and its confidence. The confidence of a depth indicates how trustworthy, i.e. how accurate, that depth estimate is: the higher the confidence, the more accurate the depth, and vice versa.
Step 12: obtain the pixel positions whose depth confidence is greater than a preset confidence threshold. A confidence above the preset threshold indicates high accuracy, so those depths can be used to guide the image depth estimation training of the image perception model.
Those skilled in the art may set the preset confidence threshold flexibly according to actual requirements; the embodiment of the invention does not specifically limit it.
Step 13: obtain the high-confidence region on the image frame according to the pixel positions whose depth confidence exceeds the preset threshold. Specifically, the region formed by those pixel positions may be taken as the high-confidence region.
Step 14: use a knowledge distillation method so that the teacher model guides the image perception model through pre-training on image depth estimation with the high-confidence region of the image frame. In other words, knowledge distillation transfers the teacher model's ability to estimate depth in the high-confidence region to the image perception model, improving the latter's image depth estimation capability.
Based on steps 11 to 14 above, inaccurate depth estimates from the teacher model on parts of the image frame no longer degrade the depth estimation capability of the image perception model, so the image perception model retains a high image depth estimation capability.
The following will further describe step 11 and step 13.
1. Step 11.
In practice, image frames of different scenes may be given either dense or non-dense depth annotation information. For example, for an image frame of an indoor scene, a depth may be annotated at every pixel position to ensure the accuracy of depth estimation, forming dense depth annotation information; for an image frame of an open outdoor scene, depths may be annotated at only some pixel positions to ensure the efficiency of depth estimation, forming non-dense depth annotation information.
For a teacher model trained with dense depth annotation information and one trained with non-dense depth annotation information, different methods can be used to obtain the confidence of the depths the teacher model estimates. The two cases are described separately below.
(1) The teacher model was trained for image depth estimation on image frames with non-dense depth annotation information.
In this case, after an image frame is input to the teacher model, the teacher model estimates the depth at each pixel position on the image frame along with the probability that each pixel was scanned by the vehicle radar, and the confidence of the depth at each pixel position is then determined from that probability. Probability and confidence are positively correlated: the higher the probability, the higher the confidence, and vice versa; the probability may even be used directly as the confidence. Note that while the image frames are captured by the image acquisition device on the vehicle, the vehicle radar scans the environment simultaneously, so the confidence can be obtained by estimating the probability that each pixel is scanned by the vehicle radar.
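As a minimal, hedged sketch (function and variable names are hypothetical, not from the patent), the positive correlation above can be realized by taking the radar-scan probability directly as the depth confidence over a whole probability map:

```python
def confidences_from_scan_probabilities(prob_map):
    """Non-dense-annotation case: per-pixel radar-scan probabilities
    (values in [0, 1]) are taken directly as depth confidences -- the
    simplest positively correlated mapping, the identity."""
    out = []
    for row in prob_map:
        out_row = []
        for p in row:
            if not 0.0 <= p <= 1.0:
                raise ValueError("scan probability must lie in [0, 1]")
            out_row.append(p)  # identity: probability == confidence
        out.append(out_row)
    return out
```

Any other monotonically increasing mapping would equally satisfy the positive correlation the text describes; the identity is merely the simplest choice.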
(2) The teacher model was trained for image depth estimation on image frames with dense depth annotation information.
In this case, after an image frame is input to the teacher model, the depth at each pixel position is estimated with the teacher model, the uncertainty of the depth at each pixel position is estimated based on an uncertainty estimation method, and the confidence of the depth at each pixel position is determined from that uncertainty.
Uncertainty and confidence are negatively correlated: the higher the uncertainty, the lower the confidence, and vice versa. An inversion operation may therefore be applied to the uncertainty, for example taking its reciprocal, and the result used as the confidence.
In an embodiment of the invention, a conventional uncertainty estimation method may be used to estimate the uncertainty of the depth at each pixel position; the embodiment of the invention does not specifically limit which one. For example, the uncertainty estimation method disclosed in the paper "Bounding Box Regression With Uncertainty for Accurate Object Detection" may be used: the depth is modeled as a Gaussian distribution, the standard deviation of the Gaussian represents the uncertainty of the depth, and the uncertainty is very low when the standard deviation approaches 0.
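A minimal sketch of the inversion step, assuming the reciprocal mapping mentioned above (the epsilon guard is an added assumption to keep the value finite, not part of the patent):

```python
def confidence_from_uncertainty(sigma, eps=1e-6):
    """Dense-annotation case: sigma is the standard deviation of the
    Gaussian modeling the depth, i.e. its uncertainty. Small sigma means
    a trustworthy estimate, so the reciprocal gives a negatively
    correlated confidence; eps avoids division by zero as sigma -> 0."""
    if sigma < 0:
        raise ValueError("uncertainty must be non-negative")
    return 1.0 / (sigma + eps)
```

With this mapping, a near-zero standard deviation yields a very large confidence, matching the negative correlation described in the text.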
2. Step 13.
To obtain the high-confidence region on the image frame quickly and accurately from the pixel positions, in some embodiments an image mask of the high-confidence region may be generated from those pixel positions, and the high-confidence region on the image frame then obtained from the mask.
In an embodiment of the invention, a conventional mask generation method may be used to generate the image mask of the high-confidence region from the pixel positions; the embodiment of the invention does not specifically limit the mask generation method.
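A minimal sketch of this masking step, with plain lists of lists standing in for image tensors (names are hypothetical, not from the patent):

```python
def high_confidence_mask(conf_map, threshold):
    """Binary image mask: 1 where the depth confidence exceeds the
    preset confidence threshold, 0 elsewhere."""
    return [[1 if c > threshold else 0 for c in row] for row in conf_map]

def apply_mask(depth_map, mask):
    """Restrict a depth map to the high-confidence region; pixels
    outside the region are dropped to None."""
    return [[d if m else None for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(depth_map, mask)]
```

During pre-training, only the depths surviving `apply_mask` would contribute to the distillation signal described in step 14.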
This concludes the description of steps 11 and 13.
To train the image perception model on image depth estimation and give it that capability, a depth estimation network can be added alongside the feature extraction network and the target perception network when the image perception model is constructed.
The feature extraction network extracts image features from an image frame, the target perception network performs image target perception according to the image features, and the depth estimation network estimates the depth at each pixel position on the image frame according to the image features.
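The three-network composition can be sketched structurally as follows. This is a toy skeleton with hypothetical names; in a real system the extractor and heads would be neural networks (e.g. a feature pyramid network and per-target heads), not plain callables:

```python
class ImagePerceptionModel:
    """Structural sketch: a shared feature extraction network, one or
    more target perception heads, and a depth estimation head that is
    present only during depth pre-training."""

    def __init__(self, feature_extractor, perception_heads, depth_head=None):
        self.feature_extractor = feature_extractor
        self.perception_heads = perception_heads  # e.g. one head per target type
        self.depth_head = depth_head

    def forward(self, image):
        features = self.feature_extractor(image)
        outputs = {name: head(features)
                   for name, head in self.perception_heads.items()}
        if self.depth_head is not None:
            outputs["depth"] = self.depth_head(features)
        return outputs

    def remove_depth_head(self):
        """Drop the depth estimation network once pre-training is done,
        as the patent describes for final training."""
        self.depth_head = None
```

The key design point mirrored here is that all heads consume the same shared features, so depth pre-training shapes the feature extractor that the target perception heads later reuse.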
After an image frame is input to the image perception model, the depth estimation network estimates the depth at each pixel position from the image features of the frame; for convenience, this depth is called the student estimated depth. After the image frame is input to the teacher model, the teacher model likewise estimates the depth at each pixel position; this depth is called the teacher estimated depth. Given both, the model loss of the image perception model is computed from the student and teacher estimated depths, the parameter gradients are computed from the loss, and the model parameters are updated by backpropagation according to those gradients until the image perception model meets the convergence condition and training stops. At that point the knowledge distillation training is complete, and the teacher model's image depth estimation capability has been distilled into the image perception model.
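A minimal sketch of the distillation loss over the high-confidence region; mean absolute difference is an assumed choice, since the patent does not fix a concrete loss function:

```python
def masked_distillation_loss(student_depth, teacher_depth, mask):
    """Mean absolute difference between student and teacher estimated
    depths, computed only where the high-confidence mask is 1. In a real
    training loop this scalar would be backpropagated through the
    student (image perception model) only."""
    total, count = 0.0, 0
    for s_row, t_row, m_row in zip(student_depth, teacher_depth, mask):
        for s, t, m in zip(s_row, t_row, m_row):
            if m:
                total += abs(s - t)
                count += 1
    return total / count if count else 0.0
```

Note that pixels outside the mask contribute nothing, which is exactly how inaccurate teacher depths are kept from degrading the student.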
After pre-training of the image depth estimation is completed, the depth estimation network may be removed. That is, when the image perception model is finally trained for image target perception, the image perception model no longer includes the depth estimation network.
In practical application, a plurality of image acquisition devices with different viewing angles may be disposed on a vehicle, and image target perception may be performed according to the image frames of the different viewing angles obtained by these devices. To ensure that the image perception model has a high image depth estimation capability for the image frames of each viewing angle, the image frames of each viewing angle can be trained on separately during image depth estimation training. For example, suppose the vehicle is provided with two image acquisition devices A and B with different viewing angles: image frames acquired by device A are obtained first and used to perform image depth estimation training on the image perception model; then image frames acquired by device B are obtained and used to continue the image depth estimation training.
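The per-view procedure of the example above can be sketched as a simple loop; `pretrain_depth_per_view` and `train_step` are hypothetical placeholder names, and `train_step` stands in for one distillation update on a frame.

```python
def pretrain_depth_per_view(frames_by_camera, train_step):
    """Run depth pre-training on each camera view in sequence (illustrative sketch)."""
    order = []
    for camera, frames in frames_by_camera.items():  # e.g. device A first, then B
        for frame in frames:
            train_step(frame)        # one pre-training update on this frame
            order.append(camera)
    return order

seen = pretrain_depth_per_view(
    {"A": ["a0", "a1"], "B": ["b0"]},   # hypothetical frame batches per device
    train_step=lambda frame: None,       # stand-in for the real distillation update
)
```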
2. Step S1003 will be described.
According to the foregoing description, when image target perception training is performed, the image perception model mainly includes the feature extraction network and the target perception network and no longer includes the depth estimation network; this stage of training mainly uses the target perception network.
After the image frame and its image target annotation information are input into the image perception model, the target perception network can perceive the image target according to the image features of the image frame to obtain image target perception information. The model loss value of the image perception model can then be calculated from the image target annotation information and the image target perception information, the parameter gradient of the model parameters calculated from the model loss value, and the model parameters updated by back propagation according to the parameter gradient, with training stopping once the image perception model meets the convergence condition.
In order to improve the robustness of image target perception, a plurality of target perception networks can be arranged in the image perception model, each target perception network being used to perceive a different type of image target according to the image features of the image frame. For example, a 2D target perception network, a 3D target perception network and a BEV (Bird's Eye View) target perception network may be set. In addition, to further improve robustness, the feature extraction network can be constructed with a feature pyramid network, which can extract image features at different scales from an image frame so that image target perception can be carried out separately according to the image features at each scale.
As shown in fig. 3, the feature extraction network in the image perception model mainly includes a backbone network and a feature pyramid network. The feature pyramid network is connected to three multitask networks: it extracts three image features at different scales and inputs the image features of each scale into one multitask network. Each multitask network comprises a depth estimation network and a target perception network, where the target perception network comprises a 2D target perception network, a 3D target perception network and a BEV target perception network. When image depth estimation pre-training is performed on the image perception model, training mainly uses the depth estimation network in each multitask network, and the depth estimation network is removed after pre-training is finished; when final image target perception training is performed, training mainly uses the target perception networks. Note that the input of the BEV target perception network is not the original image features extracted by the feature pyramid network, but BEV image features formed by further processing those original image features.
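A shape-level sketch of this wiring may help; only the connectivity follows the description of fig. 3, while the functions below are assumed stand-ins (the real backbone, feature pyramid, view transform and heads are learned networks, and all names and sizes are illustrative).

```python
import numpy as np

def fpn(image):
    """Stand-in feature pyramid: three feature maps at strides 8, 16 and 32."""
    h, w = image.shape[:2]
    return [np.zeros((h // s, w // s, 64)) for s in (8, 16, 32)]

def to_bev(feat):
    """Stand-in for the transform turning image features into BEV image features."""
    return np.zeros((32, 32, 64))

def multitask_head(feat, with_depth=True):
    out = {
        "det_2d":  np.zeros(feat.shape[:2] + (4,)),        # 2D target perception
        "det_3d":  np.zeros(feat.shape[:2] + (7,)),        # 3D target perception
        "det_bev": np.zeros(to_bev(feat).shape[:2] + (7,)),  # BEV target perception
    }
    if with_depth:  # depth estimation network exists only during pre-training
        out["depth"] = np.zeros(feat.shape[:2])
    return out

image = np.zeros((256, 512, 3))
heads = [multitask_head(f) for f in fpn(image)]                     # pre-training
final = [multitask_head(f, with_depth=False) for f in fpn(image)]   # after removal
```

The `with_depth` switch mirrors the two phases in the text: the depth head is present for distillation pre-training and removed before final target perception training.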
Because the teacher model is used for distillation training during image depth estimation pre-training, to ensure training accuracy the scale of the image features that the teacher model extracts from the image frame needs to be consistent with the scale of the image features extracted by the image perception model. To this end, in the embodiment of the present invention, if a feature pyramid network is used to construct the feature extraction network of the image perception model, a feature pyramid network may also be used to construct the feature extraction network of the teacher model, ensuring that the scales of the image features extracted by the two feature pyramid networks are the same.
As shown in fig. 4, the feature extraction network in the teacher model mainly includes a backbone network and a feature pyramid network. The feature pyramid network is connected to two teacher depth prediction networks: it extracts two image features at different scales and inputs the image features of each scale into one teacher depth prediction network. Each teacher depth prediction network includes a depth estimation network and a depth confidence estimation network: the depth estimation network may be used to estimate the depth at each pixel position on the image frame, and the depth confidence estimation network may be used to estimate the confidence of the depth at each pixel position. The method for estimating the confidence is the same as that described in step 11 of the foregoing method embodiment and is not repeated here.
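The two ways the confidence at each pixel position may be obtained (cf. claims 4 and 5: from a radar-scan probability under non-dense annotation, or from an uncertainty estimate under dense annotation) can be sketched as follows. Both mappings are illustrative assumptions, not formulas given in the patent.

```python
import numpy as np

def confidence_from_scan_probability(scan_prob):
    """Non-dense labels: a higher chance of a real radar return means more trust."""
    return scan_prob

def confidence_from_uncertainty(uncertainty):
    """Dense labels: map an uncertainty estimate to a confidence in (0, 1]."""
    return np.exp(-uncertainty)   # assumed monotone mapping; low uncertainty -> ~1

u = np.array([0.0, 1.0, 3.0])     # toy per-pixel uncertainties
c = confidence_from_uncertainty(u)
```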
In the embodiment of the invention, the performance of an image perception model that has undergone image depth estimation pre-training is significantly better than that of one that has not. For example, the AP (Average Precision) index for BEV target perception of an image perception model without image depth estimation pre-training is 34.0, while that of an image perception model with such pre-training is 37.6, a relative improvement of about 10.6%.
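The 10.6 figure quoted above reads as the relative gain; a one-line check of the arithmetic (AP values taken from the text):

```python
# 34.0 -> 37.6 AP is +3.6 absolute, i.e. roughly a 10.6% relative improvement.
baseline, pretrained = 34.0, 37.6
relative_gain = (pretrained - baseline) / baseline * 100
```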
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.
It will be understood by those skilled in the art that all or part of the flow of the method of the above-described embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium; the steps of the method embodiments may be implemented when the computer program is executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable storage medium may include any entity or device capable of carrying the computer program code, such as a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable storage media do not include electrical carrier signals and telecommunication signals.
Furthermore, the invention also provides computer equipment.
Referring to fig. 5, fig. 5 is a schematic diagram of the main structure of an embodiment of a computer apparatus according to the present invention. As shown in fig. 5, the computer device in the embodiment of the present invention mainly includes a storage device and a processor, the storage device may be configured to store a program for executing the image sensing method of the above-mentioned method embodiment, and the processor may be configured to execute the program in the storage device, which includes but is not limited to the program for executing the image sensing method of the above-mentioned method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed.
The computer device in the embodiment of the present invention may be a control apparatus formed of various electronic devices. In some possible implementations, the computer device may include a plurality of storage devices and a plurality of processors. The program for executing the image perception method of the above method embodiment may be divided into a plurality of subprograms, each of which may be loaded and run by a processor to perform different steps of the method. Specifically, each subprogram may be stored in a different storage device, and each processor may be configured to execute the programs in one or more storage devices, so that the processors jointly implement the image perception method of the above method embodiment, with each processor executing different steps of that method.
The multiple processors may be processors disposed on the same device, for example, the computer device may be a high-performance device composed of multiple processors, and the multiple processors may be processors configured on the high-performance device. Further, the plurality of processors may be processors disposed on different devices.
Further, the invention also provides a computer readable storage medium.
In an embodiment of a computer-readable storage medium according to the present invention, the computer-readable storage medium may be configured to store a program for executing the image perception method of the above-described method embodiment, which may be loaded and run by a processor to implement the image perception method. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The computer-readable storage medium may be a storage device formed of various electronic devices; optionally, in the embodiment of the present invention the computer-readable storage medium is a non-transitory computer-readable storage medium.
Further, the invention also provides a vehicle.
In an embodiment of a vehicle according to the invention, the vehicle may comprise the computer device described in the above computer device embodiment. In the present embodiment, the vehicle may be an autonomous vehicle, an unmanned vehicle, or the like. In addition, classified by power source, the vehicle in the present embodiment may be a fuel vehicle, an electric vehicle, a hybrid vehicle combining fuel and electric power, or a vehicle using another new energy source.
So far, the technical solution of the present invention has been described in conjunction with one embodiment shown in the accompanying drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A method of image perception, the method comprising:
acquiring image frames acquired by a vehicle;
adopting an image perception model to carry out image target perception on the image frame;
wherein the image perception model is obtained by training in the following way:
acquiring a teacher model capable of carrying out image depth estimation on image frames;
adopting a knowledge distillation method to enable the teacher model to guide the image perception model to perform pre-training of image depth estimation by using image frames;
and performing final training of image target perception on the pre-trained image perception model according to the image frames and their image target annotation information.
2. The image perception method according to claim 1, wherein the step of enabling a teacher model to guide the image perception model to perform pre-training of image depth estimation using image frames by using a knowledge distillation method specifically comprises:
estimating the depth and the confidence of each pixel point position on the image frame by adopting a teacher model;
acquiring pixel positions at which the confidence of the depth is greater than a preset confidence threshold;
acquiring a high-confidence-degree area on the image frame according to the position of the pixel point;
and adopting the knowledge distillation method to enable the teacher model to guide the image perception model to perform pre-training of image depth estimation by using the high-confidence region on the image frame.
3. The image sensing method according to claim 2, wherein the step of obtaining a high-confidence region on the image frame according to the pixel point position specifically comprises:
generating an image mask of a high-confidence-degree area according to the position of the pixel point;
and acquiring a high-confidence-degree area on the image frame according to the image mask.
4. The image sensing method of claim 2, wherein the step of estimating the depth and the confidence thereof at each pixel position on the image frame using the teacher model specifically comprises:
if the teacher model is obtained by performing image depth estimation training according to image frames and their non-dense depth annotation information, estimating, with the teacher model, the depth at each pixel position on the image frame, and estimating, for each pixel on the image frame, the probability of it being scanned by a vehicle radar;
and respectively determining the confidence of the depth of each pixel point position according to the probability.
5. The image sensing method of claim 2, wherein the step of estimating the depth and the confidence thereof at each pixel position on the image frame using the teacher model specifically comprises:
if the teacher model is obtained by carrying out image depth estimation training according to the image frame and the dense depth annotation information thereof, estimating the depth of each pixel point position on the image frame by adopting the teacher model;
based on an uncertainty estimation method, estimating the uncertainty of the depth at each pixel point position;
and respectively determining the confidence coefficient of the depth of each pixel point position according to the uncertainty.
6. The image perception method according to claim 1, wherein, prior to the step of "having a teacher model instruct the image perception model to perform pre-training of image depth estimation using image frames using a knowledge distillation method", the method further comprises constructing the image perception model by:
respectively constructing a feature extraction network, a target perception network and a depth estimation network to form the image perception model;
the feature extraction network is used for extracting image features of image frames, the target perception network is used for image target perception according to the image features, and the depth estimation network is used for estimating the depth of each pixel point position on the image frames according to the image features.
7. The image perception method according to claim 6, wherein the step of enabling the teacher model to instruct the image perception model to perform pre-training of image depth estimation using the image frames by using a knowledge distillation method specifically includes:
a knowledge distillation method is adopted, so that a teacher model guides the image perception model to pre-train image depth estimation by using image frames;
removing the depth estimation network after pre-training is completed.
8. The image perception method according to claim 6, further comprising constructing the feature extraction network and the target perception network by:
constructing a feature pyramid network to form the feature extraction network;
and constructing a plurality of target perception networks, wherein each target perception network is respectively used for perceiving different types of image targets according to the image features extracted by the feature pyramid network.
9. A computer device comprising a processor and a storage means adapted to store a plurality of program codes, characterized in that said program codes are adapted to be loaded and run by said processor to perform the image perception method according to any of claims 1 to 8.
10. A computer-readable storage medium having stored therein a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by a processor to perform the image perception method according to any of claims 1 to 8.
Publication: CN115984795A (en), published 2023-04-18

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination