CN115690416A - Knowledge distillation-based BEV semantic segmentation model training method, system, equipment and medium - Google Patents

Knowledge distillation-based BEV semantic segmentation model training method, system, equipment and medium

Info

Publication number
CN115690416A
Authority
CN
China
Prior art keywords
data set
bev
view data
model
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211340027.6A
Other languages
Chinese (zh)
Inventor
漆昇翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202211340027.6A
Publication of CN115690416A
Legal status: Pending (current)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the field of automatic driving, and particularly relates to a knowledge distillation-based training method for a BEV semantic segmentation model, which comprises the following steps of: acquiring an image dataset; inputting the image data set into a student model for feature extraction to obtain a first plane view data set; inputting the image data set into a teacher model for feature extraction to obtain a second plane view data set; converting the first plane view data set and the second plane view data set according to the space coordinates of the image data in the image data set to obtain a first BEV view data set and a second BEV view data set; performing feature extraction and prediction probability output on the first BEV view data set and the second BEV view data set to obtain a first probability distribution set and a second probability distribution set; and optimizing the student model according to the first probability distribution set and the second probability distribution set, and taking the optimized student model as a target semantic segmentation model.

Description

Knowledge distillation-based BEV semantic segmentation model training method, system, equipment and medium
Technical Field
The invention belongs to the field of automatic driving, and particularly relates to a knowledge distillation-based training method, system, equipment and medium for a BEV semantic segmentation model.
Background
With the wide adoption of vehicle-mounted cameras in modern vehicles, detection and recognition of the various targets encountered while driving, based on vehicle-mounted vision sensors, has matured steadily. This capability is of great value for current driver assistance and future automated driving, and has become an indispensable core technology for the high-end intelligentization of modern automobiles. Multi-camera fusion merges the images from the vehicle-mounted multi-view cameras to construct a top-down Bird's Eye View (BEV) laid out around the vehicle coordinate system, covering an area of tens to hundreds of metres around the vehicle. Through this view conversion, the driver can grasp the environment around the vehicle from a bird's-eye perspective, which improves control over the vehicle.
BEV perception technology has attracted wide attention and developed rapidly in recent years, particularly since 2020. However, most BEV perception algorithm models have complex network structures, cumbersome training and annotation processes, poor timeliness, and high computation and storage costs when processing multi-camera data. As a result, no mainstream computing framework with sufficient influence that meets real-time in-vehicle application requirements has yet taken shape; model architectures are still iterating rapidly and being continually overturned, and problems such as light weight, high performance and high efficiency remain to be solved before mature application.
The prior art extracts per-voxel features from lidar point-cloud data to accelerate point-cloud network semantic segmentation. However, this approach is highly replaceable: many networks can substitute for the simplified PointNet used for acceleration. Meanwhile, compared with lidar, cameras are more widely applicable, more economical and more stable, so bird's-eye-view semantic segmentation based on multi-camera fusion has higher application value.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a knowledge distillation-based BEV semantic segmentation model training method, addressing the technical problem that there is currently no research in the industry on knowledge distillation-based multi-camera fusion bird's-eye-view semantic segmentation. The knowledge distillation algorithm is simple and efficient, constitutes a lossless model optimization method, and yields a more accurate semantic segmentation model.
The invention provides a knowledge distillation-based training method for a BEV semantic segmentation model, which comprises the following steps of: acquiring an image dataset; inputting the image data set into a student model for feature extraction to obtain a first plane view data set; inputting the image data set into a teacher model for feature extraction to obtain a second plane view data set; converting the first plane view data set and the second plane view data set according to the space coordinates of the image data in the image data set to obtain a first BEV view data set and a second BEV view data set; performing feature extraction and prediction probability output on the first BEV view data set and the second BEV view data set to obtain a first probability distribution set and a second probability distribution set; and optimizing the student model according to the first probability distribution set and the second probability distribution set, and taking the optimized student model as a target semantic segmentation model.
According to a specific embodiment of the present invention, the student model adopts a Resnet18 neural network model; the teacher model adopts a Resnet101 neural network model.
According to an embodiment of the present invention, the step of performing feature extraction and predicting probability output on the first BEV view data set and the second BEV view data set to obtain a first probability distribution set and a second probability distribution set includes: inputting the first BEV view data set into a BEV feature extraction network for feature extraction to obtain a first feature view data set; inputting the second BEV view data set into a BEV feature extraction network for feature extraction to obtain a second feature view data set; and classifying the first characteristic view data set and the second characteristic view data set through convolution layers of a convolution neural network to obtain a first probability distribution set and a second probability distribution set which correspond to each other.
According to a specific embodiment of the present invention, the BEV feature extraction network employs a Resnet18 neural network model.
According to an embodiment of the present invention, the step of optimizing the student model according to the first probability distribution set and the second probability distribution set, and using the optimized student model as the target semantic segmentation model includes: calculating a difference value of the first probability distribution set and the second probability distribution set by a loss function; and calculating the reverse gradient value of the difference value, and optimizing the network weight of the trained student model to obtain the target semantic segmentation model.
According to a specific embodiment of the present invention, the step of calculating the inverse gradient value of the difference value and optimizing the network weight of the trained student model to obtain the target semantic segmentation model includes: weighting the difference value according to a preset mask label; and calculating the weighted inverse gradient value of the difference value.
A semantic segmentation method based on a BEV semantic segmentation model trained by any one of the above methods comprises the following steps: acquiring image data to be segmented; inputting the image data to be segmented into the semantic segmentation model to obtain intermediate data; and performing threshold filtering on the intermediate data to obtain a corresponding semantic segmentation result.
A knowledge distillation-based training system for a BEV semantic segmentation model, comprising: the information acquisition module acquires an image data set; the first information processing module is used for inputting the image data set into a student model for feature extraction to obtain a first plane view data set; the image data set is further used for inputting the image data set into a teacher model for feature extraction to obtain a second plane view data set; the second information processing module is used for converting the first plane view data set and the second plane view data set according to the space coordinates of the image data in the image data set to obtain a first BEV view data set and a second BEV view data set; the third information processing module is used for carrying out feature extraction on the first BEV view data set and the second BEV view data set and predicting probability output to obtain a first probability distribution set and a second probability distribution set; and the model optimization module is used for optimizing the student model according to the first probability distribution set and the second probability distribution set and taking the optimized student model as a target semantic segmentation model.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the above methods when executing the computer program; or the processor implements the semantic segmentation method when executing the computer program.
A computer-readable medium, on which instructions are stored which, when loaded by a processor, carry out the steps of any of the above methods, or which, when executed by a processor, implement the semantic segmentation method described above.
The technical effect of the invention is that the knowledge distillation-based BEV semantic segmentation model training method fills the gap of knowledge distillation research in the field of multi-camera fusion semantic segmentation in the prior art. Meanwhile, the semantic segmentation model optimized by the knowledge distillation algorithm achieves better performance and precision, and the network model is lighter.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a knowledge-based distillation based training method for a BEV semantic segmentation model according to the present invention;
FIG. 2 is a schematic flow chart diagram illustrating an embodiment of a BEV semantic segmentation method based on the semantic segmentation model according to the present invention;
FIG. 3 is a schematic flow chart diagram of an embodiment of a knowledge-based distillation based training system for a BEV semantic segmentation model provided by the present invention;
fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure herein, wherein the embodiments of the present invention are described in detail with reference to the accompanying drawings and preferred embodiments. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be understood that the preferred embodiments are only for illustrating the present invention, and are not intended to limit the scope of the present invention.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention. They show only the components related to the invention and are not drawn according to the number, shape and size of the components in an actual implementation; in practice the type, quantity and proportion of each component, as well as the layout, may vary freely and may be more complex.
In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention, however, it will be apparent to one skilled in the art that embodiments of the present invention may be practiced without these specific details, and in other embodiments, well-known structures and devices are shown in block diagram form, rather than in detail, to avoid obscuring embodiments of the present invention.
First, it should be noted that, in order to enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application are clearly and completely described.
Knowledge Distillation (KD) is a common method for model compression. Unlike pruning and quantization, knowledge distillation builds a lightweight small model and trains it under the supervision of the output of a larger model with better performance, so that the small model reaches comparable performance and precision. The large model is generally called the Teacher Model and the small model the Student Model; the supervisory information coming from the output of the teacher model is called knowledge, and the process by which the student learns to migrate this supervisory information from the teacher is called distillation. In short, a large and cumbersome but knowledgeable and capable teacher network accurately transfers its knowledge in a specific domain to a student network, so that the student performs well in that domain without being cumbersome, which is akin to model compression. For example, when recognizing a horse, a model may mistake it for a donkey or a car. After training with hard-target labels, feeding an image into the trained model yields a soft target: the probability assigned to 'horse' is high, while the probabilities assigned to 'donkey' and 'car' are low, and the probability of 'donkey' is noticeably higher than that of 'car', reflecting that a horse is more strongly correlated with a donkey than with a car. The soft target therefore conveys more information and can be used to train the student network: it carries 'knowledge' about which classes the sample resembles, which it does not, and by how much, and in particular the relative magnitudes of the incorrect class probabilities (donkey versus car).
Therefore, the teacher network is first trained with Hard Targets, and the Soft Targets it outputs are then used as supervision for training the student network. At the same time, because the teacher's raw output is not 'soft' enough, a distillation temperature T is introduced when producing the Soft Targets: applying the softmax function with temperature T controls how soft the output labels are.
Specifically, both the teacher network and the student network are passed through the softmax with distillation temperature T, and a loss is evaluated between the two outputs, called the distillation loss; this part corresponds to the student network imitating the prediction results of the teacher network. The student network is additionally passed through an ordinary softmax without the distillation temperature T, and a loss is computed against the hard labels, called the student loss; this part corresponds to the student network fitting the real results. The final loss function is the weighted sum of the distillation loss and the student loss.
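A minimal sketch of this loss structure in PyTorch is given below; the temperature T, the weighting factor alpha, and the tensor shapes are illustrative assumptions and are not values specified in this application.

```python
import torch
import torch.nn.functional as F

def knowledge_distillation_loss(student_logits, teacher_logits, hard_labels,
                                temperature=4.0, alpha=0.7):
    # Distillation loss: both networks pass through softmax with temperature T,
    # and the student imitates the teacher's softened prediction distribution.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    distillation_loss = F.kl_div(log_soft_student, soft_teacher,
                                 reduction="batchmean") * temperature ** 2

    # Student loss: ordinary softmax (T = 1) evaluated against the hard labels.
    student_loss = F.cross_entropy(student_logits, hard_labels)

    # Final loss is the weighted sum of the two terms.
    return alpha * distillation_loss + (1 - alpha) * student_loss

# Example usage with dummy logits (batch of 8, 10 classes).
s = torch.randn(8, 10)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
loss = knowledge_distillation_loss(s, t, y)
```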
Knowledge distillation also has a generalization effect. For example, if the data used to train the student network contains no dog samples, but the teacher network was trained with that class, then after the knowledge transfer the student network may still recognize the class, because during distillation the teacher network transfers the feature knowledge it has learned to the student network, so the student network acquires new knowledge as well. (This is similar to a teacher lecturing students: although the students have never seen a real airplane, the teacher has, and after the teacher describes many details about airplanes, the students can also recognize an airplane when they see one — a form of zero-sample learning.)
Although the student network performs almost as well as the teacher network, it is much lighter. Less data is needed to train the network, which effectively prevents overfitting. If a conventional network trained on 100% of the data is retrained with only 3% of the data, the training accuracy is high but the test accuracy is low, i.e., the model overfits; whereas putting the same 3% of the data into the student network under distillation does not lead to overfitting.
Currently, in the technical field of semantic segmentation, existing approaches perform semantic segmentation from image semantic information combined with point-cloud information, from 3D point clouds, or based on attention mechanisms. The present application instead relies on a knowledge distillation algorithm: a teacher model is used to optimize a student model, and the optimized student model serves as the semantic segmentation model. The semantic segmentation model optimized by the knowledge distillation algorithm is more accurate, and its segmentation labels are closer to the image content.
It should be noted that the semantic segmentation method based on the semantic segmentation model provided in the embodiments of the present application is generally executed by a vehicle-mounted terminal, which performs semantic segmentation on the image data acquired by the vehicle-mounted cameras.
Example 1
Referring to fig. 1, a knowledge distillation-based training method of a BEV semantic segmentation model includes:
In step S110, an image data set is acquired.
Specifically, an image data set acquired by a vehicle-mounted multi-camera serves as a training data set.
And step S120, inputting the image data set into a student model for feature extraction to obtain a first plane view data set.
And step S130, inputting the image data set into a teacher model for feature extraction to obtain a second plane view data set.
Feature data are extracted from the training image data set in order to filter out interference information in the images. Further, in the embodiment of the present application, the student model adopts a ResNet-18 neural network model.
The deep residual network (ResNet) was developed and optimized on the basis of AlexNet, and a great advantage of the residual neural network is identity mapping. The problem with plain networks such as AlexNet is that optimization degrades as the number of layers increases. ResNet mitigates gradient vanishing/explosion by introducing residuals. With residual connections the depth of ResNet can reach 152 layers, and the greatest advantage of the residual neural network over traditional networks such as VGG is that computing the residual solves the degradation problem caused by an excessive number of layers.
In the field of lightweight neural networks for semantic segmentation tasks, a nonlinear activation layer introduces nonlinearity into the model and gives it stronger fitting capability. At the same time, however, it causes low-dimensional data to collapse: as observed in MobileNetV2, when a low-dimensional feature passes through a ReLU layer, part of the feature is destroyed or lost. Although the feature can be partially recovered, it cannot be restored one hundred percent. The ReLU acts as a filter, but the domain of this filter is not the frequency domain of signal processing but the feature domain, i.e., it compresses dimensions. Low-dimensional data collapses (loses information) when flowing through the nonlinear activation layer, whereas high-dimensional data does not. A low-dimensional feature has a small probability of being distributed in the active band of the ReLU, so after passing through it the information loss is severe and may even be total. A high-dimensional feature has a high probability of being distributed in the active band of the ReLU; although part of the information may be lost, the loss does little harm and most of the information is retained. Better still, the information discarded by the ReLU may be precisely useless redundant information. Low-dimensional data, however, is the opposite case: the probability that its information is highly redundant is inherently low, and if it is forcibly passed through a nonlinear activation (dimension compression), useful information is very likely to be lost, possibly even all of it (the output becomes all zeros).
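A small numerical sketch of this collapse effect is given below, loosely following the MobileNetV2-style experiment: 2-D points are embedded into an n-dimensional space by a random matrix, passed through a ReLU, and projected back; the reconstruction error is typically large when n is small and shrinks as n grows. The dimensions and random projections are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
points = torch.randn(1000, 2)  # low-dimensional input features

for n in (3, 15, 30):
    T = torch.randn(2, n)                        # random embedding into n dimensions
    embedded = torch.relu(points @ T)            # nonlinear activation in n dimensions
    recovered = embedded @ torch.linalg.pinv(T)  # project back to 2 dimensions
    err = (recovered - points).norm() / points.norm()
    print(f"n={n:3d}  relative reconstruction error = {err:.3f}")
```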
Unlike nonlinear activation layers, linear activation layers do not compress the dimensionality of the feature space. A principle for using activation layers therefore follows: apply nonlinear activation (e.g., ReLU) to data that carries redundant information, and linear activation (e.g., some linear transformation) to data without redundant information, alternating the two flexibly so as to obtain both nonlinearity and information integrity. Since the non-redundant information carries as much of the useful information as possible, a structure that preserves it is preferable when designing a network. What ResNet essentially does is reduce the redundancy of the information in the data: it applies linear activation to the non-redundant information (the redundancy-free identity part obtained through the skip connection) and nonlinear activation to the redundant information (the remaining part besides the identity, from which the ReLU extracts and filters information; the useful information extracted is the residual). The step of extracting the identity is the core of ResNet.
In ResNet-18, the number denotes the depth of the network: 18 refers to the 18 layers with weights, namely the convolutional layers and the fully connected layer, excluding the pooling layers and BN layers.
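For reference, a minimal sketch of the basic residual block used in ResNet-18 is shown below (same-channel case without downsampling; the channel count in the usage example is an illustrative assumption).

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # identity mapping via the skip connection
        out = self.relu(self.bn1(self.conv1(x)))  # residual branch
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)          # residual added back to the identity

# Example: a 64-channel block applied to a dummy feature map.
block = BasicBlock(64)
y = block(torch.randn(1, 64, 32, 32))
```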
Preferably, in the embodiment of the present application, the lighter, lower-accuracy ResNet-18 neural network model is selected as the student model.
In the embodiment of the present application, the teacher model adopts a ResNet-101 neural network model. Since the student model is optimized according to the teacher model to obtain the target semantic segmentation model, the higher-accuracy ResNet-101 is selected as the teacher model to perform feature extraction on the training image data.
Step S140, converting the first plane view data set and the second plane view data set according to the spatial coordinates of the image data in the image data set, so as to obtain a first BEV view data set and a second BEV view data set.
Specifically, the spatial coordinates of the bird's-eye-view grid are projected onto the acquired plane-view data according to the intrinsic and extrinsic parameters of the vehicle-mounted multi-camera system, and the first BEV view data set and the second BEV view data set, i.e., the spatial view data, are obtained through a bilinear interpolation algorithm.
Mathematically, bilinear interpolation is the extension of linear interpolation to interpolating functions of two variables; its core idea is to perform linear interpolation in each of the two directions in turn. As an interpolation algorithm in numerical analysis, it is widely applied in signal processing and in digital image and video processing.
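A minimal sketch of this view transformation for a single camera is given below, using bilinear sampling of the plane-view features at the projected BEV grid locations. The intrinsics K, extrinsics (R, t), grid size, ground-plane assumption (z = 0) and the dummy values in the commented usage are illustrative assumptions; points that project behind the camera are not handled.

```python
import torch
import torch.nn.functional as F

def plane_to_bev(feat, K, R, t, bev_size=200, bev_range=50.0):
    """feat: (1, C, H, W) plane-view feature map of one camera."""
    _, _, H, W = feat.shape
    # Build a BEV grid of ground points (x, y, z = 0) in the vehicle frame.
    xs = torch.linspace(-bev_range, bev_range, bev_size)
    ys = torch.linspace(-bev_range, bev_range, bev_size)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    pts = torch.stack([gx, gy, torch.zeros_like(gx)], dim=-1).reshape(-1, 3)

    # Project the grid into the image with the extrinsic and intrinsic parameters.
    cam = (R @ pts.T + t.view(3, 1)).T              # vehicle frame -> camera frame
    uvw = (K @ cam.T).T                             # camera frame -> pixel frame
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)   # perspective division

    # Normalize pixel coordinates to [-1, 1] and sample the features bilinearly.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    grid = grid.view(1, bev_size, bev_size, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)

# Example with dummy inputs: simple pinhole intrinsics, identity rotation.
# feat = torch.randn(1, 64, 128, 352)
# K = torch.tensor([[500., 0., 176.], [0., 500., 64.], [0., 0., 1.]])
# bev = plane_to_bev(feat, K, torch.eye(3), torch.tensor([0., 1.5, 0.]))
```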
Step S150, feature extraction and predicted probability output are performed on the first BEV view data set and the second BEV view data set to obtain a first probability distribution set and a second probability distribution set.
The specific steps are as follows:
and inputting the first BEV view data set and the second BEV view data set into a BEV feature extraction network for feature extraction to obtain a first feature view data set and a second feature view data set. Since the image is converted from the plane data to the spatial data, the feature data thereof needs to be extracted again, so as to filter out the interference information in the image. Wherein, the BEV feature extraction network also adopts a ResNet-18 neural network model.
The obtained first feature view data set and second feature view data set are input into a convolutional layer of a convolutional neural network for classification, thereby obtaining the first probability distribution set and the second probability distribution set corresponding to the image data.
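A minimal sketch of this classification step is given below: a 1x1 convolutional layer maps the BEV feature maps to per-class scores, and a softmax turns them into the per-cell probability distributions used for distillation. The channel count, class count and grid size are illustrative assumptions.

```python
import torch
import torch.nn as nn

num_classes = 4                                    # illustrative number of semantic classes
head = nn.Conv2d(512, num_classes, kernel_size=1)  # 1x1 convolutional classifier

bev_features = torch.randn(1, 512, 200, 200)       # output of the BEV feature extraction network
logits = head(bev_features)                        # (1, num_classes, 200, 200)
probs = torch.softmax(logits, dim=1)               # per-cell probability distribution set
```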
The soft labels required by the student model are generated by computing the difference between the first probability distribution set and the second probability distribution set with a loss function, i.e., the distillation loss in the knowledge distillation algorithm; this process corresponds to the student network imitating the prediction results of the teacher network.
And S160, optimizing the student model according to the first probability distribution set and the second probability distribution set, and taking the optimized student model as a target semantic segmentation model.
The specific steps and processes are as follows:
and calculating the difference value of the first probability distribution set and the second probability distribution set corresponding to the student model and the teacher model by using the loss function formed by combining the mean square error loss function and the relative entropy loss function, namely the disiliation loss in the knowledge distillation algorithm. Meanwhile, a preset mask label is added according to task requirements to weight the difference, and the weighted difference L is calculated according to the following formula:
L = (1/N) Σ_{i=1}^{N} M_t [ w_k · KL( t(x_i) ‖ s(x_i) ) + w_m · ( s(x_i) − t(x_i) )² ]

where s(x_i) denotes the probability distribution output by the student model, i.e., the first probability distribution set; t(x_i) denotes the probability distribution output by the teacher model, i.e., the second probability distribution set; w_k and w_m denote the combining weights of the relative entropy loss function and the mean square error loss function, respectively; M_t denotes the annotated mask label used as a weighting value for positive and negative samples; and N is the number of images in the training image data set.
Finally, the weighted difference is processed through an inverse gradient (back-propagation) algorithm to obtain the soft labels required by the student model, the network weight parameters of the student model are optimized according to the soft labels, and the optimized student model is taken as the target semantic segmentation model.
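A minimal sketch of this weighted loss and the back-propagation step is given below: a relative entropy (KL divergence) term and a mean square error term between the student and teacher probability distributions are combined with weights w_k and w_m, weighted by the mask label M_t, and back-propagated to update only the student. The weight values, tensor shapes and the all-ones mask are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def masked_distillation_loss(student_probs, teacher_probs, mask, w_k=0.5, w_m=0.5):
    # Relative entropy (KL divergence) between teacher and student distributions, per cell.
    kl = F.kl_div(student_probs.clamp(min=1e-8).log(), teacher_probs,
                  reduction="none").sum(dim=1)
    # Mean square error between the two distributions, per cell.
    mse = ((student_probs - teacher_probs) ** 2).mean(dim=1)
    # The mask label weights positive and negative samples; average over all cells.
    return (mask * (w_k * kl + w_m * mse)).mean()

# Dummy example: 4 classes over a 200x200 BEV grid, batch of 1.
s_logits = torch.randn(1, 4, 200, 200, requires_grad=True)
s_probs = torch.softmax(s_logits, dim=1)
t_probs = torch.softmax(torch.randn(1, 4, 200, 200), dim=1)
mask = torch.ones(1, 200, 200)
loss = masked_distillation_loss(s_probs, t_probs, mask)
loss.backward()   # inverse gradients flow back into the student's parameters only
```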
It should be noted that the steps of the above method are divided in this way for clarity of description. In implementation, they may be combined into a single step or some steps may be split into multiple steps; as long as the same logical relationship is contained, they fall within the scope of protection of this patent. Adding insignificant modifications to the algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm and process also falls within the scope of protection of this patent.
Example 2
Referring to fig. 2, an embodiment of the present application further provides a semantic segmentation method based on the BEV semantic segmentation model, including:
step S210, obtaining image data to be segmented
Step S220, inputting the image data to be segmented into a semantic segmentation model to obtain intermediate data
And step S230, threshold filtering is carried out on the intermediate data to obtain a corresponding semantic segmentation result.
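A minimal sketch of this inference flow is given below; the dummy probability tensor stands in for the intermediate data output by the trained model, and the threshold value is an illustrative assumption.

```python
import torch

# Dummy intermediate data: per-class probabilities over a 200x200 BEV grid.
intermediate = torch.rand(1, 4, 200, 200)
threshold = 0.5                                # illustrative filtering threshold
segmentation = intermediate > threshold        # per-class binary segmentation result
```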
Example 3
Referring to fig. 3, an embodiment of the present application further provides a knowledge-distillation-based training system for a BEV semantic segmentation model, including:
the information acquisition module 10 acquires an image data set;
the first information processing module 20 is configured to input the image data set into a student model for feature extraction, so as to obtain a first plane view data set; the image data set is further used for inputting the image data set into a teacher model for feature extraction, and a second plane view data set is obtained;
a second information processing module 30, configured to convert the first plane view data set and the second plane view data set according to the spatial coordinates of the image data in the image data set, so as to obtain a first BEV view data set and a second BEV view data set;
a third information processing module 40, configured to perform feature extraction and predict probability output on the first BEV view data set and the second BEV view data set, so as to obtain a first probability distribution set and a second probability distribution set;
and the model optimization module 50 is configured to optimize the student model according to the first probability distribution set and the second probability distribution set, and use the optimized student model as a target semantic segmentation model.
It should be noted that the knowledge distillation-based BEV semantic segmentation model training system provided in this embodiment and the knowledge distillation-based training method provided in Embodiment 1 belong to the same concept; the specific manner in which each module and unit performs its operations has been described in detail in the method embodiment and is not repeated here. In practical applications, the method provided in Embodiment 1 may allocate the above functions to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above, which is not limited herein.
Example 4
Referring to fig. 4, an embodiment of the present application further provides an electronic device, which includes a memory 2, a processor 1 and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of any one of the methods described above are implemented.
Wherein the memory includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory may also be an external storage device of the electronic device in other embodiments, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device. Further, the memory may also include both an internal storage unit and an external storage device of the electronic device. The memory may be used not only to store application software installed in the electronic device and various types of data, but also to temporarily store data that has been output or will be output.
In some embodiments the processor may be composed of a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor is the control unit of the electronic device; it connects the various components of the entire electronic device through various interfaces and lines, and executes the various functions and processes the data of the electronic device by running or executing the programs or modules stored in the memory and calling the data stored in the memory.
The processor executes an operating system of the electronic device and various installed application programs. The processor executes the application program to implement the steps in the above-described method embodiments.
Illustratively, the computer program may be partitioned into one or more modules, stored in the memory and executed by the processor, to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing certain functions, the instruction segments being used to describe the execution of the computer program in the electronic device.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor to execute part of the steps of the methods according to the embodiments of the present invention.
In conclusion, the technical effect of the invention is that the knowledge distillation-based BEV semantic segmentation model training method fills the gap of knowledge distillation research in the field of multi-camera fusion semantic segmentation in the prior art. Meanwhile, the semantic segmentation model optimized by the knowledge distillation algorithm achieves better performance and precision, and the network model is lighter.
The foregoing embodiments are merely illustrative of the principles and effects of the present invention and are not intended to limit it. Those skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical concept disclosed by the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A knowledge distillation-based training method for a BEV semantic segmentation model is characterized by comprising the following steps:
acquiring an image dataset;
inputting the image data set into a student model for feature extraction to obtain a first plane view data set;
inputting the image data set into a teacher model for feature extraction to obtain a second plane view data set;
converting the first plane view data set and the second plane view data set according to the space coordinates of the image data in the image data set to obtain a first BEV view data set and a second BEV view data set;
performing feature extraction and prediction probability output on the first BEV view data set and the second BEV view data set to obtain a first probability distribution set and a second probability distribution set;
and optimizing the student model according to the first probability distribution set and the second probability distribution set, and taking the optimized student model as a target semantic segmentation model.
2. The method for training a BEV semantic segmentation model according to claim 1, wherein the student model adopts a Resnet18 neural network model; the teacher model employs a Resnet101 neural network model.
3. The method for training the BEV semantic segmentation model according to claim 1, wherein the step of performing feature extraction and predictive probability output on the first and second sets of BEV view data to obtain a first and second set of probability distributions comprises:
inputting the first BEV view data set into a BEV feature extraction network for feature extraction to obtain a first feature view data set;
inputting the second BEV view data set into a BEV feature extraction network for feature extraction to obtain a second feature view data set;
and classifying the first characteristic view data set and the second characteristic view data set through convolution layers of a convolution neural network to obtain a first probability distribution set and a second probability distribution set which correspond to each other.
4. The method of training a BEV semantic segmentation model according to claim 3, wherein the BEV feature extraction network employs a Resnet18 neural network model.
5. The method of training a BEV semantic segmentation model according to claim 1, wherein the step of optimizing a student model according to the first and second sets of probability distributions and using the optimized student model as a target semantic segmentation model comprises:
calculating a difference value of the first probability distribution set and the second probability distribution set through a loss function;
and calculating the reverse gradient value of the difference value, and optimizing the network weight of the trained student model to obtain the target semantic segmentation model.
6. The method for training the BEV semantic segmentation model according to claim 5, wherein the step of calculating the inverse gradient value of the difference value and optimizing the network weight of the trained student model to obtain the target semantic segmentation model comprises:
weighting the difference value according to a preset mask label;
and calculating the weighted reverse gradient value of the difference value.
7. A semantic segmentation method based on the BEV semantic segmentation model of any one of claims 1 to 6, comprising:
acquiring image data to be segmented;
inputting the image data to be segmented into a semantic segmentation model to obtain intermediate data;
and carrying out threshold filtering on the intermediate data to obtain a corresponding semantic segmentation result.
8. A knowledge distillation-based training system for a BEV semantic segmentation model is characterized by comprising:
the information acquisition module acquires an image data set;
the first information processing module is used for inputting the image data set into a student model for feature extraction to obtain a first plane view data set; the image data set is further used for inputting the image data set into a teacher model for feature extraction to obtain a second plane view data set;
the second information processing module is used for converting the first plane view data set and the second plane view data set according to the space coordinates of the image data in the image data set to obtain a first BEV view data set and a second BEV view data set;
the third information processing module is used for carrying out feature extraction and prediction probability output on the first BEV view data set and the second BEV view data set to obtain a first probability distribution set and a second probability distribution set;
and the model optimization module is used for optimizing the student model according to the first probability distribution set and the second probability distribution set and taking the optimized student model as a target semantic segmentation model.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 6 when executing the computer program or implementing the semantic segmentation method of claim 7 when executing the computer program.
10. A computer-readable medium, on which instructions are stored, which are loaded by a processor and carry out the method according to any one of claims 1 to 6, or which, when executed by a processor, carry out the semantic segmentation method according to claim 7.
CN202211340027.6A 2022-10-29 2022-10-29 Knowledge distillation-based BEV semantic segmentation model training method, system, equipment and medium Pending CN115690416A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211340027.6A CN115690416A (en) 2022-10-29 2022-10-29 Knowledge distillation-based BEV semantic segmentation model training method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211340027.6A CN115690416A (en) 2022-10-29 2022-10-29 Knowledge distillation-based BEV semantic segmentation model training method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN115690416A true CN115690416A (en) 2023-02-03

Family

ID=85046613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211340027.6A Pending CN115690416A (en) 2022-10-29 2022-10-29 Knowledge distillation-based BEV semantic segmentation model training method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN115690416A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination