Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a vehicle loss detection method according to an embodiment of the present invention. The present embodiment is applicable to vehicle loss detection scenarios. The method may be executed by an electronic device, which may be a computer device or a terminal, and specifically includes the following steps:
Step 110, acquiring a target image.
The target image is an image used for vehicle loss detection. A user may photograph the damaged vehicle with a handheld terminal and use the resulting picture as the target image. Alternatively, a pre-captured image may be imported into a computer device as the target image.
Step 120, inputting the target image into a network model, where the backbone network of the network model comprises a Swin Transformer network, and the backbone network is used for predicting the damage position coordinates and the damage category of the target image based on the Swin Transformer network.
The structure of the Swin Transformer network is shown in FIG. 2 and includes a patch partition layer and four stages. The first stage includes a linear embedding layer and Swin Transformer blocks; each subsequent stage performs one down-sampling step through patch merging followed by Swin Transformer blocks.
Illustratively, an input target image of size 224 × 224 is divided into a set of non-overlapping patches by the patch partition layer. Each patch has a size of 4 × 4, and since the target image has 3 color channels, each patch has a feature dimension of 4 × 4 × 3 = 48; the number of patches is H/4 × W/4.
In stage 1, a linear embedding layer first changes the feature dimension of each partitioned patch to C, and the result is then fed into the Swin Transformer blocks. Stages 2 to 4 all operate in the same way: a patch merging layer first merges adjacent 2 × 2 patches, so that after stage 2 the number of patches becomes H/8 × W/8 and the concatenated feature dimension becomes 4C, and so on. The feature vector of the target image is processed through the four stages to obtain the vehicle damage category and the damage position information. In the Swin Transformer network, the size of each patch is preset, and the number of patches is determined from that preset size.
The patch partition layer is used to divide the image into a plurality of patches and to obtain a feature vector for each patch. Stages 1 to 4 are used for image recognition based on these feature vectors, yielding the damage position coordinates and the damage category of the target image. Stage 1 processes the feature vectors of the target image patch by patch. Stage 2 merges the patches from stage 1, giving H/8 × W/8 patches, and processes the feature vectors of the target image within the merged patches. By analogy, each subsequent stage merges the patches of the previous stage and processes the feature vectors according to the merged patches. After stage 4 produces the feature vector of the target image, the feature vector is mapped to a neural network head for image recognition.
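To make the patch arithmetic above concrete, the following is a minimal PyTorch sketch, not the patented implementation; the class name and the strided-convolution realization of "partition + linear embedding" are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PatchPartition(nn.Module):
    """Split an image into non-overlapping 4 x 4 patches and embed them to dim C."""
    def __init__(self, patch_size=4, in_chans=3, embed_dim=96):
        super().__init__()
        # A stride-4 convolution implements "partition + linear embedding" in one step:
        # each 4x4x3 = 48-dimensional patch is projected to C = embed_dim channels.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                    # x: (B, 3, H, W)
        x = self.proj(x)                     # (B, C, H/4, W/4)
        return x.flatten(2).transpose(1, 2)  # (B, H/4 * W/4, C)

img = torch.randn(1, 3, 224, 224)
tokens = PatchPartition()(img)
print(tokens.shape)  # torch.Size([1, 3136, 96]) -> 56*56 patches, dimension C=96
# Each 2x2 patch-merging step then halves the grid (56 -> 28 -> 14 -> 7)
# while the channel width grows C -> 2C -> 4C -> 8C across the stages.
```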
Optionally, inputting the target image into the network model includes: convolving the target image through a convolution layer to obtain convolution data; and using the convolution data as the input of the Swin Transformer network.
Alternatively, a convolution layer is provided before the patch partition layer, and the target image is convolved by this layer. Illustratively, two 3 × 3 convolution layers are configured; the target image is convolved by these two layers and converted into convolution data, which is then input to the patch partition layer.

Convolving the image first not only reduces the complexity of subsequent calculation but also improves model accuracy, and using two 3 × 3 convolution layers further improves convolution efficiency.
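A minimal sketch of such a convolution stem follows. The intermediate width of 32 channels, the stride of 1, and the return to 3 output channels (so the patch partition above still sees an RGB-shaped input) are assumptions for illustration; the patent does not fix them:

```python
import torch.nn as nn

conv_stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),  # keeps H x W unchanged
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 3, kernel_size=3, stride=1, padding=1),  # back to 3 channels
)
# conv_stem(target_image) yields the "convolution data" that is then fed
# into the patch partition layer of the Swin Transformer backbone.
```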
After the convolution data is input to the patch partition layer, it is divided into a set of non-overlapping patches that serve as the input features of the Swin Transformer network.

The Swin Transformer backbone is formed by stacking Swin Transformer blocks in each stage. The input features are transformed in feature dimension by a linear embedding layer, and the network achieves feature reuse by merging adjacent patches of the input.
As shown in fig. 3, each Swin Transformer block consists of a shifted-window-based MSA (multi-head self-attention) module followed by a two-layer MLP (multi-layer perceptron). A LayerNorm (LN) layer is applied before each MSA module and each MLP, and a residual connection is applied after each MSA and MLP. The MSA module divides the input picture into non-overlapping windows and then performs self-attention within each window, so the computational complexity is linear in the image size.
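The block structure just described can be sketched as follows. This is a simplified illustration that omits window shifting and the relative position bias, so it is not the full published Swin Transformer block; all names are hypothetical:

```python
import torch
import torch.nn as nn

class WindowMSABlock(nn.Module):
    """LN -> window MSA -> residual, then LN -> MLP -> residual."""
    def __init__(self, dim=96, num_heads=3, window=7, mlp_ratio=4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, x, H, W):              # x: (B, H*W, C); H, W divisible by window
        B, L, C = x.shape
        shortcut = x
        x = self.norm1(x).view(B, H, W, C)
        w = self.window                       # partition into non-overlapping w x w windows
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)           # (num_windows * B, w*w, C)
        x, _ = self.attn(x, x, x)             # self-attention inside each window only
        x = x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B, L, C)
        x = shortcut + x                      # residual after MSA
        return x + self.mlp(self.norm2(x))    # residual after MLP

blk = WindowMSABlock()
out = blk(torch.randn(1, 56 * 56, 96), H=56, W=56)  # -> (1, 3136, 96)
```

Because attention never crosses window boundaries, the cost grows with the number of windows, i.e. linearly in the image size, as stated above.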
Optionally, the Swin Transformer network includes a plurality of Swin Transformer blocks, and each Swin Transformer block includes a plurality of MSA layers;
the input of the MSA layer is provided with a first convolution layer; the output of the MSA layer is provided with a second convolutional layer.
For each MSA layer, a first convolution layer is set at its input for dimensionality reduction, and a second convolution layer is set at its output for dimensionality restoration. Illustratively, both the first and the second convolution layers may be 1 × 1 convolution layers; that is, the input of the MSA layer is provided with a 1 × 1 convolution layer, and the output of the MSA layer is provided with a 1 × 1 convolution layer. Providing convolution layers at the input and output of each MSA layer improves the efficiency of the feature operations and increases the operation speed.
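A sketch of this 1 × 1 convolution sandwich; the 2× reduction ratio is an assumption chosen only for illustration:

```python
import torch.nn as nn

dim, reduced = 96, 48
reduce_conv = nn.Conv2d(dim, reduced, kernel_size=1)  # first conv: reduce feature dim
expand_conv = nn.Conv2d(reduced, dim, kernel_size=1)  # second conv: restore feature dim
# The MSA layer then attends over `reduced` channels, so the query/key/value
# projections cost roughly half as much for this choice of ratio.
```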
Optionally, the backbone network is connected to a neck network, and the neck network includes:
feature Pyramid Networks (FPN) and Balanced Feature Pyramid Networks (BFP).
The feature pyramid network extracts features from the image at each scale and can generate multi-scale feature representations, so that the feature maps at all levels carry strong semantic information, including some high-resolution feature maps.
The outputs of stages 1 to 4 are arranged by size from the bottom to the top of the feature pyramid network. The feature pyramid network extracts features from each layer to generate a multi-scale feature representation and fuses these features; the image at each layer carries certain semantic information. Feature fusion is thus performed through the feature pyramid network, while the balanced feature pyramid network is used to strengthen the multi-level features by deeply integrating the balanced semantic features.
The neck network connects the backbone network and the head network, so that the features output by the backbone can be applied to the head more efficiently, improving data processing efficiency.
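A minimal FPN sketch over the four stage outputs is given below. The 256-channel output width and the layer choices are assumptions, and the balanced feature pyramid (BFP) step is only summarized in a comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_dims=(96, 192, 384, 768), out_dim=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(d, out_dim, 1) for d in in_dims)
        self.smooth = nn.ModuleList(nn.Conv2d(out_dim, out_dim, 3, padding=1)
                                    for _ in in_dims)

    def forward(self, feats):  # feats: stage1..stage4 outputs, high -> low resolution
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down fusion pathway
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        # BFP (sketch): rescale all levels to one size, average them into a single
        # "balanced" map, refine it, then add it back to each level.
        return [s(x) for s, x in zip(self.smooth, laterals)]

feats = [torch.randn(1, c, s, s) for c, s in zip((96, 192, 384, 768), (56, 28, 14, 7))]
outs = SimpleFPN()(feats)  # four fused maps, each with 256 channels
```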
Step 130, determining a damage detection result according to the damage position coordinates and the damage category.
After the Swin Transformer network outputs the damage position coordinates and the damage category through forward propagation in step 120, the final damage detection result can be screened out by a Soft-NMS (soft non-maximum suppression) algorithm.
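For illustration, a linear-decay Soft-NMS could look like the sketch below; the thresholds are common defaults, not values taken from this embodiment:

```python
import numpy as np

def pairwise_iou(box, others):
    """IoU between one box and an array of boxes (all x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], others[:, 0]); y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2]); y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(others) - inter)

def soft_nms(boxes, scores, iou_thr=0.3, score_thr=0.001):
    """Linear-decay Soft-NMS; returns the indices of the kept boxes."""
    scores = scores.copy()
    idxs = scores.argsort()[::-1].tolist()
    keep = []
    while idxs:
        best = idxs.pop(0)
        keep.append(best)
        if not idxs:
            break
        rest = np.array(idxs, dtype=int)
        ious = pairwise_iou(boxes[best], boxes[rest])
        mask = ious > iou_thr
        scores[rest[mask]] *= (1.0 - ious[mask])  # decay overlapping scores, don't delete
        survivors = [int(i) for i in rest if scores[i] > score_thr]
        survivors.sort(key=lambda i: -scores[i])
        idxs = survivors
    return keep
```

Unlike hard NMS, overlapping boxes are down-weighted rather than discarded, which helps when several damaged regions genuinely overlap.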
According to the vehicle loss detection method provided by this embodiment of the invention, a target image is obtained; the target image is input into a network model whose backbone network comprises a Swin Transformer network used for predicting the damage position coordinates and the damage category of the target image; and a damage detection result is determined according to the damage position coordinates and the damage category. Compared with current CNN-based vehicle loss detection, which is not accurate enough, the method of this embodiment uses the Swin Transformer network as the backbone, detects more accurately than the CNN approach, and can effectively locate and identify the damaged part. By adopting the Swin Transformer as the backbone network for feature extraction, the spatial relations among image pixels and the weighted selection of features can be exploited, achieving better feature extraction and utilization. At the same time, the Swin Transformer retains CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer schemes. The Swin Transformer blocks make the method applicable to a wide range of vehicle types and detection settings, including field environments and complex photographing backgrounds, enabling efficient damage assessment of damaged vehicle parts, with the self-attention mechanism further optimizing damage assessment efficiency.
Example two
Fig. 4 is a flowchart of a vehicle loss detection method according to a second embodiment of the present invention, which further develops the above embodiment: before the target image is acquired in step 110, the method further includes training the Swin Transformer network. The first embodiment provides an implementation in which a Swin Transformer network serves as the backbone for vehicle loss detection; this embodiment provides the training scheme for that network. The method may be implemented by the following steps:
Step 210, marking the vehicle loss history pictures according to a marking criterion, and configuring the damage categories of the vehicle loss history pictures.
The damage categories and the marking criteria may be determined jointly by claims settlement personnel and algorithm engineers. The damage categories cover vehicle damage of varying severity that requires reimbursement. The marking criteria include rules for special cases, such as overlapping damage of several kinds, cases where it is uncertain whether something is damage, and cases where the cause of the damage is uncertain. The damage categories include: scratches, dents, wrinkles, dead folds, tears, missing parts, and the like.
History pictures of vehicle body damage are marked in batches based on the damage categories. Optionally, the marking may be performed manually: each damage instance appearing in a picture is marked with a rectangular box, and its damage category is recorded. Further, pictures in which the damage category is difficult to distinguish are removed, and a vehicle body damage database is constructed.
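For illustration only, one record in such a database might look like the following; the field names and values are hypothetical, not prescribed by this embodiment:

```python
# One labeled record: a rectangular box per damage instance plus its category.
annotation = {
    "image": "claims/2021/car_0001.jpg",
    "damages": [
        {"bbox": [412, 230, 640, 388],  # x1, y1, x2, y2 of the rectangular box
         "category": "scratch"},
        {"bbox": [120, 500, 260, 610],
         "category": "dent"},
    ],
}
```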
Step 220, training the Swin Transformer network according to the marked vehicle loss history pictures.
Optionally, one part of the images in the vehicle body damage database is used as the training set and another part as the test set.
All pictures in the training set undergo data enhancement operations such as random cropping, random rotation, and random changes of saturation, hue, and contrast, are then scaled to 896 × 896 pixels, and are input into the Swin Transformer for training. The training process takes the vehicle damage images and their damage category labels as input to train the Swin Transformer network. Testing is performed on the test set after every epoch, and the model parameters with the highest detection mAP are saved. The Swin Transformer network is optimized through multiple such iterations.
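A hedged sketch of this training loop follows; `model`, `train_one_epoch`, `evaluate_map`, and the data loaders are assumed helpers, and the augmentation parameters and epoch count are illustrative, not values fixed by this embodiment:

```python
import torch
from torchvision import transforms

# Random crop, rotation, and color jitter, then scale to 896 x 896 pixels.
augment = transforms.Compose([
    transforms.RandomResizedCrop(896),
    transforms.RandomRotation(15),
    transforms.ColorJitter(saturation=0.4, hue=0.1, contrast=0.4),
    transforms.ToTensor(),
])

num_epochs = 50                                       # assumed training length
best_map = 0.0
for epoch in range(num_epochs):
    train_one_epoch(model, train_loader, optimizer)   # assumed helper: one training pass
    current_map = evaluate_map(model, test_loader)    # assumed helper: test-set mAP
    if current_map > best_map:                        # keep the best checkpoint only
        best_map = current_map
        torch.save(model.state_dict(), "best_swin_damage.pth")
```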
Optionally, training the Swin Transformer network according to the marked vehicle loss history pictures includes:

in the training process, performing the regression calculation of the Swin Transformer network according to a distance penalty loss function.
IoU, also called Intersection over Union, denotes the ratio of the intersection to the union of the predicted bounding box and the ground-truth bounding box. A network is usually trained with an IoU-based bounding box localization loss function, but the accuracy obtained in that way is low. Therefore, the regression calculation of the Swin Transformer network is performed according to the distance penalty loss function, which improves the localization accuracy of the prediction boxes. The DIoU loss can still provide a moving direction for the bounding box even when it does not overlap the target box. In addition, the DIoU loss converges faster than the IoU loss, and in cases where one box contains the other in the horizontal or vertical direction, the DIoU loss still achieves fast regression.
Illustratively, the distance penalty loss function (DIoU loss) is used to perform the bounding box regression calculation of the Swin Transformer network. The distance penalty loss L_DIoU can be calculated by the following formula:

L_DIoU = 1 − IoU + ρ²(b, b^gt) / c²

where b and b^gt respectively denote the center points of the prediction box and the ground-truth box, ρ²(b, b^gt) denotes the squared Euclidean distance between the two center points, c denotes the diagonal length of the smallest enclosing region that contains both the prediction box and the ground-truth box, and IoU denotes the intersection-over-union of the prediction box and the ground-truth box.
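A direct implementation sketch of this formula (the standard published DIoU loss, written in PyTorch; not code from this embodiment). Boxes are tensors of shape (N, 4) in x1, y1, x2, y2 form:

```python
import torch

def diou_loss(pred, target, eps=1e-7):
    # IoU: intersection over union of predicted and ground-truth boxes
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # rho^2: squared distance between the two box centers
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)
    # c^2: squared diagonal of the smallest box enclosing both boxes
    enclose_lt = torch.min(pred[:, :2], target[:, :2])
    enclose_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enclose_rb - enclose_lt) ** 2).sum(dim=1) + eps
    return (1 - iou + rho2 / c2).mean()
```

The ρ²/c² term is what keeps the gradient informative even when the boxes do not overlap, which is why the loss still provides a moving direction in that case.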
Optionally, training the Swin Transformer network according to the marked vehicle loss history pictures includes:

in the training process, performing data enhancement on the vehicle loss history pictures; and training the Swin Transformer network with the data-enhanced vehicle loss history pictures.
In the training process, different data enhancement methods may be adopted for the vehicle loss history pictures, including trying different types of optimizers, adopting a learning-rate decay strategy, and applying regularization techniques. In addition, multi-scale training is run for enough epochs for the loss values of the model on the training set and the test set to converge, and the model parameters with the highest mAP on the test set are saved. One full forward-and-backward pass of the complete data set through the neural network is called an epoch.
In addition, a small number of targeted cases, including mosaics and dim light, are prone to false detections; therefore, mosaic and image saturation changes are randomly added to the data enhancement, as sketched below.
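A sketch of such targeted enhancement, assuming a pixelation routine to mimic mosaic artifacts and a saturation jitter for the lighting cases; the probability, region size, and block size are illustrative assumptions:

```python
import random
from PIL import Image
from torchvision import transforms

def random_mosaic_patch(img: Image.Image, p=0.2, block=16):
    """With probability p, pixelate a random region to mimic mosaic artifacts."""
    if random.random() > p:
        return img
    w, h = img.size
    x1, y1 = random.randint(0, w // 2), random.randint(0, h // 2)
    region = img.crop((x1, y1, x1 + w // 4, y1 + h // 4))
    small = region.resize((max(1, region.width // block),
                           max(1, region.height // block)), Image.NEAREST)
    img.paste(small.resize(region.size, Image.NEAREST), (x1, y1))
    return img

# Random saturation change for the dim-light / color cases mentioned above.
saturation_jitter = transforms.ColorJitter(saturation=0.5)
```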
Step 230, a target image is acquired.
Step 240, inputting the target image into a network model, where the backbone network of the network model comprises a Swin Transformer network, and the backbone network is used for predicting the damage position coordinates and the damage category of the target image based on the Swin Transformer network.
Step 250, determining a damage detection result according to the damage position coordinates and the damage category.
The vehicle loss detection method provided by the embodiment of the application can train the network more efficiently, so that the trained network is more accurate.
Example three
Fig. 5 is a schematic structural diagram of a vehicle loss detection apparatus according to a third embodiment of the present invention. The present embodiment is applicable to vehicle loss detection scenarios; the apparatus may be configured in an electronic device, which may be a computer device or a terminal, and specifically includes: an image acquisition module 310, a detection module 320, and a detection result determination module 330.
An image acquisition module 310 for acquiring a target image;
the detection module 320 is configured to input the target image into a network model, where a backbone network of the network model includes a Swin Transformer network, and the backbone network is used for predicting the damage position coordinates and the damage category of the target image based on the Swin Transformer network;
and a detection result determining module 330, configured to determine a damage detection result according to the damage position coordinate and the damage category.
On the basis of the foregoing embodiment, the detection module 320 is configured to:
convolving the image through the convolution layer to obtain convolution data;
and taking the convolution data as an input of a Swin Transformer network.
On the basis of the above embodiment, the Swin Transformer network includes a plurality of Swin Transformer blocks, and each Swin Transformer block includes a plurality of MSA layers;
the input of the MSA layer is provided with a first convolution layer;
the output of the MSA layer is provided with a second convolutional layer.
Specifically, the input of the MSA layer is provided with a 1 × 1 convolution layer, and the output of the MSA layer is provided with a 1 × 1 convolution layer.
On the basis of the above embodiment, the backbone network is connected to a neck network, and the neck network includes:
a feature map pyramid network and a balanced feature pyramid network.
On the basis of the above embodiment, the apparatus further comprises a training module. The training module is used for:
marking the vehicle loss historical picture according to a marking criterion, and configuring the damage category of the vehicle loss historical picture;
and training the Swin transform network according to the marked vehicle loss historical picture.
On the basis of the above embodiment, the training module is configured to:
and in the training process, performing regression calculation of the Swin transducer network according to the distance punishment damage function.
On the basis of the above embodiment, the training module is configured to:
in the training process, data enhancement is carried out according to the vehicle loss historical picture;
and training the Swin Transformer network by using the data-enhanced car loss historical picture.
In the vehicle loss detection apparatus provided by this embodiment of the present invention, the image acquisition module 310 acquires a target image; the detection module 320 inputs the target image into a network model whose backbone network comprises a Swin Transformer network used for predicting the damage position coordinates and the damage category of the target image; and the detection result determination module 330 determines a damage detection result according to the damage position coordinates and the damage category. Compared with current CNN-based vehicle loss detection, which is not accurate enough, the apparatus of this embodiment uses the Swin Transformer network as the backbone, detects more accurately than the CNN approach, and can effectively locate and identify the damaged part. By adopting the Swin Transformer as the backbone network for feature extraction, the spatial relations among image pixels and the weighted selection of features can be exploited, achieving better feature extraction and utilization. At the same time, the Swin Transformer retains CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer schemes. The Swin Transformer blocks make the apparatus applicable to a wide range of vehicle types and detection settings, including field environments and complex photographing backgrounds, enabling efficient damage assessment of damaged vehicle parts, with the self-attention mechanism further optimizing damage assessment efficiency.
The vehicle loss detection device provided by the embodiment of the invention can execute the vehicle loss detection method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 6 is a schematic structural diagram of an electronic apparatus according to a fourth embodiment of the present invention, as shown in fig. 6, the electronic apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of the processors 40 in the electronic device may be one or more, and one processor 40 is taken as an example in fig. 6; the processor 40, the memory 41, the input device 42 and the output device 43 in the electronic apparatus may be connected by a bus or other means, and the bus connection is exemplified in fig. 6.
The memory 41, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the vehicle loss detection method in the embodiment of the present invention (for example, the image acquisition module 310, the detection module 320, the detection result determination module 330, and the training module in the vehicle loss detection apparatus). The processor 40 executes various functional applications of the electronic device and data processing by executing software programs, instructions, and modules stored in the memory 41, that is, implements the vehicle loss detection method described above.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 is operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the electronic apparatus. The output device 43 may include a display device such as a display screen.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a vehicle loss detection method, the method comprising:
acquiring a target image;
inputting the target image into a network model, wherein the backbone network of the network model comprises a Swin Transformer network, and the backbone network is used for predicting the damage position coordinates and the damage category of the target image based on the Swin Transformer network;
and determining a damage detection result according to the damage position coordinate and the damage category.
On the basis of the above embodiment, the inputting the target image into the network model includes:
convolving the image through the convolution layer to obtain convolution data;
and taking the convolution data as an input of a Swin Transformer network.
On the basis of the above embodiment, the Swin Transformer network includes a plurality of Swin Transformer blocks, and each Swin Transformer block includes a plurality of MSA layers;
the input of the MSA layer is provided with a first convolution layer; (the input of the MSA layer is provided with 1 x 1 convolution layer)
The output of the MSA layer is provided with a second convolutional layer.
Specifically, the input of the MSA layer is provided with a 1 × 1 convolution layer, and the output of the MSA layer is provided with a 1 × 1 convolution layer.
On the basis of the above embodiment, the backbone network is connected to a neck network, and the neck network includes:
a feature map pyramid network and a balanced feature pyramid network.
On the basis of the above embodiment, before acquiring the target image, the method further includes:
marking the vehicle loss historical picture according to a marking criterion, and configuring the damage category of the vehicle loss historical picture;
and training the Swin Transformer network according to the marked vehicle loss history pictures.
On the basis of the above embodiment, the training of the Swin Transformer network according to the marked vehicle loss history pictures includes:
and in the training process, performing regression calculation of the Swin transducer network according to the distance punishment damage function.
On the basis of the above embodiment, the training of the Swin Transformer network according to the marked vehicle loss history pictures includes:
in the training process, performing data enhancement on the vehicle loss history pictures;

and training the Swin Transformer network with the data-enhanced vehicle loss history pictures.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the operations of the method described above, and may also execute the relevant operations in the vehicle loss detection method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes instructions for enabling an electronic device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the vehicle loss detection apparatus, the included units and modules are only divided according to the functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.