WO2023019875A1 - Vehicle loss detection method, device, electronic equipment, and storage medium - Google Patents

Vehicle loss detection method, device, electronic equipment, and storage medium

Info

Publication number
WO2023019875A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
swin
damage
transformer
target image
Prior art date
Application number
PCT/CN2022/070984
Other languages
English (en)
French (fr)
Inventor
康甲
刘莉红
刘玉宇
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023019875A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • Embodiments of the present invention relate to machine learning technology in the field of artificial intelligence, and in particular to a vehicle loss detection method, device, electronic equipment, and storage medium.
  • After a traffic accident, the insurance company usually goes to the accident scene to assess the damage, that is, it determines the vehicle damage by examining photos taken at the scene and uses this as the basis for the insurer's claim settlement. Because damage assessment consumes substantial human resources and yields highly subjective results, deep-learning-based vehicle damage detection systems are gradually replacing manual work; such systems can accurately detect the type of vehicle damage from one or more pictures.
  • The present invention provides a vehicle loss detection method, device, electronic equipment, and storage medium, so as to improve the accuracy of vehicle damage detection.
  • An embodiment of the present invention provides a vehicle loss detection method, including:
  • the target image is input into a network model; the backbone network of the network model includes a Swin Transformer network (also known as a hierarchical vision Transformer network), and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network;
  • a damage detection result is determined according to the damage position coordinates and the damage category.
  • the embodiment of the present invention also provides a vehicle loss detection device, including:
  • An image acquisition module configured to acquire a target image
  • the detection module is used to input the target image to the network model, and the backbone network of the network model includes the Swin Transformer network.
  • the backbone network is used to predict the damage position coordinates and the damage category of the target image based on the Swin Transformer network;
  • the detection result determination module is configured to determine the damage detection result according to the damage position coordinates and damage category.
  • The embodiment of the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed, causes the processor to perform the following operations:
  • the target image is input to the network model, the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict damage position coordinates and damage categories of the target image based on the Swin Transformer network;
  • a damage detection result is determined according to the damage position coordinates and the damage category.
  • the embodiment of the present invention also provides a storage medium containing computer-executable instructions, and the computer-executable instructions are used to perform the following steps when executed by a computer processor:
  • the target image is input to the network model, the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict damage position coordinates and damage categories of the target image based on the Swin Transformer network;
  • a damage detection result is determined according to the damage position coordinates and the damage category.
  • The vehicle loss detection method provided by the embodiment of the present invention acquires a target image; inputs the target image into a network model, the backbone network of which includes a Swin Transformer network used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and determines the damage detection result according to the damage position coordinates and the damage category.
  • The embodiment of the present invention uses the Swin Transformer network as the backbone network, which is more precise than CNN-based detection and can locate and identify damaged parts more effectively.
  • Extracting features with a Swin Transformer backbone makes it possible to exploit the spatial relationships between the pixels of an image and to perform weighted feature selection, achieving better feature extraction and utilization.
  • The Swin Transformer also has CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer solutions.
  • The self-attention mechanism of the Swin Transformer blocks makes the detection model applicable to a wide range of vehicle types and robust to on-site environments and complex photographic backgrounds, enabling efficient damage assessment of damaged vehicle parts and improving assessment efficiency.
  • Fig. 1 is a flowchart of a vehicle loss detection method in Embodiment 1 of the present invention
  • Fig. 2 is a schematic structural diagram of the Swin Transformer network in Embodiment 1 of the present invention;
  • Fig. 3 is a schematic structural diagram of the Swin Transformer block in Embodiment 1 of the present invention.
  • Fig. 4 is a flowchart of a vehicle loss detection method in Embodiment 2 of the present invention.
  • FIG. 5 is a schematic structural diagram of a vehicle loss detection device in Embodiment 3 of the present invention.
  • FIG. 6 is a schematic structural diagram of an electronic device in Embodiment 4 of the present invention.
  • FIG. 1 is a flowchart of the vehicle loss detection method provided by Embodiment 1 of the present invention. This embodiment is applicable to vehicle loss detection.
  • The method can be executed by an electronic device, which may be a computer device or a terminal, and specifically includes the following steps:
  • Step 110: acquire a target image.
  • the target image is the image for vehicle loss detection.
  • the user can take pictures of the damaged vehicle through the handheld terminal, and use the pictures taken as the target image. It is also possible to import a pre-captured image to a computer device as a target image.
  • Step 120: input the target image into the network model; the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network.
  • As shown in Fig. 2, the Swin Transformer network includes a patch partition layer and four stages. Each stage includes a linear embedding layer (linear embedding) and Swin Transformer blocks (block), and each stage performs one downsampling.
  • In stage 1 (stage1), a linear embedding layer first projects the feature dimension of the divided patches to C, and the result is fed into the Swin Transformer blocks. Stages 2 to 4 (stage2-stage4) operate in the same way: a patch merging layer first merges the input according to adjacent 2×2 patches, so the number of patches becomes H/8 × W/8 and the feature dimension becomes 4C.
  • The feature vector of the target image is processed through the four stages to obtain the damage category and damage location information.
  • In the Swin Transformer network, the size of each patch is pre-configured, and the number of patches is determined by the configured patch size.
  • The patch partition layer divides the image into multiple patches and obtains a feature vector for each patch.
  • Stages 1 to 4 perform image recognition based on the feature vectors to obtain the damage position coordinates and damage category of the target image.
  • Stage 1 operates patch by patch, identifying the feature vector of the target image within each patch.
  • Stage 2 merges the patches of stage 1, yielding H/8 × W/8 patches, and identifies the feature vector of the target image on each merged patch.
  • By analogy, each later stage merges the patches of the previous stage and identifies the feature vectors of the target image on the merged patches.
  • After stage 4 obtains the feature vector of the target image, the feature vector is mapped to the neural network for image recognition.
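  • As an illustration of the pipeline above, here is a minimal PyTorch sketch of the patch partition, linear embedding, and 2×2 patch merging steps. The class names, the embedding width C = 96, and the use of a strided convolution to realize partition plus embedding are assumptions made for the sketch, not details fixed by this disclosure:

```python
import torch
import torch.nn as nn

class PatchPartition(nn.Module):
    """Split an image into non-overlapping 4x4 patches and embed them to
    dimension C (patch partition followed by the stage-1 linear embedding)."""
    def __init__(self, in_chans=3, patch_size=4, embed_dim=96):
        super().__init__()
        # A stride-4 convolution realizes "partition + linear embedding" in one step.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, C, H/4, W/4)
        return x.flatten(2).transpose(1, 2)    # (B, H/4 * W/4, C) token sequence

class PatchMerging(nn.Module):
    """Merge each 2x2 neighborhood of adjacent patches; the concatenated
    features keep the 4C dimension described above."""
    def forward(self, x, H, W):                # x: (B, H*W, C)
        B, _, C = x.shape
        x = x.view(B, H, W, C)
        x = torch.cat([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                       x[:, 0::2, 1::2], x[:, 1::2, 1::2]], dim=-1)   # (B, H/2, W/2, 4C)
        return x.view(B, (H // 2) * (W // 2), 4 * C)

# A 224x224 image yields 56x56 tokens of dim 96; merging yields 28x28 tokens of dim 384.
tokens = PatchPartition()(torch.randn(1, 3, 224, 224))   # (1, 3136, 96)
merged = PatchMerging()(tokens, 56, 56)                   # (1, 784, 384)
```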
  • Optionally, inputting the target image into the network model includes: convolving the image through a convolutional layer to obtain convolutional data; and using the convolutional data as the input of the Swin Transformer network.
  • A convolutional layer is placed before the patch partition layer, and the convolution operation is performed on the target image through this convolutional layer.
  • For example, two 3×3 convolutional layers are configured; the target image is convolved by the two 3×3 convolutional layers and converted into convolutional data, which is input into the patch partition layer.
  • Convolving the image with the convolutional layers not only reduces the subsequent computational complexity but also improves model accuracy.
  • Using two 3×3 convolutional layers further improves convolution efficiency.
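  • A minimal sketch of the convolutional stem just described. The disclosure only states that two 3×3 convolutional layers convolve the target image before the patch partition layer; the stride of 1, the padding, the intermediate channel width, and the activation used here are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Two 3x3 convolutions applied to the target image; mapping back to 3 channels keeps
# the input shape expected by the patch partition layer unchanged (an assumption).
conv_stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),   # first 3x3 convolution
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 3, kernel_size=3, stride=1, padding=1),   # second 3x3 convolution
)

image = torch.randn(1, 3, 224, 224)        # target image
conv_data = conv_stem(image)               # "convolutional data"
assert conv_data.shape == image.shape      # fed to the patch partition layer next
```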
  • The input convolutional data is divided by the patch partition layer into a set of non-overlapping patches, which serve as the input features of the Swin Transformer network.
  • The Swin Transformer network serving as the backbone is formed by stacking the Swin Transformer blocks of each stage.
  • The input features undergo a feature-dimension transformation through the linear embedding layer.
  • The Swin Transformer network reuses features by merging the input according to adjacent patches.
  • Each Swin Transformer block consists of a shifted-window-based MSA (multi-head self-attention) module with a two-layer MLP (multi-layer perceptron).
  • A LayerNorm (LN) layer is applied before each MSA module and each MLP, and a residual connection is applied after each MSA and each MLP.
  • The MSA module divides the input image into non-overlapping windows and then computes self-attention within each window; its computational complexity is linear in the image size.
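  • The block structure just described can be sketched as follows in PyTorch. The window size of 7, four attention heads, and MLP expansion ratio of 4 are illustrative assumptions, and the cyclic window shift of the full Swin design is omitted for brevity:

```python
import torch
import torch.nn as nn

class WindowMSA(nn.Module):
    """Self-attention computed independently inside non-overlapping w x w windows,
    so the cost grows linearly with image size (the shift step is omitted here)."""
    def __init__(self, dim, num_heads=4, window=7):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):                        # x: (B, H*W, C)
        B, L, C = x.shape
        w = self.window
        x = x.view(B, H // w, w, W // w, w, C)         # carve the grid into windows
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        x, _ = self.attn(x, x, x)                      # attention within each window
        x = x.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return x.reshape(B, L, C)

class SwinBlock(nn.Module):
    """LayerNorm before the window-based MSA and the two-layer MLP, with a
    residual connection after each, as described above."""
    def __init__(self, dim, num_heads=4, window=7, mlp_ratio=4):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.msa = WindowMSA(dim, num_heads, window)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                 nn.Linear(mlp_ratio * dim, dim))

    def forward(self, x, H, W):
        x = x + self.msa(self.norm1(x), H, W)          # LN -> MSA -> residual
        x = x + self.mlp(self.norm2(x))                # LN -> MLP -> residual
        return x

out = SwinBlock(96)(torch.randn(1, 56 * 56, 96), 56, 56)   # (1, 3136, 96)
```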
  • The Swin Transformer network includes multiple Swin Transformer blocks, and each Swin Transformer block includes multiple MSA layers;
  • the input of the MSA layer is provided with a first convolutional layer, and the output of the MSA layer is provided with a second convolutional layer.
  • For each MSA layer, the first convolutional layer at its input is used for dimensionality reduction,
  • and the second convolutional layer at its output is used for dimensionality expansion.
  • For example, the first convolutional layer may be a 1×1 convolutional layer,
  • and the second convolutional layer may be a 1×1 convolutional layer.
  • Accordingly, the input of the MSA layer is provided with a 1×1 convolutional layer and the output of the MSA layer is provided with a 1×1 convolutional layer. Setting a convolutional layer at the input and output of each MSA layer improves the efficiency and speed of the feature computation.
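  • A sketch of the first and second convolutional layers around an MSA layer: a 1×1 convolution reduces the token dimension before attention and another 1×1 convolution restores it afterwards. The reduction ratio of 2 is an assumption; the disclosure does not fix it:

```python
import torch
import torch.nn as nn

class BottleneckMSA(nn.Module):
    """1x1 convolution at the MSA input (dimensionality reduction) and a 1x1
    convolution at the MSA output (dimensionality expansion)."""
    def __init__(self, dim, num_heads=4, reduction=2):
        super().__init__()
        inner = dim // reduction
        self.reduce = nn.Conv1d(dim, inner, kernel_size=1)   # first convolutional layer
        self.attn = nn.MultiheadAttention(inner, num_heads, batch_first=True)
        self.expand = nn.Conv1d(inner, dim, kernel_size=1)   # second convolutional layer

    def forward(self, x):                                    # x: (B, L, C) tokens
        y = self.reduce(x.transpose(1, 2)).transpose(1, 2)   # (B, L, C/2)
        y, _ = self.attn(y, y, y)                            # cheaper attention
        return self.expand(y.transpose(1, 2)).transpose(1, 2)  # back to (B, L, C)

out = BottleneckMSA(96)(torch.randn(1, 3136, 96))            # (1, 3136, 96)
```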
  • The backbone network is connected to a neck network, and the neck network includes:
  • a feature pyramid network (Feature Pyramid Network, FPN)
  • and a balanced feature pyramid network (Balanced Feature Pyramid, BFP).
  • The feature pyramid network extracts features from the image at each scale and can generate multi-scale feature representations; the feature maps at all levels carry strong semantic information, including some high-resolution feature maps.
  • The feature maps of stages 1 to 4 decrease in size under convolution and correspond to the bottom to top levels of the feature pyramid network.
  • The feature pyramid network extracts features from the image at each level, generates multi-scale feature representations, and fuses the features.
  • The images at each level carry certain semantic information.
  • Feature fusion can be performed through the feature pyramid network.
  • The balanced feature pyramid network strengthens the multi-level features by balancing the semantic features through deep integration.
  • Features are strengthened by the balanced feature pyramid network.
  • The neck network connects the backbone network (backbone) and the head network (head), so that the features output by the backbone can be used by the head network more efficiently, improving data processing efficiency.
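  • The balancing step of the BFP can be sketched as follows (after the scheme used in Libra R-CNN, which this neck resembles): the FPN levels are rescaled to one intermediate resolution, averaged into a single balanced semantic feature, and added back to every level. The channel width of 256 is an assumption, and the refinement sub-step of the original BFP is omitted:

```python
import torch
import torch.nn.functional as F

def balanced_feature_pyramid(feats):
    """Integrate multi-level FPN features into one balanced feature and use it
    to strengthen every level."""
    size = feats[len(feats) // 2].shape[-2:]               # intermediate resolution
    gathered = [F.interpolate(f, size=size, mode='nearest') for f in feats]
    balanced = torch.stack(gathered).mean(dim=0)           # deep integration (averaging)
    return [f + F.interpolate(balanced, size=f.shape[-2:], mode='nearest')
            for f in feats]                                # strengthened levels

fpn_levels = [torch.randn(1, 256, s, s) for s in (56, 28, 14, 7)]
balanced_levels = balanced_feature_pyramid(fpn_levels)     # same shapes as the input
```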
  • Step 130: determine the damage detection result according to the damage position coordinates and the damage category.
  • In step 120, after the Swin Transformer network outputs the damage position coordinates and damage category via forward propagation, the final damage detection result can be filtered out with the soft-NMS (soft non-maximum suppression) algorithm.
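  • A minimal sketch of the Gaussian variant of soft-NMS used for this filtering step: instead of discarding every box that overlaps the currently best box, the scores of overlapping boxes are decayed by exp(-IoU²/σ). The σ and score-threshold values are illustrative assumptions:

```python
import torch

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    keep, scores = [], scores.clone()
    idx = torch.arange(len(scores))
    while len(idx) > 0:
        best = scores[idx].argmax()
        i = idx[best]
        keep.append(int(i))
        idx = torch.cat([idx[:best], idx[best + 1:]])      # remove the selected box
        if len(idx) == 0:
            break
        xy1 = torch.maximum(boxes[idx, :2], boxes[i, :2])  # IoU with the selected box
        xy2 = torch.minimum(boxes[idx, 2:], boxes[i, 2:])
        inter = (xy2 - xy1).clamp(min=0).prod(dim=1)
        iou = inter / (area(boxes[idx]) + area(boxes[i]) - inter)
        scores[idx] *= torch.exp(-iou ** 2 / sigma)        # Gaussian score decay
        idx = idx[scores[idx] > score_thresh]              # drop near-zero detections
    return keep
```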
  • The vehicle loss detection method provided by the embodiment of the present invention acquires the target image; inputs the target image into the network model, the backbone network of which includes a Swin Transformer network used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and determines the damage detection result according to the damage position coordinates and the damage category.
  • The embodiment of the present invention uses the Swin Transformer network as the backbone network, which is more precise than CNN-based detection and can locate and identify damaged parts more effectively.
  • Extracting features with a Swin Transformer backbone makes it possible to exploit the spatial relationships between the pixels of an image and to perform weighted feature selection, achieving better feature extraction and utilization.
  • The Swin Transformer also has CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer solutions.
  • The self-attention mechanism of the Swin Transformer blocks makes the detection model applicable to a wide range of vehicle types and robust to on-site environments and complex photographic backgrounds, enabling efficient damage assessment of damaged vehicle parts and improving assessment efficiency.
  • Fig. 4 is a flowchart of the vehicle loss detection method provided by Embodiment 2 of the present invention. As a further elaboration of the above embodiment, the method further includes a step of training the Swin Transformer network before the target image is acquired in step 110.
  • Embodiment 1 provides an implementation in which the Swin Transformer network serves as the backbone network for vehicle damage detection.
  • Embodiment 2 provides a way of training the above network. The method can be implemented as follows:
  • Step 210: label the historical car damage pictures according to the labeling criteria, and configure the damage categories of the historical car damage pictures.
  • The damage categories and labeling criteria can be determined jointly by damage assessors and algorithm engineers.
  • The damage categories cover vehicle damage of different severities for which compensation is required.
  • The labeling criteria cover special cases such as overlapping damage, uncertainty about whether something is damage, and uncertainty about the kind of damage. The damage categories include: scratches, scrapes, dents, wrinkles, dead folds, tears, missing parts, and so on.
  • The historical pictures of body damage are labeled in batches.
  • Optionally, the labeling can be done manually.
  • Each damage instance appearing in a picture is annotated with a rectangular box, and the damage type it belongs to is recorded. Further, pictures whose damage categories are hard to distinguish are removed, and a body damage database is constructed.
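  • A hypothetical annotation record for one labeled picture is shown below; the field names and file name are illustrative only, since the disclosure does not prescribe a storage format:

```python
# One rectangular-box annotation per damage instance, plus its damage category.
annotation = {
    "image": "body_damage_000123.jpg",                 # hypothetical file name
    "boxes": [
        {"bbox": [412, 180, 655, 310], "category": "scratch"},  # x1, y1, x2, y2
        {"bbox": [120, 340, 260, 470], "category": "dent"},
    ],
}
```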
  • Step 220: train the Swin Transformer network according to the labeled historical car damage pictures.
  • Optionally, one part of the images in the body damage database is used as the training set, and another part as the test set.
  • The training process takes the car damage images and the damage-type labels as input to train the Swin Transformer network. The model is tested on the test set after every epoch, and the model parameters with the highest detection mAP are saved.
  • The Swin Transformer network is optimized over multiple iterations.
  • Training the Swin Transformer network according to the labeled historical car damage pictures includes:
  • during training, performing the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
  • IoU (Intersection over Union) is the ratio of the intersection to the union of the predicted box and the ground-truth box.
  • Networks are usually trained with the IoU formula and a bounding-box localization loss function.
  • However, the accuracy obtained with that approach is low. The embodiment of the present application therefore performs the regression calculation of the Swin Transformer network according to the distance-penalty loss function, which improves the localization accuracy of the predicted boxes.
  • The DIoU loss can still provide a moving direction for the bounding box even when it does not overlap the target box.
  • The DIoU loss also converges faster; in particular, when one box contains the other in the horizontal or vertical direction, the DIoU loss achieves fast regression.
  • The distance-penalty loss function (DIoU loss) is used for the bounding-box regression calculation of the Swin Transformer network.
  • The distance-penalty loss L_DIoU can be calculated by the following formula: L_DIoU = 1 - IoU + ρ²(b, b^gt) / c²
  • where b and b^gt denote the center points of the predicted box and the ground-truth box, respectively, and ρ²(b, b^gt) is the squared Euclidean distance between the two center points.
  • c denotes the diagonal length of the smallest enclosing region that can contain both the predicted box and the ground-truth box.
  • IoU denotes the intersection-over-union of the predicted box and the ground-truth box.
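  • The formula above translates directly into code. A sketch, assuming corner-format boxes (x1, y1, x2, y2) and batched tensors:

```python
import torch

def diou_loss(pred, target):
    """L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2 for boxes in (x1, y1, x2, y2) form."""
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    # IoU of the predicted and ground-truth boxes
    xy1 = torch.maximum(pred[..., :2], target[..., :2])
    xy2 = torch.minimum(pred[..., 2:], target[..., 2:])
    inter = (xy2 - xy1).clamp(min=0).prod(dim=-1)
    union = (area(pred) + area(target) - inter).clamp(min=1e-7)
    iou = inter / union
    # rho^2: squared Euclidean distance between the two box centers
    rho2 = ((pred[..., :2] + pred[..., 2:]) / 2 -
            (target[..., :2] + target[..., 2:]) / 2).pow(2).sum(dim=-1)
    # c^2: squared diagonal of the smallest region enclosing both boxes
    enc1 = torch.minimum(pred[..., :2], target[..., :2])
    enc2 = torch.maximum(pred[..., 2:], target[..., 2:])
    c2 = (enc2 - enc1).pow(2).sum(dim=-1).clamp(min=1e-7)
    return 1 - iou + rho2 / c2

loss = diou_loss(torch.tensor([[0., 0., 4., 4.]]), torch.tensor([[1., 1., 5., 5.]]))
```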
  • Training the Swin Transformer network according to the labeled historical car damage pictures includes:
  • during training, performing data augmentation on the historical car damage pictures, and training the Swin Transformer network with the augmented pictures.
  • A multi-scale training scheme is used for enough epochs that the loss values of the model converge on the training and test sets, and the model parameters with the highest mAP on the test set are saved.
  • One complete pass of the whole dataset forward and backward through the neural network is called an epoch.
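  • The epoch-wise scheme above can be sketched as a simple loop that checkpoints the weights with the best test mAP. The optimizer choice, the `evaluate_map` helper, and the loader protocol are assumptions, not defined by the disclosure:

```python
import torch

def train_detector(model, train_loader, test_loader, evaluate_map, epochs=50):
    """Train for `epochs` epochs, test after every epoch, keep the best-mAP weights."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)   # optimizer is an assumption
    best_map = 0.0
    for _ in range(epochs):
        model.train()
        for images, targets in train_loader:               # augmented training pictures
            loss = model(images, targets)                  # detection loss (e.g. with DIoU)
            opt.zero_grad()
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            current_map = evaluate_map(model, test_loader) # test once per epoch
        if current_map > best_map:                         # save the highest-mAP parameters
            best_map = current_map
            torch.save(model.state_dict(), 'best_swin_detector.pt')
    return best_map
```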
  • Step 230: acquire the target image.
  • Step 240: input the target image into the network model; the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network.
  • Step 250: determine the damage detection result according to the damage position coordinates and the damage category.
  • the vehicle loss detection method provided in the embodiment of the present application can train the network more efficiently and make the trained network more accurate.
  • FIG. 5 is a schematic structural diagram of a vehicle loss detection device provided by Embodiment 3 of the present invention. This embodiment is applicable to vehicle loss detection.
  • The device can be implemented in electronic equipment, which may be a computer device or a terminal, and specifically includes: an image acquisition module 310, a detection module 320, and a detection result determination module 330.
  • An image acquisition module 310 configured to acquire a target image
  • the detection module 320 is used to input the target image to the network model, the backbone network of the network model includes the Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network;
  • The detection result determination module 330 is configured to determine the damage detection result according to the damage position coordinates and damage category.
  • On the basis of the above implementation, the detection module 320 is configured to:
  • convolve the image through a convolutional layer to obtain convolutional data, and use the convolutional data as the input of the Swin Transformer network.
  • the Swin Transformer network includes a plurality of Swin Transformer blocks, and the Swin Transformer block includes a plurality of MSA layers;
  • the input of the MSA layer is provided with a first convolutional layer
  • the output of the MSA layer is provided with a second convolutional layer.
  • the input of the MSA layer is provided with a 1×1 convolutional layer;
  • the output of the MSA layer is provided with a 1×1 convolutional layer.
  • The backbone network is connected to a neck network, and the neck network includes: a feature pyramid network and a balanced feature pyramid network.
  • On the basis of the above implementation, a training module is also included.
  • The training module is configured to:
  • label the historical car damage pictures according to the labeling criteria, and configure the damage categories of the historical car damage pictures;
  • train the Swin Transformer network according to the labeled historical car damage pictures.
  • The training module is configured to:
  • during training, perform the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
  • The training module is configured to:
  • during training, perform data augmentation on the historical car damage pictures;
  • train the Swin Transformer network with the augmented historical car damage pictures.
  • In the vehicle loss detection device provided by the embodiment of the present invention, the image acquisition module 310 acquires a target image; the detection module 320 inputs the target image into a network model, the backbone network of which includes a Swin Transformer network used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and the detection result determination module 330 determines the damage detection result according to the damage position coordinates and the damage category.
  • The embodiment of the present invention uses the Swin Transformer network as the backbone network, which is more precise than CNN-based detection and can locate and identify damaged parts more effectively.
  • Extracting features with a Swin Transformer backbone makes it possible to exploit the spatial relationships between the pixels of an image and to perform weighted feature selection, achieving better feature extraction and utilization.
  • The Swin Transformer also has CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer solutions.
  • The self-attention mechanism of the Swin Transformer blocks makes the detection model applicable to a wide range of vehicle types and robust to on-site environments and complex photographic backgrounds, enabling efficient damage assessment of damaged vehicle parts and improving assessment efficiency.
  • the vehicle loss detection device provided in the embodiment of the present invention can execute the vehicle loss detection method provided in any embodiment of the present invention, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present invention.
  • The electronic device includes a processor 40, a memory 41, an input device 42, and an output device 43.
  • The number of processors 40 in the electronic device may be one or more; one processor 40 is taken as an example in FIG. 6. The processor 40, memory 41, input device 42, and output device 43 in the electronic device may be connected by a bus or in other ways; a bus connection is taken as an example in FIG. 6.
  • As a computer-readable storage medium, the memory 41 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the vehicle loss detection method in the embodiments of the present invention (for example, the image acquisition module 310, detection module 320, detection result determination module 330, and training module in the vehicle loss detection device).
  • the processor 40 executes various functional applications and data processing of the electronic device by running the software programs, instructions and modules stored in the memory 41 , that is, realizes the above-mentioned vehicle loss detection method.
  • the memory 41 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application required by a function; the data storage area may store data created according to the use of the terminal, and the like.
  • the memory 41 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices.
  • the memory 41 may further include a memory that is remotely located relative to the processor 40, and these remote memories may be connected to the electronic device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 42 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the electronic device.
  • the output device 43 may include a display device such as a display screen.
  • When executed, the computer program causes the processor to perform the following operations:
  • the target image is input to the network model, the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict damage position coordinates and damage categories of the target image based on the Swin Transformer network;
  • a damage detection result is determined according to the damage position coordinates and the damage category.
  • The processor is configured to input the target image into the network model in the following manner:
  • the image is convolved through a convolutional layer to obtain convolutional data, and the convolutional data is used as the input of the Swin Transformer network.
  • The Swin Transformer network processed by the processor is configured such that the Swin Transformer network includes a plurality of Swin Transformer blocks, and each Swin Transformer block includes a plurality of MSA layers;
  • the input of the MSA layer is provided with a first convolutional layer
  • the output of the MSA layer is provided with a second convolutional layer.
  • The backbone network processed by the processor is connected to a neck network, and the neck network includes: a feature pyramid network and a balanced feature pyramid network.
  • Before acquiring the target image, the processor is further configured to:
  • label the historical car damage pictures according to the labeling criteria, and configure the damage categories of the historical car damage pictures;
  • train the Swin Transformer network according to the labeled historical car damage pictures.
  • The processor is configured to train the Swin Transformer network according to the labeled historical car damage pictures in the following manner:
  • during training, the regression calculation of the Swin Transformer network is performed according to the distance-penalty loss function.
  • The processor is configured to train the Swin Transformer network according to the labeled historical car damage pictures in the following manner:
  • during training, data augmentation is performed on the historical car damage pictures;
  • the Swin Transformer network is trained with the augmented historical car damage pictures.
  • Embodiment 5 of the present invention also provides a storage medium containing computer-executable instructions.
  • the storage medium may be a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the computer-executable instructions when executed by a computer processor, are used to perform the following steps:
  • the target image is input to the network model, and the backbone network of the network model includes the Swin Transformer network.
  • the backbone network is used to predict the damage position coordinates and the damage category of the target image based on the Swin Transformer network;
  • a damage detection result is determined according to the damage position coordinates and the damage category.
  • Inputting the target image into the network model includes:
  • convolving the image through a convolutional layer to obtain convolutional data, and using the convolutional data as the input of the Swin Transformer network.
  • the Swin Transformer network includes a plurality of Swin Transformer blocks, and the Swin Transformer block includes a plurality of MSA layers;
  • the input of the MSA layer is provided with a first convolutional layer;
  • the output of the MSA layer is provided with a second convolutional layer.
  • Specifically, the input of the MSA layer is provided with a 1×1 convolutional layer,
  • and the output of the MSA layer is provided with a 1×1 convolutional layer.
  • The backbone network is connected to a neck network, and the neck network includes: a feature pyramid network and a balanced feature pyramid network.
  • Before acquiring the target image, the historical car damage pictures are labeled according to the labeling criteria, and the damage categories of the historical car damage pictures are configured;
  • the Swin Transformer network is trained according to the labeled historical car damage pictures.
  • Training the Swin Transformer network according to the labeled historical car damage pictures includes:
  • during training, performing the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
  • Training the Swin Transformer network according to the labeled historical car damage pictures includes:
  • during training, performing data augmentation on the historical car damage pictures;
  • training the Swin Transformer network with the augmented historical car damage pictures.
  • The computer-executable instructions are not limited to the method operations described above and can also perform related operations of the vehicle loss detection method provided in any embodiment of the present invention.
  • From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus the necessary general-purpose hardware, and certainly also by hardware, although in many cases the former is the better implementation.
  • Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
  • It is worth noting that, in the above embodiment of the vehicle loss detection device, the included units and modules are divided only according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for ease of mutual distinction and are not intended to limit the protection scope of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A vehicle loss detection method and device, an electronic device, and a storage medium. The method includes: acquiring a target image; inputting the target image into a network model, the backbone network of which includes a Swin Transformer network used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and determining a damage detection result according to the damage position coordinates and the damage category. Using the Swin Transformer network as the backbone network is more precise than CNN-based detection and locates and identifies damaged parts more effectively. Extracting features with a Swin Transformer backbone makes it possible to exploit the spatial relationships between the pixels of an image and to perform weighted feature selection, achieving better feature extraction and utilization. At the same time, the Swin Transformer has CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer solutions.

Description

Vehicle loss detection method, device, electronic equipment, and storage medium
This application claims priority to the Chinese patent application with application number 202110937282.8, entitled "Vehicle loss detection method, device, electronic equipment, and storage medium", filed with the China National Intellectual Property Administration on August 16, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present invention relate to machine learning technology in the field of artificial intelligence, and in particular to a vehicle loss detection method, device, electronic equipment, and storage medium.
Background
With the rapid development of society, vehicles have become one of the indispensable means of transportation, and the growing number of vehicles has undoubtedly raised the incidence of traffic accidents. After a traffic accident, the insurance company usually goes to the accident scene to assess the damage, that is, it determines the vehicle damage by examining photos taken at the scene and uses this as the basis for the insurer's claim settlement. Damage assessment consumes substantial human resources, and its results are highly subjective. Vehicle damage detection systems based on deep learning have therefore gradually begun to replace manual work; such systems can accurately detect the type of vehicle damage from one or more pictures.
The inventors found that existing object detectors are mainly implemented on the basis of CNNs. However, CNN-based image analysis is not sufficiently accurate.
Summary
The present invention provides a vehicle loss detection method, device, electronic equipment, and storage medium, so as to improve the accuracy of vehicle damage detection.
In a first aspect, an embodiment of the present invention provides a vehicle loss detection method, including:
acquiring a target image;
inputting the target image into a network model, wherein the backbone network of the network model includes a Swin Transformer network (also known as a hierarchical vision Transformer network), and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
determining a damage detection result according to the damage position coordinates and the damage category.
In a second aspect, an embodiment of the present invention further provides a vehicle loss detection device, including:
an image acquisition module configured to acquire a target image;
a detection module configured to input the target image into a network model, wherein the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
a detection result determination module configured to determine a damage detection result according to the damage position coordinates and the damage category.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed, causes the processor to perform the following operations:
acquiring a target image;
inputting the target image into a network model, wherein the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
determining a damage detection result according to the damage position coordinates and the damage category.
In a fourth aspect, an embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the following steps:
acquiring a target image;
inputting the target image into a network model, wherein the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
determining a damage detection result according to the damage position coordinates and the damage category.
The vehicle loss detection method provided by the embodiments of the present invention acquires a target image; inputs the target image into a network model whose backbone network includes a Swin Transformer network used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and determines the damage detection result according to the damage position coordinates and the damage category. Compared with current CNN-based vehicle damage detection, which is not sufficiently accurate, the embodiments of the present invention use the Swin Transformer network as the backbone network, which is more precise than CNN-based detection and can locate and identify damaged parts more effectively. Extracting features with a Swin Transformer backbone makes it possible to exploit the spatial relationships between the pixels of an image and to perform weighted feature selection, achieving better feature extraction and utilization. At the same time, the Swin Transformer has CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer solutions. The self-attention mechanism of the Swin Transformer blocks makes the detection model applicable to a wide range of vehicle types and robust to on-site environments and complex photographic backgrounds, enabling efficient damage assessment of damaged vehicle parts and improving assessment efficiency.
Brief Description of the Drawings
Fig. 1 is a flowchart of the vehicle loss detection method in Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of the Swin Transformer network in Embodiment 1 of the present invention;
Fig. 3 is a schematic structural diagram of the Swin Transformer block in Embodiment 1 of the present invention;
Fig. 4 is a flowchart of the vehicle loss detection method in Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of the vehicle loss detection device in Embodiment 3 of the present invention;
Fig. 6 is a schematic structural diagram of the electronic device in Embodiment 4 of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of the vehicle loss detection method provided by Embodiment 1 of the present invention. This embodiment is applicable to vehicle loss detection. The method can be executed by an electronic device, which may be a computer device or a terminal, and specifically includes the following steps:
Step 110: acquire a target image.
The target image is the image on which vehicle loss detection is performed. A user can photograph the damaged vehicle with a handheld terminal and use the resulting photo as the target image. A pre-captured image can also be imported into a computer device as the target image.
Step 120: input the target image into the network model; the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network.
As shown in Fig. 2, the Swin Transformer network includes a patch partition layer and four stages. Each stage includes a linear embedding layer and Swin Transformer blocks, and each stage performs one downsampling.
For example, the patch partition layer divides an input 224×224 target image into a set of non-overlapping patches, where each patch is 4×4 and the target image has 3 color channels, so the feature dimension of each patch is 4×4×3 = 48 and the number of patches is H/4 × W/4.
In stage 1, a linear embedding layer first projects the feature dimension of the divided patches to C, and the result is fed into the Swin Transformer blocks. Stages 2 to 4 operate in the same way: a patch merging layer first merges the input according to adjacent 2×2 patches, so the number of patches becomes H/8 × W/8 and the feature dimension becomes 4C, and so on. The feature vector of the target image is processed through the four stages to obtain the damage category and damage location information. In the Swin Transformer network, the size of each patch is pre-configured, and the number of patches is determined by the configured patch size.
The patch partition layer divides the image into multiple patches and obtains a feature vector for each patch. Stages 1 to 4 perform image recognition based on the feature vectors to obtain the damage position coordinates and damage category of the target image. Stage 1 operates patch by patch, identifying the feature vector of the target image within each patch. Stage 2 merges the patches of stage 1, yielding H/8 × W/8 patches, and identifies the feature vector of the target image on each merged patch. By analogy, each later stage merges the patches of the previous stage and identifies the feature vectors of the target image on the merged patches. After stage 4 obtains the feature vector of the target image, the feature vector is mapped to the neural network for image recognition.
Optionally, inputting the target image into the network model includes: convolving the image through a convolutional layer to obtain convolutional data; and using the convolutional data as the input of the Swin Transformer network.
Optionally, a convolutional layer is placed before the patch partition layer, and the convolution operation is performed on the target image through this convolutional layer. For example, two 3×3 convolutional layers are configured; the target image is convolved by the two 3×3 convolutional layers and converted into convolutional data, which is then input into the patch partition layer.
Convolving the image with the convolutional layers not only reduces the subsequent computational complexity but also improves model accuracy. Using two 3×3 convolutional layers further improves convolution efficiency.
After the convolutional data is input into the patch partition layer, the patch partition layer divides it into a set of non-overlapping patches that serve as the input features of the Swin Transformer network.
The Swin Transformer network serving as the backbone is formed by stacking the Swin Transformer blocks of each stage. The input features undergo a feature-dimension transformation through the linear embedding layer. The Swin Transformer network reuses features by merging the input according to adjacent patches.
As shown in Fig. 3, each Swin Transformer block consists of a shifted-window-based MSA (multi-head self-attention) module with a two-layer MLP (multi-layer perceptron). A LayerNorm (LN) layer is applied before each MSA module and each MLP, and a residual connection is applied after each MSA and each MLP. The MSA module divides the input image into non-overlapping windows and then computes self-attention within each window; its computational complexity is linear in the image size.
Optionally, the Swin Transformer network includes multiple Swin Transformer blocks, and each Swin Transformer block includes multiple MSA layers;
the input of the MSA layer is provided with a first convolutional layer, and the output of the MSA layer is provided with a second convolutional layer.
For each MSA layer, a first convolutional layer is placed at its input for dimensionality reduction and a second convolutional layer at its output for dimensionality expansion. For example, the first convolutional layer may be a 1×1 convolutional layer, and the second convolutional layer may be a 1×1 convolutional layer; accordingly, the input of the MSA layer is provided with a 1×1 convolutional layer and the output of the MSA layer is provided with a 1×1 convolutional layer. Setting a convolutional layer at the input and output of each MSA layer improves the efficiency and speed of the feature computation.
Optionally, the backbone network is connected to a neck network, and the neck network includes:
a feature pyramid network (Feature Pyramid Network, FPN) and a balanced feature pyramid network (Balanced Feature Pyramid, BFP).
The feature pyramid network extracts features from the image at each scale and can generate multi-scale feature representations; the feature maps at all levels carry strong semantic information, including some high-resolution feature maps.
The feature maps of stages 1 to 4 decrease in size under convolution and correspond to the bottom to top levels of the feature pyramid network. The feature pyramid network extracts features from the image at each level, generates multi-scale feature representations, and fuses the features. The images at each level carry certain semantic information, and feature fusion can be performed through the feature pyramid network. The balanced feature pyramid network strengthens the multi-level features by balancing the semantic features through deep integration. Features are strengthened by the balanced feature pyramid network.
The neck network connects the backbone network and the head network, so that the features output by the backbone can be used by the head network more efficiently, improving data processing efficiency.
Step 130: determine the damage detection result according to the damage position coordinates and the damage category.
In step 120, after the Swin Transformer network outputs the damage position coordinates and damage category via forward propagation, the final damage detection result can be filtered out with the soft-NMS (soft non-maximum suppression) algorithm.
The vehicle loss detection method provided by the embodiment of the present invention acquires a target image; inputs the target image into a network model whose backbone network includes a Swin Transformer network used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and determines the damage detection result according to the damage position coordinates and the damage category. Compared with current CNN-based vehicle damage detection, which is not sufficiently accurate, the embodiment of the present invention uses the Swin Transformer network as the backbone network, which is more precise than CNN-based detection and can locate and identify damaged parts more effectively. Extracting features with a Swin Transformer backbone makes it possible to exploit the spatial relationships between the pixels of an image and to perform weighted feature selection, achieving better feature extraction and utilization. At the same time, the Swin Transformer has CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer solutions. The self-attention mechanism of the Swin Transformer blocks makes the detection model applicable to a wide range of vehicle types and robust to on-site environments and complex photographic backgrounds, enabling efficient damage assessment of damaged vehicle parts and improving assessment efficiency.
Embodiment 2
Fig. 4 is a flowchart of the vehicle loss detection method provided by Embodiment 2 of the present invention. As a further elaboration of the above embodiment, the method further includes a step of training the Swin Transformer network before the target image is acquired in step 110. Embodiment 1 provides an implementation in which the Swin Transformer network serves as the backbone network for vehicle damage detection; Embodiment 2 provides a way of training that network. The method can be implemented as follows:
Step 210: label the historical car damage pictures according to the labeling criteria, and configure the damage categories of the historical car damage pictures.
The damage categories and labeling criteria can be determined jointly by damage assessors and algorithm engineers. The damage categories cover vehicle damage of different severities for which compensation is required. The labeling criteria cover special cases such as overlapping damage, uncertainty about whether something is damage, and uncertainty about the kind of damage. The damage categories include: scratches, scrapes, dents, wrinkles, dead folds, tears, missing parts, and so on.
The historical pictures of body damage are labeled in batches based on the damage categories. Optionally, the labeling can be done manually. Each damage instance appearing in a picture is annotated with a rectangular box, and the damage type it belongs to is recorded. Further, pictures whose damage categories are hard to distinguish are removed, and a body damage database is constructed.
Step 220: train the Swin Transformer network according to the labeled historical car damage pictures.
Optionally, one part of the images in the body damage database is used as the training set, and another part as the test set.
All pictures in the training set undergo data augmentation operations such as random cropping, random rotation, and random changes to saturation, hue, and contrast; the pictures are then resized to 896×896 pixels and input into the Swin Transformer for training. The training process takes the car damage images and the damage-type labels as input to train the Swin Transformer network. The model is tested on the test set after every epoch, and the model parameters with the highest detection mAP are saved. The Swin Transformer network is optimized over multiple iterations.
Optionally, training the Swin Transformer network according to the labeled historical car damage pictures includes:
during training, performing the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
IoU (Intersection over Union) is the ratio of the intersection to the union of the predicted box and the ground-truth box. Networks are usually trained with the IoU formula and a bounding-box localization loss function; however, the accuracy obtained this way is low. The embodiment of the present application therefore performs the regression calculation of the Swin Transformer network according to the distance-penalty loss function, which improves the localization accuracy of the predicted boxes. The DIoU loss can still provide a moving direction for the bounding box even when it does not overlap the target box. In addition, the DIoU loss converges faster than the IoU loss; in particular, when one box contains the other in the horizontal or vertical direction, the DIoU loss achieves fast regression.
For example, the distance-penalty loss function (DIoU loss) is used for the bounding-box regression calculation of the Swin Transformer network. The distance-penalty loss L_DIoU can be calculated by the following formula:
L_DIoU = 1 - IoU + ρ²(b, b^gt) / c²
where b and b^gt denote the center points of the predicted box and the ground-truth box, respectively; ρ²(b, b^gt) is the squared Euclidean distance between the two center points; c denotes the diagonal length of the smallest enclosing region that can contain both the predicted box and the ground-truth box; and IoU is the intersection-over-union of the predicted box and the ground-truth box.
Optionally, training the Swin Transformer network according to the labeled historical car damage pictures includes:
during training, performing data augmentation on the historical car damage pictures, and training the Swin Transformer network with the augmented historical car damage pictures.
During training, different data augmentation methods can be applied to the historical car damage pictures, together with trying different types of optimizers, learning-rate decay schedules, regularization techniques, and so on. In addition, a multi-scale training scheme is used for enough epochs that the loss values of the model converge on the training and test sets, and the model parameters with the highest mAP on the test set are saved. One complete pass of the whole dataset forward and backward through the neural network is called an epoch.
In addition, false detections occur in a small number of targeted cases involving mosaic textures and dim light, so mosaic and image-saturation variations are randomly added during data augmentation.
Step 230: acquire the target image.
Step 240: input the target image into the network model; the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network.
Step 250: determine the damage detection result according to the damage position coordinates and the damage category.
The vehicle loss detection method provided by the embodiment of the present application trains the network more efficiently and makes the trained network more accurate.
Embodiment 3
Fig. 5 is a schematic structural diagram of the vehicle loss detection device provided by Embodiment 3 of the present invention. This embodiment is applicable to vehicle loss detection. The device can be implemented in electronic equipment, which may be a computer device or a terminal, and specifically includes: an image acquisition module 310, a detection module 320, and a detection result determination module 330.
The image acquisition module 310 is configured to acquire a target image;
the detection module 320 is configured to input the target image into a network model, wherein the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
the detection result determination module 330 is configured to determine the damage detection result according to the damage position coordinates and the damage category.
On the basis of the above implementation, the detection module 320 is configured to:
convolve the image through a convolutional layer to obtain convolutional data; and
use the convolutional data as the input of the Swin Transformer network.
On the basis of the above implementation, the Swin Transformer network includes a plurality of Swin Transformer blocks, and each Swin Transformer block includes a plurality of MSA layers;
the input of the MSA layer is provided with a first convolutional layer; and
the output of the MSA layer is provided with a second convolutional layer.
Specifically, the input of the MSA layer is provided with a 1×1 convolutional layer, and the output of the MSA layer is provided with a 1×1 convolutional layer.
On the basis of the above implementation, the backbone network is connected to a neck network, and the neck network includes:
a feature pyramid network and a balanced feature pyramid network.
On the basis of the above implementation, a training module is also included. The training module is configured to:
label the historical car damage pictures according to the labeling criteria, and configure the damage categories of the historical car damage pictures; and
train the Swin Transformer network according to the labeled historical car damage pictures.
On the basis of the above implementation, the training module is configured to:
during training, perform the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
On the basis of the above implementation, the training module is configured to:
during training, perform data augmentation on the historical car damage pictures; and
train the Swin Transformer network with the augmented historical car damage pictures.
In the vehicle loss detection device provided by the embodiment of the present invention, the image acquisition module 310 acquires a target image; the detection module 320 inputs the target image into a network model whose backbone network includes a Swin Transformer network used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and the detection result determination module 330 determines the damage detection result according to the damage position coordinates and the damage category. Compared with current CNN-based vehicle damage detection, which is not sufficiently accurate, the embodiment of the present invention uses the Swin Transformer network as the backbone network, which is more precise than CNN-based detection and can locate and identify damaged parts more effectively. Extracting features with a Swin Transformer backbone makes it possible to exploit the spatial relationships between the pixels of an image and to perform weighted feature selection, achieving better feature extraction and utilization. At the same time, the Swin Transformer has CNN characteristics such as locality, translation invariance, and residual learning, so it can outperform CNN methods while avoiding the heavy computation and large memory consumption of other vision Transformer solutions. The self-attention mechanism of the Swin Transformer blocks makes the detection model applicable to a wide range of vehicle types and robust to on-site environments and complex photographic backgrounds, enabling efficient damage assessment of damaged vehicle parts and improving assessment efficiency.
The vehicle loss detection device provided by the embodiments of the present invention can execute the vehicle loss detection method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects.
Embodiment 4
Fig. 6 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present invention. As shown in Fig. 6, the electronic device includes a processor 40, a memory 41, an input device 42, and an output device 43. The number of processors 40 in the electronic device may be one or more; one processor 40 is taken as an example in Fig. 6. The processor 40, memory 41, input device 42, and output device 43 in the electronic device may be connected by a bus or in other ways; a bus connection is taken as an example in Fig. 6.
As a computer-readable storage medium, the memory 41 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the vehicle loss detection method in the embodiments of the present invention (for example, the image acquisition module 310, detection module 320, detection result determination module 330, and training module in the vehicle loss detection device). The processor 40 executes the various functional applications and data processing of the electronic device by running the software programs, instructions, and modules stored in the memory 41, thereby implementing the vehicle loss detection method described above.
The memory 41 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 41 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 41 may further include memory located remotely from the processor 40; such remote memory may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 can be used to receive input numeric or character information and to generate key signal input related to user settings and function control of the electronic device. The output device 43 may include a display device such as a display screen.
When executed, the computer program causes the processor to perform the following operations:
acquiring a target image;
inputting the target image into a network model, wherein the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
determining a damage detection result according to the damage position coordinates and the damage category.
On the basis of the above implementation, the processor is configured to input the target image into the network model in the following manner:
the image is convolved through a convolutional layer to obtain convolutional data; and
the convolutional data is used as the input of the Swin Transformer network.
On the basis of the above implementation, the Swin Transformer network processed by the processor is configured such that the Swin Transformer network includes a plurality of Swin Transformer blocks, and each Swin Transformer block includes a plurality of MSA layers;
the input of the MSA layer is provided with a first convolutional layer; and
the output of the MSA layer is provided with a second convolutional layer.
On the basis of the above implementation, the backbone network processed by the processor is connected to a neck network, and the neck network includes:
a feature pyramid network and a balanced feature pyramid network.
On the basis of the above implementation, before acquiring the target image, the processor is further configured to:
label the historical car damage pictures according to the labeling criteria, and configure the damage categories of the historical car damage pictures; and
train the Swin Transformer network according to the labeled historical car damage pictures.
On the basis of the above implementation, the processor is configured to train the Swin Transformer network according to the labeled historical car damage pictures in the following manner:
during training, the regression calculation of the Swin Transformer network is performed according to the distance-penalty loss function.
On the basis of the above implementation, the processor is configured to train the Swin Transformer network according to the labeled historical car damage pictures in the following manner:
during training, data augmentation is performed on the historical car damage pictures; and
the Swin Transformer network is trained with the augmented historical car damage pictures.
Embodiment 5
Embodiment 5 of the present invention further provides a storage medium containing computer-executable instructions. The storage medium may be a computer-readable storage medium, which may be non-volatile or volatile. When executed by a computer processor, the computer-executable instructions are used to perform the following steps:
acquiring a target image;
inputting the target image into a network model, wherein the backbone network of the network model includes a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
determining a damage detection result according to the damage position coordinates and the damage category.
On the basis of the above implementation, inputting the target image into the network model includes:
convolving the image through a convolutional layer to obtain convolutional data; and
using the convolutional data as the input of the Swin Transformer network.
On the basis of the above implementation, the Swin Transformer network includes a plurality of Swin Transformer blocks, and each Swin Transformer block includes a plurality of MSA layers;
the input of the MSA layer is provided with a first convolutional layer; and
the output of the MSA layer is provided with a second convolutional layer.
Specifically, the input of the MSA layer is provided with a 1×1 convolutional layer, and the output of the MSA layer is provided with a 1×1 convolutional layer.
On the basis of the above implementation, the backbone network is connected to a neck network, and the neck network includes:
a feature pyramid network and a balanced feature pyramid network.
On the basis of the above implementation, before acquiring the target image, the following are further included:
labeling the historical car damage pictures according to the labeling criteria, and configuring the damage categories of the historical car damage pictures; and
training the Swin Transformer network according to the labeled historical car damage pictures.
On the basis of the above implementation, training the Swin Transformer network according to the labeled historical car damage pictures includes:
during training, performing the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
On the basis of the above implementation, training the Swin Transformer network according to the labeled historical car damage pictures includes:
during training, performing data augmentation on the historical car damage pictures; and
training the Swin Transformer network with the augmented historical car damage pictures.
Of course, the computer-executable instructions of the storage medium provided by the embodiments of the present invention are not limited to the method operations described above; they can also perform related operations of the vehicle loss detection method provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus the necessary general-purpose hardware, and certainly also by hardware, although in many cases the former is the better implementation. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the vehicle loss detection device, the included units and modules are divided only according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for ease of mutual distinction and are not intended to limit the protection scope of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here; various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (22)

  1. A vehicle loss detection method, comprising:
    acquiring a target image;
    inputting the target image into a network model, wherein the backbone network of the network model comprises a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
    determining a damage detection result according to the damage position coordinates and the damage category.
  2. The method according to claim 1, wherein inputting the target image into the network model comprises:
    convolving the image through a convolutional layer to obtain convolutional data; and
    using the convolutional data as the input of the Swin Transformer network.
  3. The method according to claim 1, wherein the Swin Transformer network comprises a plurality of Swin Transformer blocks, and each Swin Transformer block comprises a plurality of MSA layers;
    the input of the MSA layer is provided with a first convolutional layer; and
    the output of the MSA layer is provided with a second convolutional layer.
  4. The method according to claim 1, wherein the backbone network is connected to a neck network, and the neck network comprises:
    a feature pyramid network and a balanced feature pyramid network.
  5. The method according to claim 1, further comprising, before acquiring the target image:
    labeling historical car damage pictures according to labeling criteria, and configuring the damage categories of the historical car damage pictures; and
    training the Swin Transformer network according to the labeled historical car damage pictures.
  6. The method according to claim 5, wherein training the Swin Transformer network according to the labeled historical car damage pictures comprises:
    during training, performing the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
  7. The method according to claim 5, wherein training the Swin Transformer network according to the labeled historical car damage pictures comprises:
    during training, performing data augmentation on the historical car damage pictures; and
    training the Swin Transformer network with the augmented historical car damage pictures.
  8. A vehicle loss detection device, comprising:
    an image acquisition module configured to acquire a target image;
    a detection module configured to input the target image into a network model, wherein the backbone network of the network model comprises a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
    a detection result determination module configured to determine a damage detection result according to the damage position coordinates and the damage category.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed, causes the processor to perform the following operations:
    acquiring a target image;
    inputting the target image into a network model, wherein the backbone network of the network model comprises a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
    determining a damage detection result according to the damage position coordinates and the damage category.
  10. The electronic device according to claim 9, wherein the processor is configured to input the target image into the network model in the following manner:
    convolving the image through a convolutional layer to obtain convolutional data; and
    using the convolutional data as the input of the Swin Transformer network.
  11. The electronic device according to claim 9, wherein the Swin Transformer network processed by the processor is configured such that the Swin Transformer network comprises a plurality of Swin Transformer blocks, and each Swin Transformer block comprises a plurality of MSA layers;
    the input of the MSA layer is provided with a first convolutional layer; and
    the output of the MSA layer is provided with a second convolutional layer.
  12. The electronic device according to claim 9, wherein the backbone network processed by the processor is connected to a neck network, and the neck network comprises:
    a feature pyramid network and a balanced feature pyramid network.
  13. The electronic device according to claim 9, wherein, before acquiring the target image, the processor is further configured to:
    label historical car damage pictures according to labeling criteria, and configure the damage categories of the historical car damage pictures; and
    train the Swin Transformer network according to the labeled historical car damage pictures.
  14. The electronic device according to claim 13, wherein the processor is configured to train the Swin Transformer network according to the labeled historical car damage pictures in the following manner:
    during training, performing the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
  15. The electronic device according to claim 13, wherein the processor is configured to train the Swin Transformer network according to the labeled historical car damage pictures in the following manner:
    during training, performing data augmentation on the historical car damage pictures; and
    training the Swin Transformer network with the augmented historical car damage pictures.
  16. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the following steps:
    acquiring a target image;
    inputting the target image into a network model, wherein the backbone network of the network model comprises a Swin Transformer network, and the backbone network is used to predict the damage position coordinates and damage category of the target image based on the Swin Transformer network; and
    determining a damage detection result according to the damage position coordinates and the damage category.
  17. The storage medium according to claim 16, wherein, when the computer-executable instructions are executed by a computer processor, inputting the target image into the network model is performed in the following manner:
    convolving the image through a convolutional layer to obtain convolutional data; and
    using the convolutional data as the input of the Swin Transformer network.
  18. The storage medium according to claim 16, wherein, when the computer-executable instructions are executed by a computer processor, the Swin Transformer network comprises a plurality of Swin Transformer blocks, and each Swin Transformer block comprises a plurality of MSA layers;
    the input of the MSA layer is provided with a first convolutional layer; and
    the output of the MSA layer is provided with a second convolutional layer.
  19. The storage medium according to claim 16, wherein, when the computer-executable instructions are executed by a computer processor, the backbone network is connected to a neck network, and the neck network comprises:
    a feature pyramid network and a balanced feature pyramid network.
  20. The storage medium according to claim 16, wherein, when the computer-executable instructions are executed by a computer processor, the following are performed before acquiring the target image:
    labeling historical car damage pictures according to labeling criteria, and configuring the damage categories of the historical car damage pictures; and
    training the Swin Transformer network according to the labeled historical car damage pictures.
  21. The storage medium according to claim 20, wherein, when the computer-executable instructions are executed by a computer processor, training the Swin Transformer network according to the labeled historical car damage pictures is performed in the following manner:
    during training, performing the regression calculation of the Swin Transformer network according to the distance-penalty loss function.
  22. The storage medium according to claim 20, wherein, when the computer-executable instructions are executed by a computer processor, training the Swin Transformer network according to the labeled historical car damage pictures is performed in the following manner:
    during training, performing data augmentation on the historical car damage pictures; and
    training the Swin Transformer network with the augmented historical car damage pictures.
PCT/CN2022/070984 2021-08-16 2022-01-10 Vehicle loss detection method, device, electronic equipment, and storage medium WO2023019875A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110937282.8A CN113657409A (zh) 2021-08-16 2021-08-16 Vehicle loss detection method, device, electronic equipment, and storage medium
CN202110937282.8 2021-08-16

Publications (1)

Publication Number Publication Date
WO2023019875A1 (zh)

Family

ID=78491076

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070984 2021-08-16 2022-01-10 Vehicle loss detection method, device, electronic equipment, and storage medium

Country Status (2)

Country Link
CN (1) CN113657409A (zh)
WO (1) WO2023019875A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657409A (zh) 2021-08-16 2021-11-16 平安科技(深圳)有限公司 Vehicle loss detection method, device, electronic equipment, and storage medium
CN114152441A (zh) 2021-12-13 2022-03-08 山东大学 Rolling bearing fault diagnosis method and system based on a shifted-window Transformer network
CN114627292B (zh) 2022-03-08 2024-05-14 浙江工商大学 Industrial occluded object detection method
CN114898155B (zh) 2022-05-18 2024-05-28 平安科技(深圳)有限公司 Vehicle damage assessment method, apparatus, device, and storage medium
CN114972771B (zh) 2022-06-22 2024-06-28 平安科技(深圳)有限公司 Vehicle damage assessment and claim settlement method, apparatus, electronic device, and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657716B (zh) * 2018-12-12 2020-12-29 中汽数据(天津)有限公司 Vehicle appearance damage recognition method based on deep learning
CN111666990A (zh) * 2020-05-27 2020-09-15 平安科技(深圳)有限公司 Vehicle damage feature detection method and apparatus, computer device, and storage medium
CN112966709B (zh) * 2021-01-27 2022-09-23 中国电子进出口有限公司 Fine-grained vehicle model recognition method and system based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728236A (zh) * 2019-10-12 2020-01-24 创新奇智(重庆)科技有限公司 Vehicle damage assessment method and dedicated device therefor
CN111667011A (zh) * 2020-06-08 2020-09-15 平安科技(深圳)有限公司 Damage detection model training and vehicle damage detection method, apparatus, device, and medium
CN113657409A (zh) * 2021-08-16 2021-11-16 平安科技(深圳)有限公司 Vehicle loss detection method, device, electronic equipment, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZE LIU; YUTONG LIN; YUE CAO; HAN HU; YIXUAN WEI; ZHENG ZHANG; STEPHEN LIN; BAINING GUO: "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows", arXiv, Cornell University Library, Ithaca, NY, 25 March 2021, XP081916852 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343043A (zh) * 2023-03-30 2023-06-27 南京审计大学 Remote sensing image change detection method with multi-scale feature fusion
CN116343043B (zh) * 2023-03-30 2023-11-21 南京审计大学 Remote sensing image change detection method with multi-scale feature fusion
CN117611600A (zh) * 2024-01-22 2024-02-27 南京信息工程大学 Image segmentation method, system, storage medium, and device
CN117611600B (zh) * 2024-01-22 2024-03-29 南京信息工程大学 Image segmentation method, system, storage medium, and device
CN118537708A (zh) * 2024-07-26 2024-08-23 四川航空股份有限公司 Borescope image damage recognition model based on an improved convolutional neural network, and application system
CN118552910A (zh) * 2024-07-29 2024-08-27 国网山东省电力公司嘉祥县供电公司 Real-time monitoring method and system for power transformer operating states based on infrared images

Also Published As

Publication number Publication date
CN113657409A (zh) 2021-11-16

Similar Documents

Publication Publication Date Title
WO2023019875A1 (zh) Vehicle loss detection method, device, electronic equipment, and storage medium
WO2019144575A1 (zh) Fast pedestrian detection method and device
CN113609896B (zh) Object-level remote sensing change detection method and system based on dual correlation attention
CN111291714A (zh) Vehicle detection method based on the fusion of monocular vision and LiDAR
CN115908442B (zh) Panoptic image segmentation method for UAV ocean monitoring and model construction method
WO2019169884A1 (zh) Depth-information-based image saliency detection method and device
Hou et al. BSNet: Dynamic hybrid gradient convolution based boundary-sensitive network for remote sensing image segmentation
CN116912485A (zh) Scene semantic segmentation method based on feature fusion of thermal and visible-light images
CN112232173B (zh) Pedestrian attribute recognition method, deep learning model, device, and medium
Tang et al. HIC-YOLOv5: Improved YOLOv5 for small object detection
CN112446292B (zh) 2D image salient object detection method and system
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
CN115439743A (zh) Method for accurately extracting static visual SLAM features in parking scenes
Shi et al. An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds
CN117152443A (zh) Image instance segmentation method and system based on semantic lead guidance
CN111626241A (zh) Face detection method and device
Zuo et al. LGADet: Light-weight anchor-free multispectral pedestrian detection with mixed local and global attention
CN116681646A (zh) Small object detection algorithm based on YOLOv5 with fused spatial information and multi-head prediction
Wang et al. ATG-PVD: ticketing parking violations on a drone
CN116977359A (zh) Image processing method, apparatus, device, readable storage medium, and program product
CN116258931A (zh) Visual referring expression comprehension method and system based on ViT and sliding-window attention fusion
CN113920455B (zh) Night video colorization method based on deep neural networks
CN115035429A (zh) Aerial object detection method based on a composite backbone network and multiple prediction heads
CN115359067A (zh) Point-wise fusion point cloud semantic segmentation method based on continuous convolution networks
Yang et al. Small object detection model for remote sensing images combining super-resolution assisted reasoning and dynamic feature fusion

Legal Events

Date Code Title Description
NENP Non-entry into the national phase
Ref country code: DE
32PN EP: public notification in the EP Bulletin as the address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/06/2024)