WO2021057069A1 - Computer-implemented vehicle damage assessment method and device (计算机执行的车辆定损方法及装置) - Google Patents

Computer-implemented vehicle damage assessment method and device (计算机执行的车辆定损方法及装置)

Info

Publication number
WO2021057069A1
WO2021057069A1, PCT/CN2020/093890, CN2020093890W
Authority
WO
WIPO (PCT)
Prior art keywords
damage
component
video stream
frame
detection information
Prior art date
Application number
PCT/CN2020/093890
Other languages
English (en)
French (fr)
Inventor
蒋晨
程远
郭昕
Original Assignee
支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司 (Alipay (Hangzhou) Information Technology Co., Ltd.)
Publication of WO2021057069A1

Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24323: Tree-organised classifiers
    • G06F 18/253: Fusion techniques of extracted features
    • G06Q 40/08: Insurance
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 2201/08: Detecting or categorising vehicles (indexing scheme relating to image or video recognition or understanding)

Definitions

  • One or more embodiments of this specification relate to the field of video processing technology, and in particular to methods and devices for processing video streams using machine learning for intelligent vehicle loss assessment.
  • In a traditional vehicle insurance claim scenario, the insurance company needs to send professional damage assessment personnel to the accident scene to conduct an on-site damage assessment, determine the vehicle's repair plan and compensation amount, take photos of the scene, and archive the damage assessment photos so that back-office reviewers can verify the damage and check the price. Because manual damage assessment is required, the insurance company has to invest considerable labor costs and professional training costs. From an ordinary user's perspective, the claim settlement process involves waiting for a surveyor to take photos on site, for an assessor to assess the damage at the repair location, and for a reviewer to verify the damage in the back office; the settlement cycle can be as long as 1-3 days, the waiting time is long, and the experience is poor.
  • One or more embodiments of this specification describe a method and device for intelligent vehicle loss assessment based on video streams, which can comprehensively improve the accuracy of intelligent loss assessment.
  • According to a first aspect, a computer-implemented vehicle loss assessment method is provided, comprising: obtaining a feature matrix of a video stream, where the video stream is shot of a damaged vehicle, and the feature matrix includes at least N M-dimensional vectors respectively corresponding to N image frames in the video stream and arranged according to the time sequence of the N image frames, each M-dimensional vector including at least, for the corresponding image frame, component detection information obtained through a pre-trained first component detection model and damage detection information obtained through a pre-trained first damage detection model; acquiring K key frames in the video stream; generating corresponding K key frame vectors for the K key frames, each key frame vector including, for the corresponding key frame image, component detection information obtained through a pre-trained second component detection model and damage detection information obtained through a pre-trained second damage detection model; fusing the component detection information and damage detection information in the N M-dimensional vectors and the K key frame vectors to obtain a comprehensive damage feature of each component; obtaining a preliminary damage result, the preliminary damage result including the damage result of each component obtained after inputting the feature matrix into a pre-trained convolutional neural network; and inputting the comprehensive damage feature and the preliminary damage result of each component into a pre-trained decision tree model to obtain a final damage determination result for the video stream.
  • obtaining the feature matrix of the video stream includes receiving the feature matrix from the mobile terminal.
  • In an embodiment, obtaining the feature matrix of the video stream includes: obtaining the video stream; for each of the N image frames, performing component detection through the first component detection model to obtain component detection information and performing damage detection through the first damage detection model to obtain damage detection information; forming, at least based on the component detection information and the damage detection information, an M-dimensional vector corresponding to each image frame; and generating the feature matrix from the N M-dimensional vectors of the N image frames.
  • acquiring the K key frames in the video stream includes: receiving the K key frames from a mobile terminal.
  • the second component detection model is different from the first component detection model, and the second damage detection model is different from the first damage detection model.
  • In an embodiment, fusing the component detection information and damage detection information in the N M-dimensional vectors and the K key frame vectors to obtain a comprehensive damage feature of each component includes: determining at least one candidate damaged component, which includes a first component; for each of the N M-dimensional vectors and the K key frame vectors, performing intra-frame fusion of the component detection information and damage detection information within that single vector to obtain the frame comprehensive feature of the first component; and performing inter-frame fusion of the frame comprehensive features of the first component obtained for the respective vectors to obtain the comprehensive damage feature of the first component.
  • obtaining the preliminary damage result includes receiving the preliminary damage recognition result from the mobile terminal.
  • the feature matrix includes M rows and S columns, where S is not less than N
  • the convolutional neural network includes several one-dimensional convolution kernels
  • In an embodiment, inputting the feature matrix into the pre-trained convolutional neural network includes: using the several one-dimensional convolution kernels to perform convolution processing on the feature matrix along the row dimension of the feature matrix.
  • In an embodiment, the convolutional neural network is trained in the following manner: obtaining a plurality of training samples, where each training sample includes the sample feature matrix of a video stream and a corresponding damage result label, and the sample feature matrix of each video stream includes at least N M-dimensional vectors respectively corresponding to the N image frames in that video stream and arranged according to the time sequence of the N image frames; and using the plurality of training samples to train the convolutional neural network.
  • the damage result label includes at least one of the following: damage material, damage category, and component category of the damaged component.
  • the method further includes: determining a corresponding replacement and repair plan according to the final loss determination result.
  • According to a second aspect, a computer-implemented vehicle loss assessment device is provided, comprising: a first acquisition unit configured to acquire a feature matrix of a video stream, where the video stream is shot of a damaged vehicle, and the feature matrix includes at least N M-dimensional vectors respectively corresponding to the N image frames in the video stream and arranged according to the time sequence of the N image frames, each M-dimensional vector including at least, for the corresponding image frame, the component detection information obtained through the pre-trained first component detection model and the damage detection information obtained through the pre-trained first damage detection model; a second acquisition unit configured to acquire K key frames in the video stream; a generation unit configured to generate, for the K key frames, corresponding K key frame vectors, each key frame vector including, for the corresponding key frame image, the component detection information obtained through the pre-trained second component detection model and the damage detection information obtained through the pre-trained second damage detection model; and a fusion unit configured to fuse the component detection information and damage detection information in the N M-dimensional vectors and the K key frame vectors to obtain a comprehensive damage feature of each component.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
  • a computing device including a memory and a processor, the memory stores executable code, and the processor implements the method of the first aspect when the executable code is executed by the processor.
  • intelligent damage determination is performed based on the video stream generated by shooting the damaged vehicle.
  • the feature matrix of the video stream and the key frame information are merged to obtain a comprehensive damage feature.
  • In addition, the feature matrix is input into a pre-trained convolutional neural network to obtain a preliminary damage result, and the comprehensive damage feature and the preliminary damage result are then input into a decision tree model to obtain the final damage determination result.
  • Figure 1 is a schematic diagram of a typical implementation scenario of an embodiment disclosed in this specification
  • FIG. 2 shows a flowchart of a method for determining vehicle damage according to an embodiment
  • Figure 3a shows an example of component detection information obtained for a certain image frame
  • Figure 3b shows an example of damage detection information obtained for a certain image frame
  • Fig. 4 shows a schematic diagram of performing convolution on a feature matrix according to a specific example
  • Fig. 5 shows a schematic block diagram of a vehicle damage assessment device according to an embodiment.
  • Intelligent vehicle damage determination mainly involves automatically identifying the damage condition of the vehicle from the pictures of the car damage scene taken by ordinary users.
  • A method commonly used in the industry is to compare the vehicle damage pictures taken by the user against a massive historical database to obtain similar pictures, and to determine the damaged components and their degree of damage in the picture to be recognized based on the damage assessment results of those similar pictures.
  • the accuracy of damage recognition in this way is not ideal.
  • An object detection model for pictures is trained through the machine learning method of supervised training, the component targets and damage targets of the vehicle are detected separately using this model, and the vehicle damage condition in the picture is then determined based on a comprehensive analysis of the detection results.
  • FIG. 1 is a schematic diagram of a typical implementation scenario of an embodiment disclosed in this specification.
  • the user can use a portable mobile terminal, such as a smart phone, to take pictures of the car damage scene and generate a video stream.
  • The mobile terminal can be installed with an application or tool related to damage assessment recognition.
  • the application or tool can perform preliminary processing on the video stream, and perform lightweight and preliminary target detection and feature extraction on the N image frames.
  • the target detection result and feature extraction result of the frame can form an M-dimensional vector.
  • the mobile terminal can generate a feature matrix through preliminary processing of the video stream, and the matrix includes at least N M-dimensional vectors.
  • The application in the mobile terminal can also determine the key frames in the video stream, and use a CNN-based damage assessment model to perform preliminary damage assessment based on the feature matrix generated above to obtain a preliminary damage result.
  • the mobile terminal can send the above-mentioned feature matrix, key frame and preliminary damage result to the server.
  • The server generally has more powerful and reliable computing capabilities. Therefore, the server can use a more complex and more accurate target detection model to perform target detection again on the key frames in the video stream, thereby detecting vehicle component information and damage information.
  • the server fuses the information of the feature matrix with the information for key frame detection to generate comprehensive damage features for each component.
  • the above-mentioned preliminary damage results include the damage assessment results for each component.
  • the comprehensive damage characteristics of each component and the above-mentioned preliminary damage results can be input into the pre-trained decision tree model to obtain the final loss assessment result for the video stream, and realize intelligent loss assessment.
  • Fig. 2 shows a flowchart of a method for determining vehicle damage according to an embodiment.
  • The method can be executed by a server, and the server can be embodied as any apparatus, device, platform, or device cluster with computing and processing capabilities.
  • the method at least includes the following steps:
  • Step 21: Obtain a feature matrix of a video stream, where the video stream is shot of a damaged vehicle, and the feature matrix includes at least N M-dimensional vectors respectively corresponding to the N image frames in the video stream and arranged according to the time sequence of the N image frames; each M-dimensional vector includes at least, for the corresponding image frame, the component detection information obtained by the pre-trained first component detection model and the damage detection information obtained by the pre-trained first damage detection model;
  • Step 22: Obtain K key frames in the video stream;
  • Step 23: Generate corresponding K key frame vectors for the K key frames, where each key frame vector includes, for the corresponding key frame image, component detection information obtained through a pre-trained second component detection model and damage detection information obtained through a pre-trained second damage detection model;
  • Step 24: Fuse the component detection information and damage detection information in the N M-dimensional vectors and the K key frame vectors to obtain a comprehensive damage feature of each component;
  • Step 25: Obtain a preliminary damage result, the preliminary damage result including the damage result of each component obtained after inputting the feature matrix into a pre-trained convolutional neural network;
  • Step 26: Input the comprehensive damage features and preliminary damage results of the respective components into a pre-trained decision tree model to obtain a final damage determination result for the video stream.
  • step 21 the feature matrix of the video stream is obtained.
  • the video stream is generated by the user using the image acquisition device in the mobile terminal, such as a camera, to shoot the damaged vehicle at the car damage scene.
  • the mobile terminal can perform preliminary processing on the video stream through a corresponding application or tool to generate the aforementioned feature matrix.
  • N image frames can be extracted from the video stream, and preliminary processing can be performed on them.
  • the N image frames here may include each image frame in the video stream, may also be image frames extracted at a predetermined time interval (for example, 500 ms), or may be image frames obtained from the video stream according to other extraction methods.
  • target detection and feature extraction are performed on the image frame, thereby generating an M-dimensional vector for each image frame.
  • target detection is used to identify a specific target object from a picture and classify the target object.
  • Various target detection models can be obtained by using image samples marked with target locations and target categories for training.
  • the component detection model and the damage detection model are specific applications of the target detection model.
  • When vehicle components are used as target objects for labeling and training, a component detection model can be obtained; when damaged objects on the vehicle are used as target objects for labeling and training, a damage detection model can be obtained.
  • Specifically, a pre-trained component detection model, here called the first component detection model, can be used to perform component detection on the image frame to obtain component detection information; and a pre-trained damage detection model, here called the first damage detection model, can be used to perform damage detection on the image frame to obtain damage detection information.
  • the target detection model is mostly implemented by various detection algorithms based on the convolutional neural network CNN.
  • a variety of lightweight network structures have been proposed, including SqueezeNet, MobileNet, ShuffleNet, Xception, and so on. These lightweight neural network structures use different convolution calculation methods to reduce network parameters, thereby simplifying the convolution calculation of traditional CNN and improving its calculation efficiency.
  • Such a lightweight neural network structure is particularly suitable for running in mobile terminals with limited computing resources.
  • the above-mentioned first component detection model and the first damage detection model are implemented by adopting the above lightweight network structure.
  • the component detection information in the image frame can be obtained.
  • the component detection information may include the component detection frame information of the vehicle component selected from the corresponding image frame, and the component category predicted for the component selected by the frame.
  • the component detection frame information may include the position of the component detection frame, for example, expressed in the form of (x, y, w, h), and the picture convolution information corresponding to the component detection frame extracted from the image frame.
  • Similarly, the damage detection information in the image frame can be obtained. The damage detection information may include the damage detection frame information of the damaged objects framed in the corresponding image frame, and the damage category predicted for each framed damage.
  • Figure 3a shows an example of component detection information obtained for a certain image frame. It can be seen that Figure 3a includes several component detection boxes, and each component detection box selects a component.
  • the first component detection model can also output a predicted component category corresponding to each component detection frame, for example, the number in the upper left corner of each rectangular frame represents the component category. For example, the number 101 in Figure 3a represents the right front door, 102 represents the right rear door, 103 represents the door handle, and so on.
  • FIG. 3b shows an example of damage detection information obtained for a certain image frame. It can be seen that Figure 3b includes a series of rectangular boxes, which are the damage detection boxes output by the first damage detection model, and each damage detection box selects one damage.
  • the first damage detection model also outputs a predicted damage category corresponding to each damage detection frame. For example, the number in the upper left corner of each rectangular frame represents the damage category.
  • For example, the number 12 represents scratching; other numbers represent other damage types, for example, the number 10 represents deformation, the number 11 represents tearing, the number 13 represents (glass) shattering, and so on.
  • the above-mentioned first component detection model is also used to perform image segmentation on the detected vehicle components to obtain the contour segmentation result of each component in the image frame.
  • image segmentation is to segment or divide an image into areas that belong to/not belong to a specific target object, and its output can be expressed as a mask covering the area of the specific target object.
  • various image segmentation models have been proposed based on various network structures and various segmentation algorithms, such as CRF (Conditional Random Field)-based segmentation models, Mask R-CNN models, and so on.
  • component segmentation can be used to divide a vehicle picture into areas that belong to/not belong to a specific component. Component segmentation can be achieved by using any existing segmentation algorithm.
  • the above-mentioned first component detection model is trained to recognize components (ie, position prediction and category prediction), and to segment components.
  • a Mask R-CNN-based model can be used as the first component detection model.
  • the model uses two network branches to perform component recognition and component segmentation respectively.
  • the above-mentioned first component detection model includes a first sub-model for component recognition and a second sub-model for component segmentation.
  • the first sub-model outputs the part recognition result
  • the second sub-model outputs the part segmentation result.
  • the obtained component detection information also includes the result of the component segmentation.
  • the result of part segmentation can be embodied as the contour or coverage area of each part.
  • As described above, by performing component detection on the image frame with the first component detection model, component detection information can be obtained; and by performing damage detection on the image frame with the first damage detection model, damage detection information can be obtained. Therefore, an M-dimensional vector corresponding to the image frame can be formed based on the component detection information and the damage detection information.
  • In a specific example, a 60-dimensional vector is formed, where the first 30 dimensions represent component detection information and the last 30 dimensions represent damage detection information.
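  • Purely as an illustration of how such a per-frame vector could be assembled, the sketch below flattens detection outputs into a fixed 60-dimensional layout. The (x, y, w, h, category) encoding and the 30/30 split are assumptions for this example, not an encoding prescribed by this specification.

```python
import numpy as np

# Illustrative sketch: packing component and damage detections into one
# fixed-length per-frame vector. The field layout is an assumption.
COMP_DIM = 30   # dimensions reserved for component detection information
DMG_DIM = 30    # dimensions reserved for damage detection information

def encode_detections(boxes, categories, dim):
    """Flatten (x, y, w, h) boxes plus category ids into a fixed-length vector."""
    vec = np.zeros(dim, dtype=np.float32)
    flat = []
    for (x, y, w, h), cat in zip(boxes, categories):
        flat.extend([x, y, w, h, float(cat)])
    flat = flat[:dim]                 # truncate if there are too many detections
    vec[:len(flat)] = flat
    return vec

def frame_vector(comp_boxes, comp_cats, dmg_boxes, dmg_cats):
    comp_part = encode_detections(comp_boxes, comp_cats, COMP_DIM)
    dmg_part = encode_detections(dmg_boxes, dmg_cats, DMG_DIM)
    return np.concatenate([comp_part, dmg_part])   # 60-dimensional frame vector
```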
  • In addition to target detection (including component detection and damage detection), other feature analysis and extraction may also be performed on the image frame, and the resulting features are included in the above-mentioned M-dimensional vector.
  • In one embodiment, a video continuity feature is obtained. This feature can reflect the changes between image frames, thereby reflecting the stability and continuity of the video, and it can also be used for target tracking across frames.
  • the optical flow change feature of the image frame relative to the previous image frame can be acquired as its continuity feature.
  • the optical flow change can be calculated using some existing optical flow models.
  • the image similarity between the image frame and the previous image frame can be acquired as a continuity feature.
  • image similarity can be measured by SSIM (structural similarity index measurement) index.
  • The SSIM index between the current image frame and the previous image frame can be calculated based on the mean gray value and gray-value variance of the pixels in the current image frame, the mean gray value and gray-value variance of the pixels in the previous image frame, and the covariance of the pixel values of the current and previous image frames.
  • the maximum value of the SSIM index is 1. The larger the SSIM index, the higher the structural similarity of the image.
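  • For reference, the standard form of the SSIM index between the current frame x and the previous frame y, built from the means, variances and covariance described above, is shown below; the stabilizing constants C1 and C2 are part of the standard definition and are not specified in this text.

```latex
\mathrm{SSIM}(x, y) =
\frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}
     {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```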
  • the offset feature of several feature points in the image frame relative to the previous image frame can be obtained, and the continuity feature can be determined based on the offset feature.
  • the feature point of the image is a point in the image that has distinctive characteristics, can effectively reflect the essential features of the image, and can identify the target object in the image (for example, the upper left corner of the left front car light).
  • the feature points can be determined by means such as SIFT (Scale-invariant feature transform), LBP (Local Binary Pattern, local binary pattern) and the like. In this way, the changes of two adjacent image frames can be evaluated according to the offset of the feature points.
  • the offset of the feature point can be described by a projective matrix.
  • the transformation matrix w can be used as the projection matrix from the previous image frame to the current image frame. Further, the projection matrix can be used as the continuity feature of the current image frame.
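  • A hedged sketch of one way to estimate such a projection matrix between consecutive frames from matched feature points is given below. ORB features and OpenCV are used purely for illustration (the text above mentions SIFT/LBP-style feature points); the fallback behaviour is an assumption.

```python
import cv2
import numpy as np

def projection_matrix(prev_gray, curr_gray):
    """Estimate the homography mapping the previous frame onto the current frame."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return np.eye(3)                                  # fall back to identity
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]
    if len(matches) < 4:                                  # homography needs >= 4 points
        return np.eye(3)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    return H if H is not None else np.eye(3)
```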
  • In the case where the current image frame is the first image frame of the video stream, when determining its continuity feature, the image frame itself can be used as its previous frame for comparison, or the continuity feature can be directly set to a predetermined value, for example, each element of the projection matrix is set to 1, or the optical flow output is set to 0, and so on.
  • In one embodiment, an image frame quality feature is also obtained. This feature reflects the shooting quality of the image frame, that is, how effective the frame is for target recognition.
  • the application of a mobile terminal may include a shooting guidance model to guide the user in shooting, such as distance guidance (closer or farther away from the damaged vehicle), angle guidance, and so on.
  • the shooting guidance model will analyze and generate image frame quality characteristics during shooting guidance.
  • the image frame quality feature may include: a feature indicating whether the image frame is blurred, a feature indicating whether the image frame contains a target, a feature indicating whether the image frame is sufficiently illuminated, a feature indicating whether the shooting angle of the image frame is a predetermined angle, and so on.
  • One or more of these features may also be included in the aforementioned M-dimensional vector for each image frame.
  • In another specific example, an 80-dimensional vector is formed, where dimensions 1-10 represent image frame quality features, dimensions 11-20 represent video continuity features, dimensions 21-50 represent component detection information, and dimensions 51-80 represent damage detection information.
  • the N M-dimensional vectors are arranged according to the time sequence of the N image frames to obtain an N*M-dimensional matrix as the feature matrix of the video stream.
  • In other embodiments, the N M-dimensional vectors corresponding to the N image frames are further preprocessed to obtain the feature matrix of the video stream.
  • the preprocessing may include a normalization operation, so as to organize the feature matrix into a fixed dimension.
  • Generally, the dimension of the feature matrix is preset, while the image frames in the video stream are often extracted at a certain time interval and the length of the video stream varies and may not be known in advance. Therefore, directly combining the M-dimensional vectors of the actually extracted image frames may not always meet the dimensional requirements of the feature matrix.
  • For example, suppose the dimension of the feature matrix is preset as S frames * M dimensions. If the number of image frames extracted from the video stream is less than S, the M-dimensional vectors corresponding to the extracted image frames can be organized into an S*M-dimensional matrix by means of a filling operation, an interpolation operation, a pooling operation, and so on. In this case, S>N, where N is the number of actually extracted image frames included in the feature matrix.
  • For example, S-N additional M-dimensional vectors can be added to the N M-dimensional vectors by interpolation to obtain a feature matrix with S rows and M columns.
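  • A minimal sketch of this normalization step, assuming interpolation is the chosen option and that S and M are preset configuration values, is shown below.

```python
import numpy as np

def to_fixed_matrix(frame_vectors, S):
    """Organize N time-ordered M-dimensional vectors into a fixed S x M matrix
    by linear interpolation along the time axis (padding/pooling are alternatives)."""
    X = np.stack(frame_vectors)                       # shape (N, M)
    N, M = X.shape
    if N == S:
        return X
    old_t = np.linspace(0.0, 1.0, N)
    new_t = np.linspace(0.0, 1.0, S)
    return np.stack([np.interp(new_t, old_t, X[:, j]) for j in range(M)], axis=1)
```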
  • In the above manner, the feature matrix of the video stream is generated. The feature matrix includes at least N M-dimensional vectors respectively corresponding to the N image frames in the video stream and arranged according to the time sequence of the N image frames; each M-dimensional vector includes at least, for the corresponding image frame, component detection information obtained through the pre-trained first component detection model and damage detection information obtained through the pre-trained first damage detection model.
  • the preliminary processing of the image frame and the generation of the feature matrix are performed by the mobile terminal.
  • In that case, as far as the server is concerned, in step 21 only the feature matrix needs to be received from the mobile terminal.
  • This approach is suitable for the situation where an application or tool for damage assessment recognition has been installed in the mobile terminal and the terminal has a certain computing capability. Since the data volume of the transmitted feature matrix is much smaller than that of the video stream itself, this approach is very conducive to network transmission.
  • the mobile terminal transmits the video stream to the server, and the server processes each image frame to generate a feature matrix.
  • the captured video stream is obtained from the mobile terminal, and then image frames are extracted from it, and target detection and feature extraction are performed on the extracted image frames to generate an M-dimensional vector.
  • For each extracted image frame, component detection can be performed through the aforementioned first component detection model to obtain component detection information, and damage detection can be performed through the first damage detection model to obtain damage detection information; an M-dimensional vector corresponding to each image frame is then formed at least based on the component detection information and the damage detection information.
  • the feature matrix is generated according to the M-dimensional vector of each of the N image frames.
  • In addition to acquiring the feature matrix of the video stream, in step 22 the server also acquires K key frames in the video stream, where K is greater than or equal to 1 and less than the aforementioned number of image frames N.
  • the mobile terminal determines K key frames in the video stream, and sends the key frames to the server. Therefore, as far as the server is concerned, in step 22, only the K key frames need to be received from the mobile terminal.
  • In another embodiment, in step 22 the server itself determines the key frames in the video stream.
  • multiple existing key frame determination methods can be used to determine the key frames in the video stream.
  • In one embodiment, image frames with higher overall quality can be determined as key frames according to the quality features of the respective image frames; in another embodiment, image frames that change substantially relative to the previous frame can be determined as key frames according to the continuity features of the respective image frames.
  • the determined key frame may be included in the N image frames, or may be different from the aforementioned N image frames.
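  • The sketch below illustrates the two selection criteria just mentioned, assuming scalar per-frame quality and change scores have already been derived from the quality and continuity features; the equal weighting of the two scores is an assumption.

```python
import numpy as np

def select_key_frames(quality, change, k):
    """Pick k key frames: favour high quality and large change vs. the previous frame."""
    score = 0.5 * np.asarray(quality, dtype=float) + 0.5 * np.asarray(change, dtype=float)
    return np.sort(np.argsort(score)[-k:])            # k best frames, kept in time order
```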
  • Next, in step 23, target detection is performed again on each key frame image acquired in step 22. Specifically, component detection can be performed on the key frame image through a pre-trained second component detection model to obtain component detection information, and damage detection can be performed on it through a pre-trained second damage detection model to obtain damage detection information; a key frame vector is generated based on the component detection information and the damage detection information.
  • the second component detection model here may be different from the first component detection model used for the preliminary processing of the image frame to generate the feature matrix.
  • the second component detection model is a more accurate and complex model than the first component detection model, so that more accurate component detection is performed on the key frames in the video stream.
  • In one embodiment, the first component detection model is usually a model based on a lightweight network structure, suited to the limited computing power and resources of the mobile terminal, whereas the second component detection model can be a detection model with higher computing power requirements that is suitable for the server, so that more complex computation is performed on the image features and more accurate results are obtained.
  • the second damage detection model can also be more complex and accurate than the aforementioned first damage detection model, so as to perform more accurate damage detection on the key frames in the video stream.
  • the second component detection model is also used to segment the components.
  • the part detection information obtained based on the second part detection model further includes information for image segmentation of the parts included in the key frame image, that is, contour information of the parts.
  • the target detection is performed on the key frame image again, and the key frame vector can be formed based on the obtained component detection information and damage detection information.
  • K key frame vectors can be formed.
  • the key frame vector is a 70-dimensional vector, where 1-35-dimensional elements represent component detection information, and 36-70-dimensional elements represent damage detection information.
  • step 24 the component detection information and damage detection information in the N M-dimensional vectors and the K key frame vectors are merged to obtain comprehensive damage characteristics of each component.
  • In an embodiment, this step may include: determining at least one candidate damaged component, including the first component; for each of the N M-dimensional vectors and the K key frame vectors, fusing the component detection information and the damage detection information within that single vector (intra-frame fusion) to obtain the frame comprehensive feature of the first component; and fusing the frame comprehensive features of the first component obtained for the respective vectors across frames (inter-frame fusion) to obtain the comprehensive damage feature of the first component.
  • In one embodiment, every component of the vehicle is used as a candidate damaged component. For example, if the vehicle is divided into 100 component categories, all 100 categories can be used as candidate damaged components. Nothing is omitted this way, but there is more redundant computation and a greater subsequent processing burden.
  • component detection is performed on the image frame, and the component detection information obtained includes the component category predicted for the component.
  • In another embodiment, the component categories appearing in the component detection information are used as the candidate damaged components.
  • the first set of candidate components may be determined based on the component detection information in the N M-dimensional vectors.
  • the component detection information of each M-dimensional vector may include several component detection frames and corresponding predicted component categories, and the union of the predicted component categories in the N M-dimensional vectors may be used as the first candidate component set.
  • Similarly, a second candidate component set can be determined based on the component detection information in the K key frame vectors, that is, the union of the predicted component categories in the respective key frame vectors.
  • The union of the above-mentioned first candidate component set and second candidate component set is then used as the set of candidate damaged components.
  • In other words, if a component category appears in the feature matrix of the video stream or in a key frame vector, it means that a component of this category has been detected in the N image frames of the video stream or in a key frame, and the component of this category can therefore be used as a candidate damaged component.
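  • A small sketch of this union-based selection is given below; extract_categories is a hypothetical helper that reads the predicted component categories out of a vector's component detection fields.

```python
def candidate_components(frame_vectors, key_frame_vectors, extract_categories):
    """Union of component categories seen in the N frame vectors and K key frame vectors."""
    first_set = set()
    for v in frame_vectors:                    # categories detected in the N image frames
        first_set.update(extract_categories(v))
    second_set = set()
    for v in key_frame_vectors:                # categories detected in the K key frames
        second_set.update(extract_categories(v))
    return first_set | second_set              # candidate damaged components
```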
  • the feature matrix of the video stream and each key frame vector are fused to obtain the comprehensive damage feature of the first component.
  • the fusion operation can include intra-frame fusion (or first fusion) and inter-frame fusion (or second fusion).
  • In intra-frame fusion, the component detection information and damage detection information in the vector corresponding to each frame are fused to obtain the frame comprehensive feature of the first component in that frame; then, combined with the timing information, the frame comprehensive features of the first component corresponding to the respective frames are fused through inter-frame fusion to obtain the comprehensive damage feature of the first component. The processes of intra-frame fusion and inter-frame fusion are described below.
  • Intra-frame fusion aims to obtain the component-level damage feature of the first component in a certain frame, which is also called frame synthesis feature.
  • the frame integrated features of the first part may include the part features of the first part in the frame and the damage features related to the first part.
  • Specifically, the vector corresponding to the component feature of the first component and the vector corresponding to the damage feature related to the first component can be concatenated, and the resulting concatenated vector used as the frame comprehensive feature of the first component.
  • In an embodiment, the foregoing N image frames include a first image frame, which corresponds to a first M-dimensional vector; obtaining the frame comprehensive feature of the first component includes: extracting, from the component detection information in the first M-dimensional vector, first detection information related to the first component; and based on the first detection information, determining the component feature of the first component in the first image frame as the part of the frame comprehensive feature of the first component that corresponds to the first image frame.
  • In an embodiment, the above-mentioned N image frames further include a second image frame after the first image frame, corresponding to a second M-dimensional vector, and each M-dimensional vector further includes a video continuity feature; obtaining the frame comprehensive feature of the first component then also includes: extracting the video continuity feature from the second M-dimensional vector; determining, based on the first detection information and the video continuity feature, second detection information of the first component in the second image frame; and based on the second detection information, determining the component feature of the first component in the second image frame as the part of the frame comprehensive feature of the first component that corresponds to the second image frame.
  • In an embodiment, the aforementioned video continuity feature includes at least one of the following: an optical flow change feature between image frames, a similarity feature between image frames, and a transformation feature determined based on the projection matrix between image frames.
  • In an embodiment, the aforementioned N image frames include a first image frame, which corresponds to a first M-dimensional vector; the damage detection information in the first M-dimensional vector includes information on multiple damage detection frames framing multiple damaged objects in the first image frame; and obtaining the frame comprehensive feature of the first component includes: determining, according to the component detection information in the first M-dimensional vector and the information on the multiple damage detection frames, at least one damage detection frame that also belongs to the first component; obtaining the damage feature of the at least one damage detection frame; and performing a first fusion operation on the damage feature of the at least one damage detection frame to obtain the damage feature related to the first component as the part of the frame comprehensive feature of the first component that corresponds to the first image frame.
  • In an embodiment, the component detection information in the first M-dimensional vector includes information on multiple component detection frames framing multiple components in the first image frame; determining the at least one damage detection frame that also belongs to the first component then includes: determining a first component detection frame corresponding to the first component, and determining, among the multiple damage detection frames, the at least one damage detection frame that belongs to the same first component as the first component detection frame.
  • In another embodiment, the component detection information in the first M-dimensional vector includes component segmentation information; determining the at least one damage detection frame that also belongs to the first component then includes: determining, according to the component segmentation information, the first area covered by the first component; determining, according to the position information of the multiple damage detection frames, whether each damage detection frame falls within the first area; and determining the damage detection frames falling within the first area as the at least one damage detection frame.
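  • A hedged sketch of that segmentation-based check follows, assuming the component segmentation result is available as a boolean mask and that a damage detection frame is attributed to the component when its center falls inside the masked area; the center-point rule is an illustrative simplification.

```python
import numpy as np

def damage_boxes_of_component(component_mask, damage_boxes):
    """component_mask: boolean H x W array covering the first component.
    damage_boxes: list of (x, y, w, h) with (x, y) the top-left corner."""
    selected = []
    h_img, w_img = component_mask.shape
    for (x, y, w, h) in damage_boxes:
        cx, cy = int(x + w / 2), int(y + h / 2)          # box center
        if 0 <= cy < h_img and 0 <= cx < w_img and component_mask[cy, cx]:
            selected.append((x, y, w, h))
    return selected
```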
  • the at least one damage detection frame includes a first damage detection frame; the obtaining the damage feature of the at least one damage detection frame includes obtaining the first damage corresponding to the first damage detection frame Features.
  • the first damage feature includes a picture convolution feature related to the first damage detection frame.
  • In an embodiment, the damage detection information further includes a predicted damage category corresponding to each of the multiple damage detection frames; acquiring the first damage feature corresponding to the first damage detection frame then further includes: determining a first association feature as a part of the first damage feature according to the association relationship between the first damage detection frame and the other damage detection frames among the multiple damage detection frames, where the association relationship includes one or more of the following: the correlation of the positions of the damage detection frames, the correlation of the predicted damage categories, and the correlation of the frame content reflected by the picture convolution features.
  • the first fusion operation includes one or more of the following: a maximum operation, a minimum operation, an average operation, a sum operation, and a median operation.
  • the component feature and damage feature of the first component can be obtained separately for the first image frame.
  • In this way, the frame comprehensive feature of the first component in the first image frame can be obtained; that is, intra-frame fusion related to the first component is performed on the first image frame.
  • the obtained frame comprehensive feature of the first component is the component-level damage feature of the first component in the first image frame.
  • For each of the N image frames, the aforementioned intra-frame fusion can be performed based on its corresponding M-dimensional vector to obtain the frame comprehensive feature of the first component for that frame, recorded as a first vector.
  • N first vectors can be obtained for N image frames, which respectively correspond to the frame synthesis features of the first component obtained for the N image frames.
  • Similarly, for each of the K key frames, the aforementioned intra-frame fusion can also be performed based on the component detection information and damage detection information in its key frame vector, so that the frame comprehensive feature of the first component for that key frame is obtained and recorded as a second vector. In this way, K second vectors can be obtained for the K key frames, respectively corresponding to the frame comprehensive features of the first component obtained for the K key frames.
  • Since the dimensions of the key frame vector may differ from those of the aforementioned M-dimensional vector, the dimensions of the first vector and the second vector may also differ, but the idea and process of intra-frame fusion are similar.
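  • A minimal sketch of intra-frame fusion for one component in one frame is shown below, assuming the component feature and the damage features of the damage boxes attributed to it are already fixed-length vectors; max-pooling is used as the first fusion operation, and the description above equally allows min, average, sum or median.

```python
import numpy as np

def frame_comprehensive_feature(component_feature, damage_features, dmg_dim):
    """Fuse damage features (first fusion) and concatenate with the component feature."""
    if damage_features:
        fused_damage = np.max(np.stack(damage_features), axis=0)   # element-wise max
    else:
        fused_damage = np.zeros(dmg_dim)        # no damage attributed to this component
    return np.concatenate([component_feature, fused_damage])
```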
  • inter-frame fusion is performed on the frame integrated features of the first component obtained for each frame (including N image frames and K key frames), so as to obtain the integrated feature vector of the first component as the integrated damage feature.
  • In an embodiment, the N first vectors are first combined to obtain a first combined vector, where the N first vectors respectively correspond to the frame comprehensive features of the first component obtained for the N image frames; the K second vectors are combined to obtain a second combined vector, where the K second vectors respectively correspond to the frame comprehensive features of the first component obtained for the K key frames; and the first combined vector and the second combined vector are integrated to obtain the comprehensive feature vector.
  • the first combination of the N first vectors includes: splicing the N first vectors according to the timing of the corresponding N image frames.
  • the first combination of the N first vectors includes: determining a weighting factor of the N first vectors; and performing the first combination on the N first vectors according to the weighting factor. Weighted combination.
  • the determining the weight factors of the N first vectors includes: for each of the N image frames, determining the closest key in time sequence from the K key frames Frame; according to the time sequence distance between each image frame and its closest key frame, the weight factor of the first vector corresponding to the image frame is determined, so that the time sequence distance is negatively correlated with the weight factor.
  • each of the M-dimensional vectors further includes an image frame quality feature; the determining the weight factors of the N first vectors includes: for each of the N image frames, according to the The image frame quality feature in the M-dimensional vector corresponding to the image frame determines the weight factor of the first vector corresponding to the image frame.
  • the image frame quality feature includes at least one of the following: a feature that indicates whether the image frame is blurred, a feature that indicates whether the image frame contains a target, a feature that indicates whether the image frame is sufficiently illuminated, and a feature that indicates whether the image frame is taken Whether the angle is a feature of a predetermined angle.
  • In the above manner, the comprehensive damage feature of the first component is obtained by fusing the N M-dimensional vectors in the feature matrix of the video stream with the K key frame vectors, so it can fully reflect the overall damage characteristics of the first component across the N image frames of the video stream and the K key frames.
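  • As one illustrative realization of the inter-frame fusion described above, the sketch below weights each first vector by how close its frame is, in time, to the nearest key frame (closer frames receive larger weights), combines the weighted first vectors, averages the second vectors, and concatenates the two results; the specific weighting rule, averaging and concatenation are assumptions.

```python
import numpy as np

def fuse_inter_frame(first_vectors, frame_times, second_vectors, key_times):
    first = np.stack(first_vectors)                              # (N, d1)
    dists = np.array([min(abs(t - kt) for kt in key_times) for t in frame_times])
    weights = 1.0 / (1.0 + dists)                                # negatively correlated with distance
    weights = weights / weights.sum()
    first_combined = (weights[:, None] * first).sum(axis=0)      # weighted combination
    second_combined = np.stack(second_vectors).mean(axis=0)      # combine the K second vectors
    return np.concatenate([first_combined, second_combined])     # comprehensive damage feature
```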
  • step 25 a preliminary damage result is obtained, and the preliminary damage result includes the damage result of each component obtained after inputting the feature matrix into a pre-trained convolutional neural network.
  • the mobile terminal determines the preliminary damage recognition result, and sends the preliminary damage recognition result to the server. Therefore, as far as the server is concerned, in step 25, only the preliminary damage result needs to be received from the mobile terminal. In another embodiment, in step 25, the server determines the preliminary damage result for the video stream.
  • Whether the preliminary damage result is determined by the mobile terminal or by the server, it can be obtained based on a pre-trained convolutional neural network.
  • When a conventional convolutional neural network processes images, the format of its input matrix is often "batch_size * length * width * number of channels".
  • The channels of a color image are usually the three channels "R", "G", and "B", that is, the number of channels is 3.
  • the length and width are independent of each other, and the channels affect each other.
  • the features of different spatial positions of the image should be independent, and the two-dimensional convolution operation has spatial invariance.
  • Convolution in image processing is generally performed over the "length * width" dimensions. If "length * width" were simply replaced by the rows and columns of the feature matrix, then along the feature dimension the features at different positions would affect one another rather than being independent of each other, and it would be unreasonable to convolve over them.
  • Instead, the feature dimension here corresponds in nature to the channel dimension in image processing.
  • the input format of the feature matrix can be adjusted, such as adjusting to "batch size (batch_size) * 1 * number of columns (such as S or N) * number of rows (M)".
  • the convolutional neural network may include one or more convolution processing layers and output layers.
  • the convolution processing layer can be composed of a two-dimensional convolution layer, an activation layer, and a normalization layer, such as 2D convolutional Filter+ReLU+Batch Normalization.
  • the two-dimensional convolution layer can be used to perform convolution processing on the feature matrix through a convolution kernel corresponding to the time dimension.
  • In an embodiment, the feature matrix includes M rows and S columns, where S is not less than N; the convolutional neural network includes several one-dimensional convolution kernels; and inputting the feature matrix into the pre-trained convolutional neural network includes: using the several one-dimensional convolution kernels to perform convolution processing on the feature matrix along the row dimension of the feature matrix.
  • the convolution operation corresponding to the time dimension may be performed through a convolution kernel such as (1, -1, -1, 1).
  • the convolution kernel can be trained in a targeted manner. For example, one convolution kernel can be trained for each feature.
  • the convolution kernel (1, -1, -1, 1) shown in FIG. 4 is a convolution kernel corresponding to the component damage feature in the car damage detection scene, and so on. In this way, after the convolution operation of each convolution kernel, a feature (such as a component damage feature in a car damage detection scene) can be identified.
  • the activation layer can be used to non-linearly map the output result of the two-dimensional convolutional layer.
  • the activation layer can be implemented by activation functions such as Sigmoid, Tanh (hyperbolic tangent), and ReLU. Through the activation layer, the output result of the two-dimensional convolutional layer is mapped to a non-linear change value between 0-1.
  • However, the output result after passing through the activation layer may move into the gradient saturation area (the area where the gradient of the activation function is relatively small), which can cause the convolutional neural network to converge slowly or fail to converge because the gradient decreases or vanishes. Therefore, the output result of the activation layer can be further pulled back, through the batch normalization layer, to the region where the gradient of the activation function changes significantly.
  • the output layer is used to output preliminary damage results for the video stream.
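  • The following PyTorch sketch, offered only as a non-authoritative illustration, wires these pieces together: the S x M feature matrix is reshaped to "batch_size * 1 * S * M", several kernels of shape (4, 1) slide along the time dimension only (mirroring the one-dimensional kernel (1, -1, -1, 1) above), followed by ReLU, batch normalization, pooling and an output layer. The kernel count, pooling step and output dimension are assumptions.

```python
import torch
import torch.nn as nn

class PreliminaryDamageNet(nn.Module):
    """Sketch of the preliminary-damage CNN: temporal kernels + ReLU + BatchNorm + output."""
    def __init__(self, num_kernels=16, num_outputs=20):
        super().__init__()
        self.conv = nn.Conv2d(1, num_kernels, kernel_size=(4, 1))   # convolve over time only
        self.act = nn.ReLU()
        self.bn = nn.BatchNorm2d(num_kernels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.out = nn.Linear(num_kernels, num_outputs)

    def forward(self, feature_matrix):           # feature_matrix: (batch, S, M)
        x = feature_matrix.unsqueeze(1)          # -> (batch, 1, S, M)
        x = self.bn(self.act(self.conv(x)))
        x = self.pool(x).flatten(1)              # -> (batch, num_kernels)
        return self.out(x)                       # preliminary damage result (logits)
```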
  • the initial damage result can be obtained by using the convolutional neural network.
  • the video stream marked with the damage recognition result can be used as the training sample.
  • Each training sample includes the sample feature matrix of a video stream and a corresponding damage result label, where the sample feature matrix of each video stream includes at least N M-dimensional vectors respectively corresponding to the N image frames in that video stream and arranged according to the time sequence of the N image frames.
  • the sample feature matrix of each video stream can be obtained based on the method described in step 21.
  • the damage result label includes one or more of the following: damage material, damage category, and component category of the damaged component.
  • the parameters of the model can be adjusted by, for example, a gradient descent method.
  • The loss function used in the model training process is, for example, the sum of the squared differences between the predicted values and the label values of the multiple samples, or the sum of the absolute differences between the predicted values and the label values of the multiple samples.
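  • Written out for n training samples with predicted values ŷ_i and label values y_i, the two example loss functions just mentioned are:

```latex
L_{\text{sq}} = \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
\qquad\text{or}\qquad
L_{\text{abs}} = \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|
```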
  • In an embodiment, the trained convolutional neural network used to obtain the preliminary damage result can be further modified to extract key frames, that is, to extract the K key frames in step 22 above. Specifically, the above trained convolutional neural network can be obtained, the parameters of the layers other than the output layer can be fixed, and video streams annotated with key frames can then be used as training samples to further train and adjust the parameters of the output layer, thereby obtaining a modified convolutional neural network for extracting key frames.
  • step 26 the comprehensive damage feature of each component and the preliminary damage result are input into a pre-trained decision tree model to obtain the final damage determination result for the video stream.
  • Specifically, for each component, the damage result of that component may be extracted from the preliminary damage result obtained in step 25, and the vector corresponding to that damage result can be concatenated with the vector corresponding to the comprehensive damage feature of the component to obtain a concatenated vector for the component.
  • The concatenated vector of the component can then be input into a pre-trained decision tree model to obtain the final damage assessment result for the component. By performing this operation for each component, the final damage determination result of each component can be obtained, which together constitute the final damage determination result for the video stream.
  • Various existing decision tree algorithms, such as gradient boosting decision trees (GBDT) or classification and regression trees (CART), can be used in this step as the above decision tree model.
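  • The sketch below shows one way this decision stage could be realized with scikit-learn's gradient-boosted trees; the library choice, the plain concatenation of inputs, and the helper names are illustrative assumptions rather than the specified implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def make_feature(comprehensive_feature, preliminary_result):
    """Concatenate a component's comprehensive damage feature with its preliminary result."""
    return np.concatenate([comprehensive_feature, preliminary_result])

model = GradientBoostingClassifier()     # works for two-class or multi-class labels

def train(comprehensive_features, preliminary_results, labels):
    X = np.stack([make_feature(c, p)
                  for c, p in zip(comprehensive_features, preliminary_results)])
    model.fit(X, labels)

def final_damage_result(comprehensive_feature, preliminary_result):
    x = make_feature(comprehensive_feature, preliminary_result).reshape(1, -1)
    return model.predict(x)[0]           # e.g. damaged / undamaged, or a damage category
```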
  • In one embodiment, the decision tree model is embodied as a two-class model by pre-training with two-class labeled samples.
  • A two-class labeled sample means that, given a sample video stream, the labeling staff mark whether each component is damaged (damaged is one class, undamaged is the other class).
  • The pre-training process can include: through the aforementioned steps 21 to 25, obtaining, for a given sample video stream, the comprehensive damage feature and the preliminary damage result of a certain component, and then using the decision tree model to predict, based on the comprehensive damage feature and the preliminary damage result, whether the component is damaged. The prediction result is then compared with the annotation for that component, and the model parameters of the decision tree model are adjusted according to the comparison so that the prediction result fits the annotation more closely. In this way, the two-class decision tree model is obtained.
  • In step 26, after the comprehensive damage feature and the preliminary damage result of a certain component to be analyzed are input into the two-class decision tree model, the model outputs one of the two classes (damaged or not damaged), so the result of whether the component is damaged can be obtained.
  • In another embodiment, the decision tree model is embodied as a multi-class model by pre-training with multi-class labeled samples.
  • A multi-class labeled sample means that, given a sample video stream, the labeling personnel mark the damage category of each component (for example, multiple damage categories such as scratching, deformation, and chipping).
  • the pre-training process can include, through the aforementioned steps 21 to 25, based on a given sample video stream, obtain its comprehensive damage characteristics and preliminary damage results for a certain component, and then use a decision tree model based on its comprehensive damage characteristics And the preliminary damage results predict the damage category of the component. After that, the prediction result is compared with the annotation result of the component, and the model parameters of the decision tree model are adjusted according to the comparison result, so that the prediction result tends to fit the annotation result more. In this way, the decision tree multi-classification model is obtained.
  • step 26 after inputting the comprehensive damage feature vector of a certain component to be analyzed into the decision tree multi-classification model, the model will output the predicted damage classification category, so the classification category can be used to obtain the The damage type of the first component is used as the final damage determination result.
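A similarly hedged multi-class variant, showing only how a predicted class index might be mapped back to a damage category name; the label set and dimensions are assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

DAMAGE_CLASSES = ["scratch", "deformation", "shatter"]   # illustrative categories

rng = np.random.default_rng(1)
X_train = rng.random((300, 96))                           # spliced vectors per component
y_train = rng.integers(0, len(DAMAGE_CLASSES), 300)       # annotated damage categories
multi_gbdt = GradientBoostingClassifier().fit(X_train, y_train)

component_vector = rng.random((1, 96))
predicted_category = DAMAGE_CLASSES[int(multi_gbdt.predict(component_vector)[0])]
```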
  • The component referred to above can be any component among the candidate damaged components.
  • For each candidate damaged component, the above process can be performed: its comprehensive damage feature is obtained in step 24, its preliminary damage result in step 25, and its damage status in step 26 based on the two. The damage status of every candidate damaged component is thus obtained, which constitutes the final loss determination result for the whole case.
  • In one embodiment, such a final loss determination result is transmitted back to the mobile terminal.
  • On the basis of a final loss determination result that covers the damage status of each component, the replacement/repair plan for each component can also be determined accordingly.
  • For this purpose, staff can set up a mapping table in advance that records the replacement/repair plan for each type of component under each damage category.
  • For example, for metal components, when the damage category is scratching the corresponding plan is repainting, and when the damage category is deformation the corresponding plan is sheet metal work; for glass components, when the damage category is scratching the corresponding plan is glass replacement, and so on.
  • In this way, a replacement/repair plan can be determined for each damaged component, and the loss determination result and the replacement/repair plan can be transmitted back to the mobile terminal together, achieving a more comprehensive intelligent loss assessment. A simple lookup-table sketch is given below.
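One simple way to realize such a mapping table is a nested lookup keyed by component type and damage category; the entries below merely restate the examples above, and the fallback value is an added assumption rather than part of the source:

```python
# (component type, damage category) -> replacement/repair plan
REPAIR_PLAN_TABLE = {
    ("metal", "scratch"): "repaint",
    ("metal", "deformation"): "sheet metal work",
    ("glass", "scratch"): "replace glass",
}

def lookup_plan(component_type: str, damage_category: str) -> str:
    # Fall back to manual review when a combination is not covered by the table.
    return REPAIR_PLAN_TABLE.get((component_type, damage_category), "manual review")

print(lookup_plan("metal", "scratch"))   # -> repaint
```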
  • According to an embodiment of another aspect, a device for vehicle damage assessment is provided.
  • The device can be deployed on a server, which may be implemented by any apparatus, platform, or device cluster with computing and processing capabilities.
  • Fig. 5 shows a schematic block diagram of a vehicle damage assessment device according to an embodiment. As shown in Fig. 5, the device 500 includes:
  • The first acquisition unit 510 is configured to acquire a feature matrix of a video stream shot of a damaged vehicle; the feature matrix includes at least N M-dimensional vectors respectively corresponding to N image frames in the video stream and arranged in the temporal order of those frames, and each M-dimensional vector includes at least, for the corresponding image frame, component detection information obtained through a pre-trained first component detection model and damage detection information obtained through a pre-trained first damage detection model.
  • The second acquisition unit 520 is configured to acquire K key frames in the video stream.
  • The generating unit 530 is configured to generate K corresponding key frame vectors for the K key frames; each key frame vector includes, for the corresponding key frame image, component detection information obtained through a pre-trained second component detection model and damage detection information obtained through a pre-trained second damage detection model.
  • The fusion unit 540 is configured to fuse the component detection information and damage detection information in the N M-dimensional vectors and the K key frame vectors to obtain the comprehensive damage feature of each component.
  • The third acquisition unit 550 is configured to obtain a preliminary damage result, which includes the damage result of each component obtained by inputting the feature matrix into a pre-trained convolutional neural network.
  • The loss determination unit 560 is configured to input the comprehensive damage features of the components and the preliminary damage result into a pre-trained decision tree model to obtain a final loss determination result for the video stream. (A sketch of how these units might chain together is given below.)
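A skeletal illustration of how the units of device 500 might be wired together in code; the class, method names, and injection style are assumptions made only to show the data flow, not the patent's implementation:

```python
class VehicleDamageAssessmentDevice:
    """Sketch of device 500: each unit is injected as a callable."""

    def __init__(self, unit_510, unit_520, unit_530, unit_540, unit_550, unit_560):
        self.acquire_feature_matrix = unit_510      # first acquisition unit
        self.acquire_key_frames = unit_520          # second acquisition unit
        self.generate_key_frame_vectors = unit_530  # generating unit
        self.fuse = unit_540                        # fusion unit
        self.get_preliminary_result = unit_550      # third acquisition unit
        self.determine_loss = unit_560              # loss determination unit

    def assess(self, video_stream):
        feature_matrix = self.acquire_feature_matrix(video_stream)
        key_frames = self.acquire_key_frames(video_stream)
        key_frame_vectors = self.generate_key_frame_vectors(key_frames)
        comprehensive_features = self.fuse(feature_matrix, key_frame_vectors)
        preliminary_result = self.get_preliminary_result(feature_matrix)
        return self.determine_loss(comprehensive_features, preliminary_result)
```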
  • In one embodiment, the first acquisition unit 510 is specifically configured to receive the feature matrix from a mobile terminal.
  • In one embodiment, the first acquisition unit 510 is specifically configured to: obtain the video stream; for each of the N image frames, perform component detection through the first component detection model to obtain component detection information, and perform damage detection through the first damage detection model to obtain damage detection information; form an M-dimensional vector for each image frame based on at least the component detection information and the damage detection information; and generate the feature matrix from the respective M-dimensional vectors of the N image frames.
  • In one embodiment, the second acquisition unit 520 is specifically configured to receive the K key frames from a mobile terminal.
  • In one embodiment, the second component detection model is different from the first component detection model, and the second damage detection model is different from the first damage detection model.
  • In one embodiment, the fusion unit 540 is specifically configured to: determine at least one candidate damaged component, including a first component; and, for each of the N M-dimensional vectors and the K key frame vectors, perform intra-frame fusion of the component detection information and damage detection information within the single vector to obtain a frame-level comprehensive feature of the first component, and then perform inter-frame fusion of the frame-level comprehensive features of the first component obtained from the individual vectors to obtain the comprehensive damage feature of the first component.
  • In one embodiment, the third acquisition unit 550 is specifically configured to receive the preliminary damage result from the mobile terminal.
  • In one embodiment, the feature matrix includes M rows and S columns, where S is not less than N; the convolutional neural network includes several one-dimensional convolution kernels; and inputting the feature matrix into the pre-trained convolutional neural network includes using these one-dimensional convolution kernels to perform convolution processing on the feature matrix along its row dimension (see the sketch below).
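One possible reading of this row-dimension convolution, sketched as a depthwise 1-D convolution in which the M feature rows act as channels and each kernel slides along the S time-ordered columns; the shapes and the one-kernel-per-feature choice are assumptions for illustration:

```python
import torch
import torch.nn as nn

M, S = 80, 32                               # assumed numbers of feature rows and columns
feature_matrix = torch.randn(1, M, S)       # a batch of one M x S feature matrix

# groups=M gives one independent 1-D kernel per feature row, convolving along the S columns.
row_conv = nn.Conv1d(in_channels=M, out_channels=M, kernel_size=4, groups=M, padding=2)
output = row_conv(feature_matrix)
print(output.shape)                         # torch.Size([1, 80, 33]) with this padding
```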
  • In one embodiment, the convolutional neural network is pre-trained by a training unit specifically configured to: obtain multiple training samples, where each training sample includes the sample feature matrix of a video stream and the corresponding damage result label, and the sample feature matrix of each video stream includes at least N M-dimensional vectors respectively corresponding to the N image frames of that video stream and arranged in their temporal order; and train the convolutional neural network using the multiple training samples.
  • In one embodiment, the damage result label includes at least one of the following: damage material, damage category, and the component category of the damaged component.
  • In one embodiment, the device further includes a determining unit 570 configured to determine a corresponding replacement/repair plan according to the final loss determination result.
  • According to an embodiment of another aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Fig. 2.
  • According to an embodiment of another aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with Fig. 2 is implemented.


Abstract

Embodiments of this specification provide a computer-executed vehicle damage assessment method that performs intelligent loss assessment based on a video stream shot of a damaged vehicle. Specifically, preliminary object detection and feature extraction are first performed on the image frames of the video stream to obtain a video-stream feature matrix. In addition, object detection is performed again on key frames of the video stream to obtain key frame vectors. Then, for each component, the features in the video-stream feature matrix and the key frame vectors are fused to generate the component's comprehensive damage feature. Meanwhile, a preliminary loss assessment is performed based on the video-stream feature matrix to obtain a preliminary loss assessment result. Finally, loss assessment is performed again based on the preliminary result and the comprehensive damage features of the components to obtain the final loss assessment result for the video stream.

Description

计算机执行的车辆定损方法及装置 技术领域
本说明书一个或多个实施例涉及视频处理技术领域,尤其涉及利用机器学习处理视频流以进行车辆智能定损的方法和装置。
背景技术
在传统车险理赔场景中,保险公司需要派出专业的查勘定损人员到事故现场进行现场查勘定损,给出车辆的维修方案和赔偿金额,并拍摄现场照片,定损照片留档以供后台核查人员核损核价。由于需要人工查勘定损,保险公司需要投入大量的人力成本,和专业知识的培训成本。从普通用户的体验来说,理赔流程由于等待人工查勘员现场拍照、定损员在维修地点定损、核损人员在后台核损,理赔周期长达1-3天,用户的等待时间较长,体验较差。
针对需求背景中提到的这一人工成本巨大的行业痛点,开始设想将人工智能和机器学习应用到车辆定损的场景中,希望能够利用人工智能领域计算机视觉图像识别技术,根据普通用户拍摄的现场损失图片,自动识别图片中反映的车损状况,并自动给出维修方案。如此,无需人工查勘定损核损,大大减少了保险公司的成本,提升了普通用户的车险理赔体验。
不过,目前的智能定损方案,对车损进行确定的准确度还有待进一步提高。因此,希望能有改进的方案,能够对车辆损伤的检测结果进行进一步优化,提高识别准确度。
发明内容
本说明书一个或多个实施例描述了基于视频流的车辆智能定损的方法和装置,可以全面提高智能定损的准确性。
根据第一方面,提供一种计算机执行的车辆定损方法,该方法包括:获取视频流的特征矩阵,所述视频流针对损伤车辆而拍摄,所述特征矩阵至少包括与所述视频流中N个图像帧分别对应、且按照所述N个图像帧的时序排列的N个M维向量,每个M维向量至少包括,针对对应的图像帧,通过预先训练的第一部件检测模型得到的部件检测信息,以及通过预先训练的第一损伤检测模型得到的损伤检测信息;获取所述视频流中 的K个关键帧;针对所述K个关键帧,生成对应的K个关键帧向量,每个关键帧向量包括,针对对应的关键帧图像,通过预先训练的第二部件检测模型得到的部件检测信息,以及通过预先训练的第二损伤检测模型得到的损伤检测信息;融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征;获取初步损伤结果,所述初步损伤结果包括,将所述特征矩阵输入预先训练的卷积神经网络后得到的所述各个部件的损伤结果;将所述各个部件的综合损伤特征和所述初步损伤结果输入预先训练的决策树模型,得到针对所述视频流的最终定损结果。
在一个实施例中,获取视频流的特征矩阵包括,从移动终端接收所述特征矩阵。
在一个实施例中,获取视频流的特征矩阵包括:获取所述视频流;针对所述N个图像帧中的各个图像帧,通过所述第一部件检测模型进行部件检测,得到部件检测信息,并通过所述第一损伤检测模型进行损伤检测,得到损伤检测信息;至少基于所述部件检测信息和所述损伤检测信息,形成各个图像帧对应的M维向量;根据N个图像帧各自的M维向量,生成所述特征矩阵。
在一个实施例中,获取所述视频流中的K个关键帧包括:从移动终端接收所述K个关键帧。
在一个实施例中,所述第二部件检测模型不同于所述第一部件检测模型,所述第二损伤检测模型不同于所述第一损伤检测模型。
在一个实施例中,融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征,包括:确定至少一个备选受损部件,其中包括第一部件;对于所述N个M维向量和所述K个关键帧向量中的各个向量,通过对单个向量中的部件检测信息和损伤检测信息进行帧内融合,得到所述第一部件的帧综合特征,并通过将针对各个向量得到的所述第一部件的帧综合特征进行帧间融合,得到所述第一部件的综合损伤特征。
在一个实施例中,获取初步损伤结果包括,从移动端接收所述初步损伤识别结果。
在一个实施例中,所述特征矩阵包括M行S列,其中S不小于N,所述卷积神经网络包括若干一维卷积核,所述将所述特征矩阵输入预先训练的卷积神经网络,包括:利用所述若干一维卷积核在所述特征矩阵的行维度上,对所述特征矩阵进行卷积处理。
在一个实施例中,所述卷积神经网络通过以下方式训练:获取多个训练样本,其中每个训练样本包括每个视频流的样本特征矩阵和对应的损伤结果标签,所述每个视频 流的样本特征矩阵至少包括与所述每个视频流中N个图像帧分别对应、且按照该N个图像帧的时序排列的N个M维向量;使用所述多个训练样本,训练所述卷积神经网络。
在一个具体的实施例中,所述损伤结果标签包括以下中的至少一种:损伤材质、损伤类别、损伤部件的部件类别。
在一个实施例中,在得到针对所述视频流的最终定损结果之后,所述方法还包括:根据所述最终定损结果,确定对应的换修方案。
根据第二方面,提供一种计算机执行的车辆定损装置,该装置包括:第一获取单元,配置为获取视频流的特征矩阵,所述视频流针对损伤车辆而拍摄,所述特征矩阵至少包括与所述视频流中N个图像帧分别对应、且按照所述N个图像帧的时序排列的N个M维向量,每个M维向量至少包括,针对对应的图像帧,通过预先训练的第一部件检测模型得到的部件检测信息,以及通过预先训练的第一损伤检测模型得到的损伤检测信息;第二获取单元,配置为获取所述视频流中的K个关键帧;生成单元,配置为针对所述K个关键帧,生成对应的K个关键帧向量,每个关键帧向量包括,针对对应的关键帧图像,通过预先训练的第二部件检测模型得到的部件检测信息,以及通过预先训练的第二损伤检测模型得到的损伤检测信息;融合单元,配置为融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征;第三获取单元,配置为获取初步损伤结果,所述初步损伤结果包括,将所述特征矩阵输入预先训练的卷积神经网络后得到的所述各个部件的损伤结果;定损单元,配置为将所述各个部件的综合损伤特征和所述初步损伤结果输入预先训练的决策树模型,得到针对所述视频流的最终定损结果。
根据第三方面,提供了一种计算机可读存储介质,其上存储有计算机程序,当所述计算机程序在计算机中执行时,令计算机执行第一方面的方法。
根据第四方面,提供了一种计算设备,包括存储器和处理器,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现第一方面的方法。
根据本说明书实施例提供的方法和装置,基于对受损车辆进行拍摄产生的视频流进行智能定损。具体地,一方面,融合视频流的特征矩阵和关键帧的信息得到综合损伤特征,另一方面,将所述特征矩阵输入预先训练的卷积神经网络中得到初步损伤结果,再将综合损伤特征和初步损伤结果一起输入决策模型中,以得到最终的定损结果。通过以上方式,全面提升车辆智能定损的准确度。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。
图1为本说明书披露的一个实施例的典型实施场景示意图;
图2示出根据一个实施例的车辆定损方法的流程图;
图3a示出针对某个图像帧得到的部件检测信息的例子;
图3b示出针对某个图像帧得到的损伤检测信息的例子;
图4示出根据一个具体例子的对特征矩阵做卷积的示意图;
图5示出根据一个实施例的车辆定损装置的示意性框图。
具体实施方式
下面结合附图,对本说明书提供的方案进行描述。
车辆智能定损,主要涉及从普通用户拍摄的车损现场的图片中,自动识别出车辆的受损状况。为了实现对车损状况的识别,业界普遍采用的方法是,将用户拍摄的待识别的车损图片,与海量历史数据库进行比对得到相似的图片,基于相似图片的定损结果来决定待识别图片上的损伤部件及其程度。然而,这样的方式损伤识别准确率不够理想。
根据一种实施方式,通过监督训练的机器学习方式,训练图片的目标检测模型,采用这样的模型对车辆的部件目标和损伤目标分别进行检测,然后基于检测结果的综合分析,确定图片中车辆的车损状况。
更进一步地,根据本说明书的构思和实施框架,考虑到视频流比孤立的图片更能准确反映车辆的全面信息,提出一种基于视频流的智能定损方式。图1为本说明书披露的一个实施例的典型实施场景示意图。如图1所示,用户可以通过便携式移动终端,例如智能手机,对车损现场进行拍摄,生成视频流。移动终端上可以安装有定损识别相关的应用或工具,该应用或工具可以对视频流进行初步处理,对其中的N个图像帧进行轻量级的、初步的目标检测和特征提取,每个帧的目标检测结果和特征提取结果可以构成一个M维向量。于是,移动终端通过对视频流的初步处理可以生成一个特征矩阵,该 矩阵至少包括N个M维向量。移动终端中的应用还可以从视频流中确定出其中的关键帧,并且,利用CNN定损模型基于上述生成的特征矩阵进行初步定损,得到初步损伤结果。
接着,移动终端可以将上述特征矩阵、关键帧和初步损伤结果发送至服务端。
服务端一般具有更为强大和可靠的计算能力,因此,服务端可以利用更为复杂也更为准确的目标检测模型,对视频流中的关键帧再次进行目标检测,从中检测出车辆的部件信息和损伤信息。
然后,服务端将特征矩阵的信息和针对关键帧检测的信息进行融合,产生针对各个部件的综合损伤特征。此外需要理解的是,上述初步损伤结果中包括针对各个部件的定损结果。
在此基础上,可以将各个部件的综合损伤特征和上述初步损伤结果输入预先训练的决策树模型,得到针对视频流的最终定损结果,实现智能定损。
下面描述智能定损的具体实现过程。
图2示出根据一个实施例的车辆定损方法的流程图。该方法可以通过服务端执行,服务端可以体现为任何具有计算、处理能力的装置、设备、平台和设备集群。如图2所示,该方法至少包括以下步骤:
步骤21,获取视频流的特征矩阵,所述视频流针对损伤车辆而拍摄,所述特征矩阵至少包括与所述视频流中N个图像帧分别对应、且按照所述N个图像帧的时序排列的N个M维向量,每个M维向量至少包括,针对对应的图像帧,通过预先训练的第一部件检测模型得到的部件检测信息,以及通过预先训练的第一损伤检测模型得到的损伤检测信息;
步骤22,获取所述视频流中的K个关键帧;
步骤23,针对K个关键帧,生成对应的K个关键帧向量,每个关键帧向量包括,针对对应的关键帧图像,通过预先训练的第二部件检测模型得到的部件检测信息,以及通过预先训练的第二损伤检测模型得到的损伤检测信息;
步骤24,融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征;
步骤25,获取初步损伤结果,所述初步损伤结果包括,将所述特征矩阵输入预先 训练的卷积神经网络后得到的所述各个部件的损伤结果;
步骤26,将所述各个部件的综合损伤特征和初步损伤结果输入预先训练的决策树模型,得到针对所述视频流的最终定损结果。
下面描述以上各个步骤的执行方式。
首先,在步骤21,获取视频流的特征矩阵。
可以理解,在车辆定损的场景中,视频流为用户利用移动终端中的图像采集设备,如摄像头,针对车损现场的损伤车辆进行拍摄而产生。如前所述,移动终端可以通过对应的应用或工具,对该视频流进行初步处理,生成上述特征矩阵。
具体而言,可以从视频流中抽取N个图像帧,对其进行初步处理。这里的N个图像帧可以包括视频流中的每一图像帧,也可以是按照预定时间间隔(如500ms)抽取的图像帧,还可以是按照其他抽取方式从视频流中获得的图像帧。
对于抽取出的每个图像帧,对该图像帧进行目标检测和特征提取,从而针对每个图像帧生成M维向量。
如本领域技术人员所知,目标检测用于从图片中识别出特定的目标对象,并对目标对象进行分类。通过利用标注有目标位置和目标类别的图片样本进行训练,可以得到各种目标检测模型。其中,部件检测模型和损伤检测模型是目标检测模型的具体化应用。当将车辆部件作为目标对象进行标注和训练,可以得到部件检测模型;当将车辆上的损伤对象作为目标对象进行标注和训练,可以得到损伤检测模型。
在该步骤中,为了对每个图像帧进行初步处理,可以采用预先训练的部件检测模型,此处称为第一部件检测模型,对图像帧进行部件检测,得到部件检测信息;并且采用预先训练的损伤检测模型,此处称为第一损伤检测模型,对图像帧进行损伤检测,得到损伤检测信息。
需要理解,本文中的“第一”、“第二”等用语,仅仅是为了区分同类事物,并不意在对其顺序等其他方面进行限定。
在本领域中,目标检测模型多是在卷积神经网络CNN的基础上,通过各种检测算法来实现。为了优化传统的卷积神经网络CNN的计算效率,已经提出多种轻量级的网络结构,例如包括SqueezeNet、MobileNet、ShuffleNet、Xception等等。这些轻量级的神经网络结构,通过采用不同的卷积计算方式,减少网络参数,从而简化传统CNN的 卷积计算,提高其计算效率。这样的轻量级的神经网络结构特别适合于在计算资源有限的移动终端中运行。
相应地,在一个实施例中,上述的第一部件检测模型和第一损伤检测模型,均采用以上轻量级的网络结构实现。
通过采用上述第一部件检测模型对图像帧进行部件检测,可以得到该图像帧中的部件检测信息。一般而言,部件检测信息可以包括,从对应图像帧中框选出车辆部件的部件检测框信息,以及针对该框选出的部件预测的部件类别。更具体地,部件检测框信息可以包括,部件检测框的位置,例如表示为(x,y,w,h)形式,以及从图像帧中提取的该部件检测框对应的图片卷积信息。
类似的,采用上述第一损伤检测模型对图像帧进行损伤检测,可以得到该图像帧中的损伤检测信息,该损伤检测信息可以包括,从对应图像帧中框选出损伤对象的损伤检测框信息,以及针对框选出的该处损伤进行预测的损伤类别。
图3a示出针对某个图像帧得到的部件检测信息的例子。可以看到,图3a中包括了若干部件检测框,每个部件检测框框选出一个部件。第一部件检测模型还可以针对每个部件检测框对应输出预测部件类别,例如各个矩形框左上角的数字即表示部件类别。例如,图3a中数字101代表右前门,102代表右后门,103代表门把手,等等。
图3b示出针对某个图像帧得到的损伤检测信息的例子。可以看到,图3b中包括了一系列矩形框,即为第一损伤检测模型输出的损伤检测框,每个损伤检测框框选出一处损伤。第一损伤检测模型还针对每个损伤检测框对应输出有预测损伤类别,例如各个矩形框左上角的数字即表示损伤类别。例如,图3b中数字12代表损伤类别为刮擦,还可能有其他数字代表其他损伤类别,例如用数字10代表变形,数字11代表撕裂,数字13代表(玻璃物件)的碎裂,等等。
在一个实施例中,上述第一部件检测模型还用于对检测的车辆部件进行图像分割,以得到对图像帧中各个部件的轮廓分割结果。
如本领域技术人员所知,图像分割是将图像分割或者划分为属于/不属于特定目标对象的区域,其输出可以表现为覆盖特定目标对象区域的蒙层(Mask)。在本领域中,已基于各种网络结构和各种分割算法提出了多种图像分割模型,例如基于CRF(条件随机场)的分割模型,Mask R-CNN模型等等。部件分割作为图像分割的一种具体应用,可以用于将车辆图片划分为属于/不属于特定部件的区域。部件分割可以采用任意的现有 分割算法来实现。
在一个实施例中,上述第一部件检测模型被训练为,既可以对部件进行识别(即位置预测和类别预测),又可以对部件进行分割。例如,可以采用基于Mask R-CNN的模型作为上述第一部件检测模型,该模型在基础卷积层后,通过两个网络分支,分别进行部件的识别和部件的分割。
在另一实施例中,上述第一部件检测模型包括用于部件识别的第一子模型,和用于部件分割的第二子模型。第一子模型输出部件识别结果,第二子模型输出部件分割结果。
在第一部件检测模型还用于进行部件分割的情况下,得到的部件检测信息还包括部件分割的结果。部件分割结果可以体现为各个部件的轮廓或覆盖区域。
如上,通过第一部件检测模型对某个图像帧进行部件检测,可以得到部件检测信息;通过第一损伤检测模型对该图像帧进行损伤检测,可以得到损伤检测信息。于是,可以基于部件检测信息和损伤检测信息,形成该图像帧对应的M维向量。
例如,在一个例子中,针对抽取的每个图像帧,形成60维向量,其中前30维的元素表示部件检测信息,后30维的元素表示损伤检测信息。
根据一种实施方式,除了对图像帧进行目标检测(包括部件检测和损伤检测),还对图像帧进行其他方面的特征分析和提取,将其包含在上述M维向量中。
在一个实施例中,对抽取的图像帧,获取其视频连续性特征,该特征可以反映图像帧之间的变化,进而反映出视频的稳定性和连续性,还可以用于对图像帧中的目标进行追踪。
在一个例子中,对于当前图像帧,可以获取该图像帧相对于前一图像帧的光流变化特征作为其连续性特征。光流变化可以采用一些现有的光流模型来计算获得。
在一个例子中,对于当前图像帧,可以获取该图像帧与前一图像帧的图像相似性(Strcture Similarity)作为连续性特征。在一个具体例子中,图像相似性可以通过SSIM(structural similarity index measurement)指数来衡量。具体的,当前图像帧与前一图像帧之间的SSIM指数,可以基于当前图像帧中各像素点的平均灰度值和灰度方差、前一图像帧中各像素点的平均灰度值和灰度方差,以及当前图像帧和前一图像帧中各像素点的协方差等计算得到。SSIM指数的最大值为1,SSIM指数越大,图像的结构相似度越高。
在一个例子中,对于当前图像帧,可以获取该图像帧中若干特征点相对于前一图像帧的偏移特征,基于该偏移特征确定连续性特征。具体而言,图像的特征点是图像中具有鲜明特性并能够有效反映图像本质特征、能够标识图像中目标物体的点(例如左前车灯的左上角)。特征点可以通过诸如SIFT(Scale-invariant feature transform,尺度不变特征变换)、LBP(Local Binary Pattern,局部二值模式)之类的方式确定。如此,可以根据特征点的偏移,来评估相邻两幅图像帧的变化。典型地,特征点的偏移可以通过投影矩阵(projective matrix)来描述。举例而言,假设当前图像帧的特征点集合为Y,前一图像帧的特征点集合为X,可以求解一个变换矩阵w,使得f(X)=Xw的结果尽可能接近Y,则求解出的变换矩阵w就可以作为前一图像帧到当前图像帧的投影矩阵。进一步地,可以将该投影矩阵作为当前图像帧的连续性特征。
可以理解,以上例子中的多种连续性特征,可以单独使用,也可以组合使用,在此不做限定。在更多实施方式中,还可以采用更多的方式确定图像帧之间的变化特征作为其连续性特征。
需要说明的是,在当前图像帧是视频流的第一个图像帧的情况下,确定其连续性特征时,可以将该图像帧本身作为它的前一图像帧进行对比,也可以直接将其连续性特征确定为预定值,例如投影矩阵的各个元素都为1,或者光流输出为0,等等。
根据一种实施方式,对于抽取的图像帧,还获取其图像帧质量特征,该特征可以反映图像帧的拍摄质量,也就是对于目标识别的有效性。一般而言,移动终端的应用中可以包含拍摄引导模型,用于引导用户的拍摄,例如距离的引导(更加靠近或者远离受损车辆),角度的引导等等。拍摄引导模型在进行拍摄引导中,会分析产生图像帧质量特征。图像帧质量特征可以包括:表示图像帧是否模糊的特征、表示图像帧是否包含目标的特征、表示图像帧光照是否充足的特征、表示图像帧拍摄角度是否预定角度的特征等等。这些特征中的一项或多项也可以包含在前述的针对每个图像帧的M维向量中。
例如,在一个具体例子中,针对抽取的每个图像帧,形成80维向量,其中1-10维的元素表示图像帧质量特征,11-20维的元素表示视频连续性特征,21-50维的元素表示部件检测信息,51-80维的元素表示损伤检测信息。
如此,通过对视频流中抽取的N个图像帧分别进行目标检测和特征提取,为每个图像帧生成M维向量,于是生成了N个M维向量。
在一个实施例中,将这N个M维向量按照N个图像帧的时序进行排列,从而得到 一个N*M维矩阵,作为视频流的特征矩阵。
在一个实施例中,还对N个图像帧对应的N个M维向量进行预处理,作为视频流的特征矩阵。该预处理可以包括归一化操作,从而将特征矩阵整理为固定的维度。
可以理解,特征矩阵的维度通常是预先设定的,而视频流中的图像帧往往是按照一定时间间隔抽取的,而视频流的长度一直在变化,且其总长度可能无法预先知晓。因此,将实际抽取的图像帧的M维向量直接进行组合并不总是能够符合特征矩阵的维度要求。在一个实施例中,假定特征矩阵的维度预先设定为S帧*M向量。如果从视频流中抽取的图像帧数目小于S,则可以通过补齐操作、插值操作、池化操作等方式,将抽取的图像帧对应的M维向量,整理为S*M维矩阵。在这样的情况下,S>N,N为特征矩阵中包含的实际抽取的图像帧的数目。在一个具体的实施例中,可以利用插值方式在N个M维向量中补充S-N个M维向量,以得到S行M列的特征矩阵。而如果从视频流中实际抽取的图像帧数目大于S,可以丢弃部分图像帧,最终使得特征矩阵满足预定的维度。在一个具体的实施例中,可以随机丢弃或者按照预定间隔丢弃一部分图像帧对应的M维向量,以生成S行M列的特征矩阵。在这样的情况下,S=N,N为特征矩阵中最后保留的图像帧的数目。
如上,通过对视频流中图像帧的初步处理,生成了视频流的特征矩阵,该特征矩阵至少包括与视频流中N个图像帧分别对应、且按照N个图像帧的时序排列的N个M维向量,每个M维向量至少包括,针对对应的图像帧,通过预先训练的第一部件检测模型得到的部件检测信息,以及通过预先训练的第一损伤检测模型得到的损伤检测信息。
可以理解,在以上实施例中,图像帧的初步处理以及特征矩阵的生成通过移动终端执行。在这样的情况下,就服务端而言,在步骤21,仅需要从移动终端接收该特征矩阵。这样的方式适合于移动终端中已安装有定损识别对应的应用或工具,且具备一定的计算能力的情况。由于特征矩阵的传输数据量远远小于视频流本身,因此,这样的方式非常有利于网络传输。
在另一实施例中,移动终端在拍摄视频流之后,将视频流传送到服务端,由服务端对各个图像帧进行处理,生成特征矩阵。在这样的情况下,就服务端而言,在步骤21,从移动终端获取拍摄的视频流,然后从中抽取图像帧,并对抽取的图像帧进行目标检测和特征提取,生成M维向量。具体地,可以针对各个图像帧,通过前述的第一部件检测模型进行部件检测,得到部件检测信息,并通过第一损伤检测模型进行损伤检测,得到损伤检测信息;并且至少基于所述部件检测信息和所述损伤检测信息,形成各个图像 帧对应的M维向量。然后,根据N个图像帧各自的M维向量,生成所述特征矩阵。以上过程与在移动终端中执行的过程相类似,不复赘述。
除了获取视频流的特征矩阵,在步骤22,服务端还获取所述视频流中的K个关键帧,其中K大于等于1,且小于前述的图像帧数目N。
在一个实施例中,由移动终端确定视频流中的K个关键帧,并将关键帧发送到服务端。因此,就服务端而言,在步骤22,仅需要从移动终端接收该K个关键帧。
或者,在另一实施例中,在步骤22,由服务端确定视频流中的关键帧。
不管是通过移动终端还是通过服务端,都可以采用多种已有的关键帧确定方式,确定出视频流中的关键帧。
例如,在一个实施例中,可以根据各个图像帧的质量特征,确定出综合质量较高的图像帧作为关键帧;在另一实施例中,可以根据各个图像帧的连续性特征,确定出相对于前一帧变化较大的图像帧作为关键帧。
取决于前述N个图像帧的抽取方式以及关键帧的确定方式,确定出的关键帧可以包含在N个图像帧中,也可以不同于前述的N个图像帧。
接着,在步骤23,对于步骤22获取的各个关键帧的图像,再次对其进行目标检测。具体地,可以通过预先训练的第二部件检测模型对该关键帧图像进行部件检测,得到部件检测信息,并通过预先训练的第二损伤检测模型对其进行损伤检测,得到损伤检测信息,基于该部件检测信息和损伤检测信息生成关键帧向量。
需要理解,此处的第二部件检测模型可以不同于为生成特征矩阵对图像帧进行初步处理所采用的第一部件检测模型。一般而言,第二部件检测模型是比第一部件检测模型更为准确更为复杂的模型,从而对视频流中的关键帧进行更为精准的部件检测。特别是,在特征矩阵由移动终端生成的情况下,第一部件检测模型通常是基于轻量级网络结构的模型,从而适合于移动终端有限的计算能力和资源;而第二部件检测模型可以是对计算能力有更高要求、适合于服务端的检测模型,从而对图像特征进行更为复杂的运算,得到更精确的结果。类似的,第二损伤检测模型也可以比前述的第一损伤检测模型更复杂更准确,从而对视频流中的关键帧进行更为精准的损伤检测。
在一个实施例中,第二部件检测模型还用于对部件进行图像分割。在这样的情况下,基于第二部件检测模型得到的部件检测信息中,还包括对关键帧图像中包含的部件进行图像分割的信息,即部件的轮廓信息。
如此,通过第二部件检测模型和第二损伤检测模型,对关键帧的图像再次进行目标检测,基于得到的部件检测信息和损伤检测信息,可以形成关键帧向量。针对K个关键帧,可以形成K个关键帧向量。
在一个具体例子中,关键帧向量为70维向量,其中1-35维的元素表示部件检测信息,36-70维的元素表示损伤检测信息。
接着,在步骤24,融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征。
具体地,本步骤可以包括:确定至少一个备选受损部件,其中包括第一部件;对于所述N个M维向量和所述K个关键帧向量中的各个向量,通过对单个向量中的部件检测信息和损伤检测信息进行帧内融合,得到所述第一部件的帧综合特征,并通过将针对各个向量得到的所述第一部件的帧综合特征进行帧间融合,得到所述第一部件的综合损伤特征。
针对上述至少一个备选受损部件的确定,在一个实施例中,将车辆的各个部件均作为备选受损部件。例如,假设车辆预先被划分为100种部件,那么可以将这100种部件均作为备选受损部件。这样的方式不会出现遗漏,但是冗余计算较多,后续处理负担较大。
如前所述,在生成视频流的特征矩阵和关键帧向量时,均对图像帧进行了部件检测,得到的部件检测信息包含针对部件进行预测的部件类别。在一个实施例中,将部件检测信息中出现的部件类别作为备选受损部件。
更具体地,可以基于N个M维向量中的部件检测信息,确定第一备选部件集合。可以理解,每个M维向量的部件检测信息中可以包含若干部件检测框和对应的预测部件类别,可以将N个M维向量中预测部件类别的并集,作为第一备选部件集合。
类似的,可以基于K个关键帧向量中的部件检测信息,确定第二备选部件集合,即各个关键帧向量中预测部件类别的并集。然后,将上述第一备选部件集合和第二备选部件集合的并集中的部件,作为备选受损部件。换而言之,如果一个部件类别出现在视频流的特征矩阵中,或者出现在关键帧向量中,就意味着,在视频流的N个图像帧中检测到该类别的部件,或者在关键帧中检测到该部件,于是,可以将该类别的部件作为备选受损部件。
通过以上方式,可以得到多个备选受损部件。下面以其中任意的一个部件,(简 单起见称为第一部件)为例,描述后续处理方式。
对于任意的第一部件,例如图3a所示的右后门,在步骤24,对视频流的特征矩阵以及各个关键帧向量进行融合,得到该第一部件的综合损伤特征。为了得到综合损伤特征,融合操作可以包括帧内融合(或称第一融合)和帧间融合(或称第二融合)。在帧内融合中,将各个帧对应的向量中的部件检测信息和损伤检测信息进行融合,得到该帧中关于第一部件的帧综合特征;然后结合时序信息,通过帧间融合将各个帧对应的第一部件的帧综合特征进行融合,得到该第一部件的综合损伤特征。下面分别描述帧内融合和帧间融合的过程。
帧内融合旨在获取到某个帧中关于第一部件的部件级损伤特征,又称为帧综合特征。在一个实施例中,第一部件的帧综合特征可以包括,该帧中第一部件的部件特征,以及与该第一部件相关的损伤特征。例如,可以将第一部件的帧综合特征对应的向量和该第一部件相关的损伤特征对应的向量进行拼接,将对应得到的拼接向量作为第一部件的帧综合特征。
一方面,在一个实施例中,上述N个图像帧包括第一图像帧,对应于第一M维向量;上述得到所述第一部件的帧综合特征包括:从所述第一M维向量中的部件检测信息中提取与所述第一部件有关的第一检测信息;基于所述第一检测信息,确定所述第一图像帧中所述第一部件的部件特征,作为与所述第一图像帧对应的、所述第一部件的帧综合特征的一部分。
在一个具体的实施例中,上述N个图像帧还包括在所述第一图像帧之后的第二图像帧,对应于第二M维向量;所述每个M维向量还包括,视频连续性特征;上述得到所述第一部件的帧综合特征还包括:从所述第二M维向量中提取视频连续性特征;基于所述第一检测信息和所述视频连续性特征,确定该第一部件在所述第二图像帧中的第二检测信息;基于所述第二检测信息,确定所述第一部件在所述第二图像帧中的部件特征,作为与所述第二图像帧对应的、所述第一部件的帧综合特征的一部分。
更具体地,在一个例子中,上述视频连续性特征包括以下中的至少一项:图像帧之间的光流变化特征,图像帧之间的相似性特征,基于图像帧之间的投影矩阵确定的变换特征。
另一方面,在一个实施例中,上述N个图像帧包括第一图像帧,对应于第一M维向量;所述第一M维向量中的损伤检测信息包括,从所述第一图像帧中框选出多个损 伤对象的多个损伤检测框的信息;上述得到所述第一部件的帧综合特征包括:根据所述第一M维向量中的部件检测信息和所述多个损伤检测框的信息,确定同属于该第一部件的至少一个损伤检测框;获取所述至少一个损伤检测框的损伤特征;将所述至少一个损伤检测框的损伤特征进行第一融合操作,得到与第一部件相关的损伤特征,作为与所述第一图像帧对应的、所述第一部件的帧综合特征的一部分。
在一个具体的实施例中,所述第一M维向量中的部件检测信息包括,从所述第一图像帧中框选出多个部件的多个部件检测框的信息;所述确定同属于该第一部件的至少一个损伤检测框,包括:确定所述第一部件对应的第一部件检测框;根据所述多个损伤检测框与所述第一部件检测框的位置关系,确定同属于该第一部件的至少一个损伤检测框。
在另一个具体的实施例中,所述第一M维向量中的部件检测信息包括部件分割信息;所述确定同属于该第一部件的至少一个损伤检测框,包括:根据所述部件分割信息,确定所述第一部件覆盖的第一区域;根据所述多个损伤检测框的位置信息,确定其是否落入所述第一区域;将落入所述第一区域的损伤检测框确定为所述至少一个损伤检测框。
在一个具体的实施例中,所述至少一个损伤检测框包括第一损伤检测框;所述获取所述至少一个损伤检测框的损伤特征包括,获取所述第一损伤检测框对应的第一损伤特征,该第一损伤特征包括,与所述第一损伤检测框相关的图片卷积特征。
更具地,在一个例子中,所述损伤检测信息还包括,所述多个损伤检测框中各个损伤检测框对应的预测损伤类别;所述获取所述第一损伤检测框对应的第一损伤特征还包括,根据所述第一损伤检测框与所述多个损伤检测框中其他损伤检测框之间的关联关系,确定第一关联特征作为所述第一损伤特征的一部分,所述关联关系至少包括以下中的一项或多项:损伤检测框位置关联关系,预测损伤类别关联关系,以及通过所述图片卷积特征反映的框内容关联关系。
在一个具体的实施例中,所述第一融合操作包括以下中的一项或多项:取最大操作、取最小操作、求平均操作、求和操作、求中位数操作。
以上,针对第一图像帧可以分别获取到第一部件的部件特征和损伤特征。在此基础上,将以上获取的第一部件的部件特征和损伤特征拼接或组合在一起,就可以得到第一图像帧中第一部件的帧综合特征,也就是对第一图像帧关于第一部件进行了帧内融合。得到的第一部件的帧综合特征即为第一图像帧中第一部件的部件级损伤特征。
类似的,对于N个图像帧中每个图像帧,均可以基于其对应的M维向量进行上述帧内融合,得到该帧对应的第一部件的帧综合特征,记为第一向量。如此,对于N个图像帧可以得到N个第一向量,分别对应于,针对N个图像帧得到的第一部件的帧综合特征。
对于前述的K个关键帧中的各个关键帧,也可以基于其关键帧向量中的部件检测信息和损伤检测信息进行上述帧内融合,于是得到该关键帧的第一部件的帧综合特征,记为第二向量。如此,对于K个关键帧可以得到K个第二向量,分别对应于针对K个图像帧得到的第一部件的帧综合特征。
由于关键帧向量的维度与前述的M维向量可能有所不同,因此第一向量和第二向量的维度也有可能不同,不过其帧内融合的思路和过程是类似的。
接着,对于以上针对各个帧(包括N个图像帧和K个关键帧)获得的第一部件的帧综合特征进行帧间融合,从而得到第一部件的综合特征向量,作为所述综合损伤特征。
在一个实施例中,将N个第一向量进行第一组合,得到第一组合向量,其中N个第一向量分别对应于,针对所述N个图像帧得到的所述第一部件的帧综合特征;将K个第二向量进行第二组合,得到第二组合向量,其中所述K个第二向量对应于,针对所述K个关键帧得到的所述第一部件的帧综合特征;将所述第一组合向量和第二组合向量进行综合,得到所述综合特征向量。
在一个具体的实施例中,所述将N个第一向量进行第一组合包括:将所述N个第一向量按照对应的N个图像帧的时序进行拼接。
在另一个具体的实施例中,所述将N个第一向量进行第一组合包括:确定所述N个第一向量的权重因子;根据所述权重因子,对所述N个第一向量进行加权组合。
进一步地,在一个例子中,所述确定所述N个第一向量的权重因子包括:对于所述N个图像帧中各个图像帧,从所述K个关键帧中确定时序上最接近的关键帧;根据各个图像帧与其最接近的关键帧的时序距离,确定该图像帧对应的第一向量的权重因子,使得时序距离与所述权重因子负相关。
在另一个例子中,所述每个M维向量还包括,图像帧质量特征;所述确定所述N个第一向量的权重因子包括:对于所述N个图像帧中各个图像帧,根据该图像帧对应的M维向量中的图像帧质量特征,确定该图像帧对应的第一向量的权重因子。
在一个更具体的例子中,所述图像帧质量特征包括以下至少一项:表示图像帧是 否模糊的特征、表示图像帧是否包含目标的特征、表示图像帧光照是否充足的特征、表示图像帧拍摄角度是否预定角度的特征。
如此,第一部件的综合损伤特征是基于视频流的特征矩阵中的N个M维向量,以及K个关键帧向量进行融合而得到的,因此可以全面地反映,第一部件在视频流的N个图像帧,以及K个关键帧中的总体损伤特征。
在步骤25,获取初步损伤结果,所述初步损伤结果包括,将所述特征矩阵输入预先训练的卷积神经网络后得到的所述各个部件的损伤结果。
在一个实施例中,由移动终端确定初步损伤识别结果,并将初步损伤识别结果发送到服务端。因此,就服务端而言,在步骤25,仅需要从移动终端接收初步损伤结果。在另一个实施例中,在步骤25,由服务端确定针对视频流的初步损伤结果。
具体地,初步损伤结果不管是由移动终端确定,还是由服务端确定,均可以是基于预先训练的卷积神经网络而得到。
可以理解,卷积神经网络在处理图像时,其输入矩阵的格式往往是“批处理尺寸(batch_size)*长*宽*通道数”。其中,彩色图像的通道通常为“R”、“G”、“B”3个通道,即通道数为3。显然,该格式中,长和宽是相互独立的,通道之间则是相互影响的。同理,在对上述特征矩阵的二维卷积操作中,图像的不同空间位置的特征应该是独立的,二维卷积操作具有空间不变性。由于图像处理过程中一般是在“长*宽”维度上做卷积,而如果将“长*宽”替换为特征矩阵中的行数和列数,则在特征维度上,不同位置的特征之间是会相互影响的,而不是互相独立,对其进行卷积是不合理的。例如抽取细节损伤图,需要同时涉及细节图分类,损伤检测结果等多个维度的特征。也就是说,该空间不变性在时间维度成立,在特征维度上不成立。从而,这里的特征维度可以和图像处理中的通道维度的性质相对应。因此,可以对特征矩阵的输入格式进行调整,如调整为“批处理尺寸(batch_size)*1*列数(如S或N)*行数(M)”。这样,就可以在“1*列数(如S)”的维度做卷积,而每列是一个时刻的特征集合,通过对时间维度做卷积,可以挖掘出各个特征之间的关联。
在一个实施例中,卷积神经网络可以包括一个或多个卷积处理层和输出层。其中,卷积处理层可以由二维卷积层、激活层、标准化层组成,例如2D convolutional Filter+ReLU+Batch Normalization。
其中,二维卷积层可以用于通过对应于时间维度的卷积核对特征矩阵进行卷积处 理。在一个具体的实施例中,所述特征矩阵包括M行S列,其中S不小于N,所述卷积神经网络包括若干一维卷积核,所述将所述特征矩阵输入预先训练的卷积神经网络,包括:利用所述若干一维卷积核在所述特征矩阵的行维度上,对所述特征矩阵进行卷积处理。在一个例子中,针对图4示出M×S的特征矩阵,可以经过诸如(1,-1,-1,1)之类的卷积核对应于时间维度进行卷积操作。在卷积神经网络的训练过程中,可以针对性地训练卷积核。例如,可以针对每个特征训练一个卷积核。例如图4示出的卷积核(1,-1,-1,1)是对应于车损检测场景中的部件损伤特征的卷积核等等。如此,经过每一个卷积核的卷积操作,可以识别一个特征(例如车损检测场景中的部件损伤特征)。
激活层可以用于把二维卷积层的输出结果做非线性映射。激活层可以通过诸如Sigmoid、Tanh(双曲正切)、ReLU之类的激励函数实现。通过激活层,二维卷积层的输出结果映射为0-1之间的非线性变化数值。
随着网络加深,经过激活层后的输出结果可能会向梯度饱和区(对应激励函数梯度变化较小的区域)移动。这样,会由于梯度减小或消失导致的卷积神经网络收敛较慢或不收敛。因此,还可以进一步通过标准化层(Batch Normalization)将激活层的输出结果拉回激励函数的梯度变化明显的区域。
输出层用于输出针对视频流的初步损伤结果。
如此,可以实现利用卷积神经网络得到初步损伤结果。另一方面,对于卷机神经网络的训练,可以利用标注有损伤识别结果的视频流作为训练样本而进行。
在一个具体的实施例中,可以先获取多个训练样本,其中每个训练样本包括每个视频流的样本特征矩阵和对应的损伤结果标签,所述每个视频流的样本特征矩阵至少包括与所述每个视频流中N个图像帧分别对应、且按照该N个图像帧的时序排列的N个M维向量。需要理解的是,其中每个视频流的样本特征矩阵,可以基于步骤21中描述的方式得到。另一方面,其中的损伤结果标签包括以下中一种或多种:损伤材质、损伤类别、损伤部件的部件类别。
再利用多个训练样本,训练所述卷积神经网络。如本领域技术人员所知,基于各个样本对应的各个样本特征矩阵和损伤结果标签,可通过例如梯度下降法调整模型的参数。模型训练过程中的损失函数例如为上述多个样本的各自的预测值与标签值之差的平方和、或者为多个样本的各自的预测值与标签值之差的绝对值之和,等等。
需要说明的是,上述训练后的用于得到初步损伤结果的卷积神经网络,可以被进 一步改造而用于抽取关键帧,具体可以用于前述步骤22中,对K个关键帧的抽取。具体地,可以获取上述训练后的卷积神经网络,并且,固定此卷积神经网络中除输出层以外的其他层中的参数,再利用标注有关键帧的视频流作为训练样本进行进一步训练,调整输出层的参数,进而得到用于抽取关键帧的改造后的卷积神经网络。
接下来,在步骤26,将所述各个部件的综合损伤特征和所述初步损伤结果输入预先训练的决策树模型,得到针对所述视频流的最终定损结果。
在一个实施例中,针对上述至少一个备选受损部件中的某个部件,可以从步骤25中获取的初步损伤结果中提取该部件的损伤结果,并将此损伤结果对应的向量与该部件的综合损伤特征所对应的向量进行拼接,得到针对该部件的拼接向量。进一步地,在一个具体的实施例中,可以将该部件的拼接向量输入预先训练的决策树模型中,得到针对该部件的最终定损结果。由此,针对各个部件均进行此项操作,可以得到各个部件的最终定损结果,构成针对所述视频流的最终定损结果。
在一个实施例中,在本步骤中可以利用已有的多种具体的决策树算法,例如梯度提升决策树GBDT、分类决策树CRT,等等,作为上述决策树模型。
在一个具体的实施例中,通过利用二分类的标注样本进行预先训练,决策树模型体现为二分类模型。上述二分类的标注样本为,给定一段样本视频流,由标注人员标注出其中各个部件是否受损(受损为一个分类,未受损为另一分类)。预先训练的过程可以包括,通过前述的步骤21到25的方式,基于给定的样本视频流,针对某个部件得到其综合损伤特征和初步损伤结果,然后利用决策树模型,基于其综合损伤特征和初步损伤结果预测该部件是否受损。之后,将预测结果与该部件的标注结果进行比较,根据比较结果调整决策树模型的模型参数,使得预测结果更趋向于拟合标注结果。如此训练得到决策树二分类模型。
在这样的情况下,在步骤26,将待分析的某个部件的综合损伤特征和初步损伤结果输入到该决策树二分类模型后,模型会输出是否受损的二分类之一,于是可以得到该某个部件是否受损的结果。
在另一实施例中,通过利用多分类的标注样本进行预先训练,决策树模型体现为多分类模型。上述多分类的标注样本为,给定一段样本视频流,由标注人员标注出其中各个部件的损伤类别(例如,刮擦,变形,碎裂等多个损伤类别)。预先训练的过程可以包括,通过前述的步骤21到25的方式,基于给定的样本视频流,针对某个部件得到 其综合损伤特征和初步损伤结果,然后利用决策树模型,基于其综合损伤特征和初步损伤结果预测该部件的损伤类别。之后,将预测结果与该部件的标注结果进行比较,根据比较结果调整决策树模型的模型参数,使得预测结果更趋向于拟合标注结果。如此训练得到决策树多分类模型。
在这样的情况下,在步骤26,将待分析的某个部件的综合损伤特征向量输入到该决策树多分类模型后,模型会输出预测的损伤分类类别,于是可以根据该分类类别,得到该第一部件的损伤类型,作为最终定损结果。
可以理解,上述的某个部件为备选受损部件中的任意部件。对于各个备选受损部件,均可以执行以上过程,在步骤24得到其综合损伤特征,在步骤25得到其初步损伤结果,在步骤26,基于其综合损伤特征和初步损伤结果,得到其损伤状况。于是,可以得到各个备选受损部件的损伤状况,也就得到了整案的最终定损结果。
在一个具体例子中,采用多分类决策树模型对各个备选受损部件进行预测后,可以得到以下定损结果:右后门:刮擦;后保险杠:变形;尾灯:碎裂。
在一个实施例中,将这样的最终定损结果传回到移动终端。
在确定出包含各个部件损伤状况的最终定损结果的基础上,在一个实施例中,还可以据此确定出各个部件的换修方案。
可以理解,根据定损需要,工作人员可以预先设置有映射表,其中记录各种类型的部件在各种损伤类别下的换修方案。例如,对于金属类型的部件,损伤类别为刮擦时,对应的换修方案为喷漆,损伤类别为变形时,对应的换修方案为钣金;对于玻璃类型的部件,损伤类别为刮擦时,对应的换修方案为更换玻璃,等等。
如此,对于以上举例的第一部件“右后门”,假定确定其损伤类别为刮擦,那么首先,根据部件类别“右后门”确定其归属的类型,例如为金属类型部件,然后根据损伤类别“刮擦”,确定出对应的换修方案为:喷漆。
于是,可以针对各个受损部件,确定其换修方案,将定损结果和换修方案一并传回到移动终端,实现更全面的智能定损。
根据另一方面的实施例,提供了一种车辆定损的装置,该装置可以部署在服务端,服务端可以利用任何具有计算、处理能力的设备、平台或设备集群来实现。图5示出根据一个实施例的车辆定损装置的示意性框图。如图5所示,该装置500包括:
第一获取单元510,配置为获取视频流的特征矩阵,所述视频流针对损伤车辆而拍摄,所述特征矩阵至少包括与所述视频流中N个图像帧分别对应、且按照所述N个图像帧的时序排列的N个M维向量,每个M维向量至少包括,针对对应的图像帧,通过预先训练的第一部件检测模型得到的部件检测信息,以及通过预先训练的第一损伤检测模型得到的损伤检测信息。
第二获取单元520,配置为获取所述视频流中的K个关键帧。
生成单元530,配置为针对所述K个关键帧,生成对应的K个关键帧向量,每个关键帧向量包括,针对对应的关键帧图像,通过预先训练的第二部件检测模型得到的部件检测信息,以及通过预先训练的第二损伤检测模型得到的损伤检测信息。
融合单元540,配置为融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征。
第三获取单元550,配置为获取初步损伤结果,所述初步损伤结果包括,将所述特征矩阵输入预先训练的卷积神经网络后得到的所述各个部件的损伤结果。
定损单元560,配置为将所述各个部件的综合损伤特征和所述初步损伤结果输入预先训练的决策树模型,得到针对所述视频流的最终定损结果。
在一个实施例中,所述第一获取单元510具体配置为,从移动终端接收所述特征矩阵。
在一个实施例中,所述第一获取单元510具体配置为:获取所述视频流;针对所述N个图像帧中的各个图像帧,通过所述第一部件检测模型进行部件检测,得到部件检测信息,并通过所述第一损伤检测模型进行损伤检测,得到损伤检测信息;至少基于所述部件检测信息和所述损伤检测信息,形成各个图像帧对应的M维向量;根据N个图像帧各自的M维向量,生成所述特征矩阵。
在一个实施例中,所述第二获取单元520具体配置为:从移动终端接收所述K个关键帧。
在一个实施例中,所述第二部件检测模型不同于所述第一部件检测模型,所述第二损伤检测模型不同于所述第一损伤检测模型。
在一个实施例中,所述融合单元540具体配置为:确定至少一个备选受损部件,其中包括第一部件;对于所述N个M维向量和所述K个关键帧向量中的各个向量,通过对单个向量中的部件检测信息和损伤检测信息进行帧内融合,得到所述第一部件的帧 综合特征,并通过将针对各个向量得到的所述第一部件的帧综合特征进行帧间融合,得到所述第一部件的综合损伤特征。
在一个实施例中,所述第三获取单元550具体配置为,从移动端接收所述初步损伤识别结果。
在一个实施例中,所述特征矩阵包括M行S列,其中S不小于N,所述卷积神经网络包括若干一维卷积核,所述将所述特征矩阵输入预先训练的卷积神经网络,包括:利用所述若干一维卷积核在所述特征矩阵的行维度上,对所述特征矩阵进行卷积处理。
在一个实施例中,所述卷积神经网络由训练单元进行预先训练得到,所述训练单元具体配置为:获取多个训练样本,其中每个训练样本包括每个视频流的样本特征矩阵和对应的损伤结果标签,所述每个视频流的样本特征矩阵至少包括与所述每个视频流中N个图像帧分别对应、且按照该N个图像帧的时序排列的N个M维向量;使用所述多个训练样本,训练所述卷积神经网络。
在一个实施例中,所述损伤结果标签包括以下中的至少一种:损伤材质、损伤类别、损伤部件的部件类别。
在一个实施例中,所述装置还包括:确定单元570,配置为根据所述最终定损结果,确定对应的换修方案。
通过以上的方法和装置,基于对受损车辆进行拍摄的视频流,进行智能定损。
根据另一方面的实施例,还提供一种计算机可读存储介质,其上存储有计算机程序,当所述计算机程序在计算机中执行时,令计算机执行结合图2所描述的方法。
根据再一方面的实施例,还提供一种计算设备,包括存储器和处理器,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现结合图2所述的方法。
本领域技术人员应该可以意识到,在上述一个或多个示例中,本申请所描述的功能可以用硬件、软件、固件或它们的任意组合来实现。当使用软件实现时,可以将这些功能存储在计算机可读介质中或者作为计算机可读介质上的一个或多个指令或代码进行传输。
以上所述的具体实施方式,对本申请的目的、技术方案和有益效果进行了进一步详细说明,所应理解的是,以上所述仅为本申请的具体实施方式而已,并不用于限定本申请的保护范围,凡在本申请的技术方案的基础之上,所做的任何修改、等同替换、改进等,均应包括在本申请的保护范围之内。

Claims (24)

  1. 一种计算机执行的车辆定损方法,包括:
    获取视频流的特征矩阵,所述视频流针对损伤车辆而拍摄,所述特征矩阵至少包括与所述视频流中N个图像帧分别对应、且按照所述N个图像帧的时序排列的N个M维向量,每个M维向量至少包括,针对对应的图像帧,通过预先训练的第一部件检测模型得到的部件检测信息,以及通过预先训练的第一损伤检测模型得到的损伤检测信息;
    获取所述视频流中的K个关键帧;
    针对所述K个关键帧,生成对应的K个关键帧向量,每个关键帧向量包括,针对对应的关键帧图像,通过预先训练的第二部件检测模型得到的部件检测信息,以及通过预先训练的第二损伤检测模型得到的损伤检测信息;
    融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征;
    获取初步损伤结果,所述初步损伤结果包括,将所述特征矩阵输入预先训练的卷积神经网络后得到的所述各个部件的损伤结果;
    将所述各个部件的综合损伤特征和所述初步损伤结果输入预先训练的决策树模型,得到针对所述视频流的最终定损结果。
  2. 根据权利要求1所述的方法,其中,获取视频流的特征矩阵包括,从移动终端接收所述特征矩阵。
  3. 根据权利要求1所述的方法,其中,获取视频流的特征矩阵包括:
    获取所述视频流;
    针对所述N个图像帧中的各个图像帧,通过所述第一部件检测模型进行部件检测,得到部件检测信息,并通过所述第一损伤检测模型进行损伤检测,得到损伤检测信息;
    至少基于所述部件检测信息和所述损伤检测信息,形成各个图像帧对应的M维向量;
    根据N个图像帧各自的M维向量,生成所述特征矩阵。
  4. 根据权利要求1所述的方法,其中,获取所述视频流中的K个关键帧包括:从移动终端接收所述K个关键帧。
  5. 根据权利要求1所述的方法,其中,所述第二部件检测模型不同于所述第一部件检测模型,所述第二损伤检测模型不同于所述第一损伤检测模型。
  6. 根据权利要求1所述的方法,其中,融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征,包括:
    确定至少一个备选受损部件,其中包括第一部件;
    对于所述N个M维向量和所述K个关键帧向量中的各个向量,通过对单个向量中的部件检测信息和损伤检测信息进行帧内融合,得到所述第一部件的帧综合特征,并通过将针对各个向量得到的所述第一部件的帧综合特征进行帧间融合,得到所述第一部件的综合损伤特征。
  7. 根据权利要求1所述的方法,其中,所述获取初步损伤结果包括,从移动端接收所述初步损伤识别结果。
  8. 根据权利要求1所述的方法,其中,所述特征矩阵包括M行S列,其中S不小于N,所述卷积神经网络包括若干一维卷积核,所述将所述特征矩阵输入预先训练的卷积神经网络,包括:
    利用所述若干一维卷积核在所述特征矩阵的行维度上,对所述特征矩阵进行卷积处理。
  9. 根据权利要求1所述的方法,其中,所述卷积神经网络通过以下方式训练:
    获取多个训练样本,其中每个训练样本包括每个视频流的样本特征矩阵和对应的损伤结果标签,所述每个视频流的样本特征矩阵至少包括与所述每个视频流中N个图像帧分别对应、且按照该N个图像帧的时序排列的N个M维向量;
    使用所述多个训练样本,训练所述卷积神经网络。
  10. 根据权利要求9所述的方法,其中,所述损伤结果标签包括以下中的至少一种:损伤材质、损伤类别、损伤部件的部件类别。
  11. 根据权利要求1所述的方法,其中,在得到针对所述视频流的最终定损结果之后,所述方法还包括:
    根据所述最终定损结果,确定对应的换修方案。
  12. 一种计算机执行的车辆定损装置,包括:
    第一获取单元,配置为获取视频流的特征矩阵,所述视频流针对损伤车辆而拍摄,所述特征矩阵至少包括与所述视频流中N个图像帧分别对应、且按照所述N个图像帧的时序排列的N个M维向量,每个M维向量至少包括,针对对应的图像帧,通过预先训练的第一部件检测模型得到的部件检测信息,以及通过预先训练的第一损伤检测模型得到的损伤检测信息;
    第二获取单元,配置为获取所述视频流中的K个关键帧;
    生成单元,配置为针对所述K个关键帧,生成对应的K个关键帧向量,每个关键帧向量包括,针对对应的关键帧图像,通过预先训练的第二部件检测模型得到的部件检 测信息,以及通过预先训练的第二损伤检测模型得到的损伤检测信息;
    融合单元,配置为融合所述N个M维向量和所述K个关键帧向量中的部件检测信息和损伤检测信息,得到各个部件的综合损伤特征;
    第三获取单元,配置为获取初步损伤结果,所述初步损伤结果包括,将所述特征矩阵输入预先训练的卷积神经网络后得到的所述各个部件的损伤结果;
    定损单元,配置为将所述各个部件的综合损伤特征和所述初步损伤结果输入预先训练的决策树模型,得到针对所述视频流的最终定损结果。
  13. 根据权利要求12所述的装置,其中,所述第一获取单元被配置为,从移动终端接收所述特征矩阵。
  14. 根据权利要求12所述的装置,其中,所述第一获取单元被配置为:
    获取所述视频流;
    针对所述N个图像帧中的各个图像帧,通过所述第一部件检测模型进行部件检测,得到部件检测信息,并通过所述第一损伤检测模型进行损伤检测,得到损伤检测信息;
    至少基于所述部件检测信息和所述损伤检测信息,形成各个图像帧对应的M维向量;
    根据N个图像帧各自的M维向量,生成所述特征矩阵。
  15. 根据权利要求12所述的装置,其中,所述第二获取单元被配置为:从移动终端接收所述K个关键帧。
  16. 根据权利要求12所述的装置,其中,所述第二部件检测模型不同于所述第一部件检测模型,所述第二损伤检测模型不同于所述第一损伤检测模型。
  17. 根据权利要求12所述的装置,其中,所述融合单元被配置为:
    确定至少一个备选受损部件,其中包括第一部件;
    对于所述N个M维向量和所述K个关键帧向量中的各个向量,通过对单个向量中的部件检测信息和损伤检测信息进行帧内融合,得到所述第一部件的帧综合特征,并通过将针对各个向量得到的所述第一部件的帧综合特征进行帧间融合,得到所述第一部件的综合损伤特征。
  18. 根据权利要求12所述的装置,其中,所述第三获取单元被配置为,从移动端接收所述初步损伤识别结果。
  19. 根据权利要求12所述的装置,其中,所述特征矩阵包括M行S列,其中S不小于N,所述卷积神经网络包括若干一维卷积核,所述将所述特征矩阵输入预先训练的卷积神经网络,包括:
    利用所述若干一维卷积核在所述特征矩阵的行维度上,对所述特征矩阵进行卷积处理。
  20. 根据权利要求12所述的装置,其中,所述卷积神经网络由训练单元进行预先训练得到,所述训练单元被配置为:
    获取多个训练样本,其中每个训练样本包括每个视频流的样本特征矩阵和对应的损伤结果标签,所述每个视频流的样本特征矩阵至少包括与所述每个视频流中N个图像帧分别对应、且按照该N个图像帧的时序排列的N个M维向量;
    使用所述多个训练样本,训练所述卷积神经网络。
  21. 根据权利要求20所述的装置,其中,所述损伤结果标签包括以下中的至少一种:损伤材质、损伤类别、损伤部件的部件类别。
  22. 根据权利要求12所述的装置,其中,所述装置还包括:
    确定单元,配置为根据所述最终定损结果,确定对应的换修方案。
  23. 一种计算机可读存储介质,其上存储有计算机程序,其中,当所述计算机程序在计算机中执行时,令计算机执行权利要求1-11中任一项的所述的方法。
  24. 一种计算设备,包括存储器和处理器,其中,所述存储器中存储有可执行代码,所述处理器执行所述可执行代码时,实现权利要求1-11中任一项所述的方法。
PCT/CN2020/093890 2019-09-27 2020-06-02 计算机执行的车辆定损方法及装置 WO2021057069A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910923001.6 2019-09-27
CN201910923001.6A CN110647853A (zh) 2019-09-27 2019-09-27 计算机执行的车辆定损方法及装置

Publications (1)

Publication Number Publication Date
WO2021057069A1 true WO2021057069A1 (zh) 2021-04-01

Family

ID=68992913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093890 WO2021057069A1 (zh) 2019-09-27 2020-06-02 计算机执行的车辆定损方法及装置

Country Status (2)

Country Link
CN (1) CN110647853A (zh)
WO (1) WO2021057069A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361426A (zh) * 2021-06-11 2021-09-07 爱保科技有限公司 车辆定损图像获取方法、介质、装置和电子设备
CN113553911A (zh) * 2021-06-25 2021-10-26 复旦大学 融合surf特征和卷积神经网络的小样本人脸表情识别方法

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647853A (zh) * 2019-09-27 2020-01-03 支付宝(杭州)信息技术有限公司 计算机执行的车辆定损方法及装置
CN111627041B (zh) * 2020-04-15 2023-10-10 北京迈格威科技有限公司 多帧数据的处理方法、装置及电子设备
CN112712498B (zh) * 2020-12-25 2024-09-13 北京百度网讯科技有限公司 移动终端执行的车辆定损方法、装置、移动终端、介质
CN113361457A (zh) * 2021-06-29 2021-09-07 北京百度网讯科技有限公司 基于图像的车辆定损方法、装置及系统
CN117557221A (zh) * 2023-11-17 2024-02-13 德联易控科技(北京)有限公司 一种车辆损伤报告的生成方法、装置、设备和可读介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017176304A1 (en) * 2016-04-06 2017-10-12 American International Group, Inc. Automatic assessment of damage and repair costs in vehicles
CN108875648A (zh) * 2018-06-22 2018-11-23 深源恒际科技有限公司 一种基于手机视频流的实时车辆损伤和部件检测的方法
CN109784171A (zh) * 2018-12-14 2019-05-21 平安科技(深圳)有限公司 车辆定损图像筛选方法、装置、可读存储介质及服务器
CN110570318A (zh) * 2019-04-18 2019-12-13 阿里巴巴集团控股有限公司 计算机执行的基于视频流的车辆定损方法及装置
CN110647853A (zh) * 2019-09-27 2020-01-03 支付宝(杭州)信息技术有限公司 计算机执行的车辆定损方法及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021548A (zh) * 2016-05-27 2016-10-12 大连楼兰科技股份有限公司 基于分布式人工智能图像识别的远程定损方法及系统
US10270599B2 (en) * 2017-04-27 2019-04-23 Factom, Inc. Data reproducibility using blockchains

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017176304A1 (en) * 2016-04-06 2017-10-12 American International Group, Inc. Automatic assessment of damage and repair costs in vehicles
CN108875648A (zh) * 2018-06-22 2018-11-23 深源恒际科技有限公司 一种基于手机视频流的实时车辆损伤和部件检测的方法
CN109784171A (zh) * 2018-12-14 2019-05-21 平安科技(深圳)有限公司 车辆定损图像筛选方法、装置、可读存储介质及服务器
CN110570318A (zh) * 2019-04-18 2019-12-13 阿里巴巴集团控股有限公司 计算机执行的基于视频流的车辆定损方法及装置
CN110647853A (zh) * 2019-09-27 2020-01-03 支付宝(杭州)信息技术有限公司 计算机执行的车辆定损方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361426A (zh) * 2021-06-11 2021-09-07 爱保科技有限公司 车辆定损图像获取方法、介质、装置和电子设备
CN113553911A (zh) * 2021-06-25 2021-10-26 复旦大学 融合surf特征和卷积神经网络的小样本人脸表情识别方法

Also Published As

Publication number Publication date
CN110647853A (zh) 2020-01-03

Similar Documents

Publication Publication Date Title
WO2021057069A1 (zh) 计算机执行的车辆定损方法及装置
CN109584248B (zh) 基于特征融合和稠密连接网络的红外面目标实例分割方法
CN110569837B (zh) 优化损伤检测结果的方法及装置
CN110569702B (zh) 视频流的处理方法和装置
CN108171112B (zh) 基于卷积神经网络的车辆识别与跟踪方法
CN108918536B (zh) 轮胎模具表面字符缺陷检测方法、装置、设备及存储介质
CN108960245B (zh) 轮胎模具字符的检测与识别方法、装置、设备及存储介质
TWI726364B (zh) 電腦執行的車輛定損方法及裝置
CN110569700B (zh) 优化损伤识别结果的方法及装置
TW202011282A (zh) 用於車輛零件識別的神經網路系統、方法和裝置
CN110008909B (zh) 一种基于ai的实名制业务实时稽核系统
CN111160249A (zh) 基于跨尺度特征融合的光学遥感图像多类目标检测方法
CN104504365A (zh) 视频序列中的笑脸识别系统及方法
CN114663346A (zh) 一种基于改进YOLOv5网络的带钢表面缺陷检测方法
CN110298297A (zh) 火焰识别方法和装置
CN110298281B (zh) 视频结构化方法、装置、电子设备及存储介质
CN109948593A (zh) 基于结合全局密度特征的mcnn人群计数方法
CN111274964B (zh) 一种基于无人机视觉显著性分析水面污染物的检测方法
CN113313703A (zh) 基于深度学习图像识别的无人机输电线巡检方法
CN118037091A (zh) 一种基于计算机视觉技术的智能化报工质检方法及系统
CN114972316A (zh) 基于改进YOLOv5的电池壳端面缺陷实时检测方法
CN115375991A (zh) 一种强/弱光照和雾环境自适应目标检测方法
CN114332602A (zh) 一种智能货柜的商品识别方法
CN110570318B (zh) 计算机执行的基于视频流的车辆定损方法及装置
CN116993760A (zh) 一种基于图卷积和注意力机制的手势分割方法、系统、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867495

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867495

Country of ref document: EP

Kind code of ref document: A1