CN110570318A - Computer-implemented, video-stream-based vehicle damage assessment method and apparatus

Computer-implemented, video-stream-based vehicle damage assessment method and apparatus

Info

Publication number
CN110570318A
CN110570318A (Application No. CN201910315062.4A)
Authority
CN
China
Prior art keywords
component
frame
damage
detection
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910315062.4A
Other languages
Chinese (zh)
Other versions
CN110570318B
Inventor
蒋晨
程远
郭昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910315062.4A
Publication of CN110570318A
Application granted
Publication of CN110570318B
Legal status: Active (granted)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR SUCH PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 - Insurance
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/0002 - Inspection of images, e.g. flaw detection
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/12 - Edge-based segmentation
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30248 - Vehicle exterior or interior

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification provide a computer-implemented vehicle damage assessment method and apparatus that perform intelligent damage assessment on the basis of a video stream captured of a damaged vehicle. Specifically, preliminary target detection and feature extraction are first performed on the image frames of the video stream to obtain a video stream feature matrix. In addition, target detection is performed again on key frames of the video stream to obtain key frame vectors. Then, for each component, the features in the video stream feature matrix and the key frame vectors are fused to generate a comprehensive feature vector of the component, and the damage condition of the component is finally determined based on that comprehensive feature vector.

Description

Computer-implemented, video-stream-based vehicle damage assessment method and apparatus
Technical Field
One or more embodiments of the present disclosure relate to the field of machine learning, and more particularly, to a method and apparatus for intelligent damage assessment for vehicles using machine learning.
Background
In a traditional vehicle insurance claim settlement scenario, an insurance company needs to dispatch professional survey and loss assessment personnel to the accident site to conduct on-site surveying and loss assessment, propose a vehicle repair plan and a compensation amount, take photographs of the scene, and archive the loss survey photographs so that back-office reviewers can verify the loss and the price. Because manual surveying and loss assessment are required, insurance companies must invest considerable labor cost as well as the cost of professional training. From the perspective of an ordinary user, the claim settlement process involves waiting for the surveyor to take photographs on site, for the loss assessor to assess the loss at the repair shop, and for the back-office reviewer to verify the loss; the claim settlement cycle can last 1-3 days, the user's waiting time is long, and the experience is poor.
To address this industry pain point of huge labor cost, it is envisioned that artificial intelligence and machine learning can be applied to the vehicle damage assessment scenario. Computer vision and image recognition techniques from the field of artificial intelligence are expected to automatically recognize the vehicle damage reflected in on-site loss photographs taken by ordinary users and to automatically provide a repair plan. In this way, no manual survey, loss assessment, or loss verification is required, which greatly reduces the cost of insurance companies and improves the vehicle insurance claim settlement experience of ordinary users.
However, in current intelligent damage assessment schemes, the accuracy of vehicle damage determination still needs to be improved. An improved scheme is therefore desired that can further optimize the vehicle damage detection result and improve recognition accuracy.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for intelligent, video-stream-based vehicle damage assessment, in which information from the video stream feature matrix and from the key frames is fused into component-level damage features, and the damage condition of each component is determined based on those component-level damage features, thereby improving the overall accuracy of intelligent damage assessment.
According to a first aspect, there is provided a computer-implemented vehicle damage assessment method comprising:
Acquiring a feature matrix of a video stream, wherein the video stream is captured of a damaged vehicle, the feature matrix comprises at least N M-dimensional vectors that respectively correspond to N image frames in the video stream and are arranged in the time order of the N image frames, and each M-dimensional vector comprises at least, for the corresponding image frame, component detection information obtained by a pre-trained first component detection model and damage detection information obtained by a pre-trained first damage detection model;
Acquiring K key frames in the video stream;
Generating K corresponding key frame vectors for the K key frames, wherein each key frame vector comprises, for the corresponding key frame image, component detection information obtained by a pre-trained second component detection model and damage detection information obtained by a pre-trained second damage detection model;
Determining at least one candidate damaged component, including a first component;
For each frame among the N image frames and the K key frames, performing intra-frame fusion on the component detection information and the damage detection information in the vector corresponding to that frame to obtain a frame comprehensive feature of the first component, and performing inter-frame fusion on the frame comprehensive features of the first component obtained for the respective frames to generate a comprehensive feature vector for the first component;
Inputting the comprehensive feature vector into a pre-trained decision tree model, and determining the damage condition of the first component according to the output of the decision tree model.
In one embodiment, obtaining the feature matrix for the video stream includes receiving the feature matrix from a mobile terminal.
In another embodiment, obtaining the feature matrix of the video stream comprises:
Acquiring the video stream;
For each image frame in the N image frames, carrying out component detection through the first component detection model to obtain component detection information, and carrying out damage detection through the first damage detection model to obtain damage detection information;
Forming an M-dimensional vector corresponding to each image frame based on at least the component detection information and the damage detection information;
Generating the feature matrix from the respective M-dimensional vectors of the N image frames.
According to one implementation, obtaining the K key frames in the video stream includes: receiving the K key frames from the mobile terminal.
According to another implementation, obtaining the K key frames in the video stream comprises: inputting the feature matrix into a key frame prediction model based on a convolutional neural network (CNN), and determining the K key frames according to the output of the key frame prediction model.
According to one embodiment, the second component detection model is different from the first component detection model, and the second damage detection model is different from the first damage detection model.
In one embodiment, the candidate damaged component is determined by:
Determining a first set of candidate components based on component detection information in the N M-dimensional vectors;
Determining a second set of candidate components based on component detection information in the K keyframe vectors;
Taking the union of the first candidate component set and the second candidate component set as the at least one candidate damaged component.
In one embodiment, the N image frames comprise a first image frame corresponding to a first M-dimensional vector; for the first image frame, obtaining the frame comprehensive feature of the first component may include:
Extracting first detection information related to the first component from the component detection information in the first M-dimensional vector;
Determining, based on the first detection information, a component feature of the first component in the first image frame as part of the frame comprehensive feature of the first component corresponding to the first image frame.
Further, the N image frames may further include a second image frame following the first image frame, corresponding to a second M-dimensional vector; each M-dimensional vector further comprises video continuity features.
For the second image frame, obtaining the frame comprehensive feature of the first component may include:
Extracting the video continuity features from the second M-dimensional vector;
Determining second detection information of the first component in the second image frame based on the first detection information and the video continuity features;
Determining, based on the second detection information, a component feature of the first component in the second image frame as part of the frame comprehensive feature of the first component corresponding to the second image frame.
Further, the video continuity features include at least one of: an optical-flow change feature between image frames, a similarity feature between image frames, and a transformation feature determined based on a projection matrix between image frames.
In one embodiment, the damage detection information in the first M-dimensional vector corresponding to the first image frame comprises information of a plurality of damage detection frames that frame damaged objects in the first image frame; for the first image frame, obtaining the frame comprehensive feature of the first component may include:
Determining at least one damage detection frame belonging to the first component according to the component detection information and the information of the plurality of damage detection frames in the first M-dimensional vector;
Acquiring damage features of the at least one damage detection frame;
Performing a first fusion operation on the damage features of the at least one damage detection frame to obtain a comprehensive damage feature, which serves as part of the frame comprehensive feature of the first component corresponding to the first image frame.
Further, in one embodiment, the at least one damage detection frame belonging to the first component may be determined by:
Determining a first component detection frame corresponding to the first component;
Determining the at least one damage detection frame belonging to the first component according to the positional relationship between the plurality of damage detection frames and the first component detection frame.
In another embodiment, the component detection information in the first M-dimensional vector further includes component segmentation information, in which case the at least one damage detection frame belonging to the first component may also be determined by:
Determining a first region covered by the first component according to the component segmentation information;
Determining, according to the position information of the plurality of damage detection frames, whether each damage detection frame falls into the first region;
Determining the damage detection frames falling into the first region as the at least one damage detection frame.
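As a concrete illustration of this region-containment check, the following is a minimal sketch, assuming an (x, y, w, h) box format, a binary segmentation mask, and an arbitrary 0.5 overlap threshold (none of which are prescribed by this specification):

```python
import numpy as np

def damage_boxes_for_component(component_mask: np.ndarray,
                               damage_boxes: list[tuple[int, int, int, int]],
                               min_overlap: float = 0.5) -> list[int]:
    """Return indices of damage boxes (x, y, w, h) whose area overlaps the
    component's binary segmentation mask by at least `min_overlap`.
    The 0.5 threshold is an illustrative assumption, not from this specification."""
    selected = []
    for i, (x, y, w, h) in enumerate(damage_boxes):
        region = component_mask[y:y + h, x:x + w]
        if region.size == 0:
            continue
        overlap = region.mean()  # fraction of the box covered by the component
        if overlap >= min_overlap:
            selected.append(i)
    return selected

# Example: a toy 100x100 mask where the component covers the left half
mask = np.zeros((100, 100), dtype=np.uint8)
mask[:, :50] = 1
boxes = [(10, 10, 20, 20), (70, 10, 20, 20)]    # first box inside, second outside
print(damage_boxes_for_component(mask, boxes))  # -> [0]
```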
In one embodiment, the at least one damage detection frame comprises a first damage detection frame; correspondingly, acquiring the damage features of the at least one damage detection frame includes acquiring a first damage feature corresponding to the first damage detection frame, where the first damage feature includes a picture convolution feature related to the first damage detection frame.
Further, in one embodiment, the damage detection information further includes a predicted damage category corresponding to each of the plurality of damage detection frames; correspondingly, acquiring the first damage feature corresponding to the first damage detection frame further includes determining a first association feature as part of the first damage feature, according to an association relationship between the first damage detection frame and other damage detection frames among the plurality of damage detection frames, where the association relationship includes at least one or more of: a positional association between damage detection frames, an association between predicted damage categories, and a frame-content association reflected by the picture convolution features.
In one embodiment, the first fusion operation on the damage features of the at least one damage detection frame comprises one or more of: a maximum operation, a minimum operation, an averaging operation, a summation operation, and a median operation.
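For illustration only, such a first fusion operation over the per-frame damage features of the selected damage detection frames might be sketched as follows; NumPy and the function name are assumptions, not part of this specification:

```python
import numpy as np

def fuse_damage_features(features: np.ndarray, mode: str = "max") -> np.ndarray:
    """Fuse an (n_boxes, d) array of damage features into a single d-dimensional
    comprehensive damage feature using one of the operations named above."""
    ops = {
        "max": np.max,
        "min": np.min,
        "mean": np.mean,
        "sum": np.sum,
        "median": np.median,
    }
    return ops[mode](features, axis=0)

# e.g. three damage detection frames, each with a 4-dimensional feature
feats = np.array([[0.1, 0.9, 0.0, 0.3],
                  [0.4, 0.2, 0.7, 0.1],
                  [0.5, 0.5, 0.5, 0.5]])
print(fuse_damage_features(feats, "max"))   # element-wise maximum over the frames
```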
According to one embodiment, the comprehensive feature vector is generated for the first component through inter-frame fusion as follows:
Performing a first combination on N first vectors to obtain a first combined vector, wherein the N first vectors respectively correspond to the frame comprehensive features of the first component obtained for the N image frames;
Performing a second combination on K second vectors to obtain a second combined vector, wherein the K second vectors respectively correspond to the frame comprehensive features of the first component obtained for the K key frames;
Synthesizing the first combined vector and the second combined vector to obtain the comprehensive feature vector.
In one embodiment, performing the first combination on the N first vectors comprises: concatenating the N first vectors in the time order of the corresponding N image frames.
In another embodiment, performing the first combination on the N first vectors comprises: determining weight factors of the N first vectors, and combining the N first vectors in a weighted manner according to the weight factors.
Further, the weight factors of the N first vectors may be determined as follows:
For each image frame among the N image frames, determining the temporally closest key frame among the K key frames;
Determining the weight factor of the first vector corresponding to that image frame according to the temporal distance between the image frame and its closest key frame, such that the temporal distance is negatively correlated with the weight factor.
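As an illustration of this negative correlation between temporal distance and weight, the sketch below computes per-frame weight factors from frame and key-frame timestamps and applies them in a weighted combination; the inverse-distance form and all names are illustrative assumptions:

```python
import numpy as np

def frame_weights(frame_times: np.ndarray, keyframe_times: np.ndarray) -> np.ndarray:
    """Weight each of the N image frames by its temporal distance to the
    nearest key frame: the larger the distance, the smaller the weight."""
    dist = np.min(np.abs(frame_times[:, None] - keyframe_times[None, :]), axis=1)
    weights = 1.0 / (1.0 + dist)        # one possible negatively correlated mapping
    return weights / weights.sum()      # normalize so the weights sum to 1

frame_t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # N = 5 frame timestamps (seconds)
key_t = np.array([0.4, 1.9])                     # K = 2 key frame timestamps
w = frame_weights(frame_t, key_t)

first_vectors = np.random.rand(5, 16)            # N frame comprehensive features (illustrative)
first_combined = (w[:, None] * first_vectors).sum(axis=0)  # weighted inter-frame combination
```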
In another embodiment, each M-dimensional vector further includes image frame quality features; in such a case, the weight factors of the N first vectors may also be determined as follows: for each image frame among the N image frames, determining the weight factor of the first vector corresponding to that image frame according to the image frame quality features in the M-dimensional vector corresponding to that image frame.
Further, the image frame quality features include at least one of: a feature indicating whether the image frame is blurred, a feature indicating whether the image frame contains a target, a feature indicating whether the image frame is sufficiently illuminated, and a feature indicating whether the image frame is shot at a predetermined angle.
According to one embodiment, the decision tree model is a binary classification model; determining the damage condition of the first component according to the output of the decision tree model comprises: determining whether the first component is damaged according to the binary classification result output by the decision tree model.
According to another embodiment, the decision tree model is a multi-class classification model; determining the damage condition of the first component according to the output of the decision tree model comprises: determining the damage category of the first component according to the classification category output by the decision tree model.
According to one embodiment, the method further comprises determining a repair plan for the first component based on the component category of the first component and the damage condition of the first component.
According to a second aspect, there is provided a vehicle damage assessment apparatus comprising:
a feature matrix obtaining unit configured to obtain a feature matrix of a video stream, the video stream being captured for a damaged vehicle, the feature matrix including at least N M-dimensional vectors that correspond to N image frames in the video stream, respectively, and are arranged according to a time sequence of the N image frames, each M-dimensional vector including at least, for the corresponding image frame, component detection information obtained by a pre-trained first component detection model, and damage detection information obtained by the pre-trained first damage detection model;
A key frame acquisition unit configured to acquire K key frames in the video stream;
A generating unit configured to generate, for the K keyframes, corresponding K keyframe vectors, each keyframe vector including, for a corresponding keyframe image, component detection information obtained by a pre-trained second component detection model, and damage detection information obtained by a pre-trained second damage detection model;
A candidate component determination unit configured to determine at least one candidate damaged component, including a first component;
A fusion unit configured to, for each frame among the N image frames and the K key frames, perform intra-frame fusion on the component detection information and the damage detection information in the vector corresponding to that frame to obtain a frame comprehensive feature of the first component, and perform inter-frame fusion on the frame comprehensive features of the first component obtained for the respective frames to generate a comprehensive feature vector for the first component;
A damage determination unit configured to input the comprehensive feature vector into a pre-trained decision tree model and determine the damage condition of the first component according to the output of the decision tree model.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first aspect.
According to the method and apparatus provided by the embodiments of this specification, intelligent damage assessment is performed on the basis of a video stream generated by shooting a damaged vehicle. Specifically, preliminary target detection and feature extraction are first performed on the image frames of the video stream to obtain a video stream feature matrix. In addition, target detection is performed again on the key frames of the video stream to obtain key frame vectors. Then, for each component, the features in the video stream feature matrix and the key frame vectors are fused to generate a comprehensive feature vector of the component, and the damage condition of the component is finally determined based on that comprehensive feature vector. In this way, the accuracy of intelligent vehicle damage assessment is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a diagram illustrating an exemplary implementation scenario of one embodiment disclosed herein;
FIG. 2 illustrates a flow diagram of a vehicle damage assessment method according to one embodiment;
FIG. 3a shows an example of component detection information obtained for a certain image frame;
FIG. 3b shows an example of damage detection information obtained for a certain image frame;
FIG. 4 illustrates a flow of steps for obtaining a comprehensive damage feature of a first component, according to one embodiment;
FIG. 5a shows a schematic diagram of a weight factor distribution over image frames according to an embodiment;
FIG. 5b shows a schematic diagram of a weight factor distribution over image frames according to another embodiment;
FIG. 6 shows a schematic block diagram of a vehicle damage assessment apparatus according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Intelligent vehicle damage assessment mainly involves automatically identifying the damage condition of a vehicle from photographs of the damage scene taken by ordinary users. To identify the vehicle damage condition, the approach generally adopted in the industry is to compare the vehicle damage picture to be identified, taken by the user, against a massive historical database to find similar pictures, and to determine the damaged components and the degree of damage in the picture to be identified based on the damage assessment results of those similar pictures. However, the recognition accuracy of such approaches is difficult to guarantee.
According to one embodiment, a supervised machine-learning approach is used to train target detection models on pictures; these models are used to detect the component objects and damage objects of the vehicle respectively, and the vehicle damage condition shown in the picture is then determined based on a comprehensive analysis of the detection results.
Furthermore, according to the concept and implementation framework of this specification, and considering that a video stream can reflect the comprehensive information of a vehicle more accurately than isolated pictures, an intelligent damage assessment method based on a video stream is provided. FIG. 1 is a schematic diagram of an exemplary implementation scenario of an embodiment disclosed in this specification. As shown in FIG. 1, a user may film the vehicle damage scene with a portable mobile terminal, such as a smartphone, to generate a video stream. The mobile terminal may have installed an application or tool related to damage assessment (for example, one referred to as "impairment treasure"). The application or tool may perform preliminary processing on the video stream, carrying out lightweight, preliminary target detection and feature extraction on N image frames; the target detection result and the feature extraction result of each frame may form an M-dimensional vector. Through this preliminary processing, the mobile terminal may generate a feature matrix that includes at least the N M-dimensional vectors. The application in the mobile terminal may also determine key frames from the video stream.
Then, the mobile terminal may send the feature matrix and the key frames to the server.
The server generally has more powerful and reliable computing capability, so it can perform target detection again on the key frames of the video stream using more complex and more accurate target detection models to detect the component information and damage information of the vehicle.
Then, the server fuses the information of the feature matrix with the information detected for the key frames to generate a component comprehensive feature for each component. Specifically, intra-frame fusion may first be performed for each frame to obtain, based on the target detection result of that frame, the frame comprehensive feature of a specific component. Then, inter-frame fusion is performed on the frame comprehensive features of that component across the frames to obtain the component comprehensive feature of the component over the whole video stream.
On this basis, the component comprehensive feature of a specific component can be input into a pre-trained decision tree model, and the damage condition of that component can be determined according to the output of the decision tree model. By making this judgment for each candidate component, the damage condition of each component, and hence of the whole vehicle, can be obtained, thereby realizing intelligent damage assessment.
The specific implementation process of the intelligent damage assessment is described below.
FIG. 2 shows a flow diagram of a vehicle damage assessment method according to one embodiment. The method can be executed by a server, which can be embodied as any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in FIG. 2, the method comprises at least the following steps:
Step 21, obtaining a feature matrix of a video stream, wherein the video stream is captured of a damaged vehicle, the feature matrix comprises at least N M-dimensional vectors that respectively correspond to N image frames in the video stream and are arranged in the time order of the N image frames, and each M-dimensional vector comprises at least, for the corresponding image frame, component detection information obtained by a pre-trained first component detection model and damage detection information obtained by a pre-trained first damage detection model;
Step 22, acquiring K key frames in the video stream;
Step 23, generating K corresponding key frame vectors for the K key frames, wherein each key frame vector comprises, for the corresponding key frame image, component detection information obtained by the pre-trained second component detection model and damage detection information obtained by the pre-trained second damage detection model;
Step 24, determining at least one candidate damaged component, including a first component;
Step 25, for each frame among the N image frames and the K key frames, performing intra-frame fusion on the component detection information and the damage detection information in the vector corresponding to that frame to obtain a frame comprehensive feature of the first component, and performing inter-frame fusion on the frame comprehensive features of the first component obtained for the respective frames to generate a comprehensive feature vector for the first component;
Step 26, inputting the comprehensive feature vector into a pre-trained decision tree model, and determining the damage condition of the first component according to the output of the decision tree model.
The manner in which the above steps are performed is described below.
First, in step 21, the feature matrix of the video stream is acquired.
It can be understood that, in the vehicle damage assessment scenario, the video stream is generated by the user shooting the damaged vehicle at the damage scene with an image acquisition device of the mobile terminal, such as a camera. As mentioned above, the mobile terminal may perform preliminary processing on the video stream through a corresponding application or tool to generate the feature matrix.
Specifically, N image frames may be extracted from the video stream and subjected to preliminary processing. The N image frames may include every image frame of the video stream, or image frames extracted at a predetermined time interval (e.g., 500 ms), or image frames obtained from the video stream in some other manner.
For each extracted image frame, target detection and feature extraction are performed, thereby generating an M-dimensional vector for each image frame.
As known to those skilled in the art, target detection is used to identify specific target objects in a picture and classify them. By training with image samples annotated with target positions and target categories, various target detection models can be obtained. The component detection model and the damage detection model are specific applications of target detection models: when vehicle components are annotated and used as target objects for training, a component detection model is obtained; when damage objects on the vehicle are annotated and used as target objects for training, a damage detection model is obtained.
In this step, in order to perform preliminary processing on each image frame, a pre-trained component detection model, referred to herein as the first component detection model, may be used to perform component detection on the image frame to obtain component detection information; and a pre-trained damage detection model, referred to herein as the first damage detection model, is used to perform damage detection on the image frame to obtain damage detection information.
It is to be understood that the terms "first", "second", and the like herein are used for clarity and simplicity of description, to distinguish between concepts, and are not intended to limit the order of description or the like.
In the art, target detection models are mostly implemented with various detection algorithms based on convolutional neural networks (CNN). In order to improve the computational efficiency of conventional CNNs, various lightweight network structures have been proposed, including, for example, SqueezeNet, MobileNet, ShuffleNet, and Xception. These lightweight neural network structures reduce the number of network parameters by adopting different convolution computation schemes, thereby simplifying the convolution computation of a conventional CNN and improving computational efficiency. Such lightweight network architectures are particularly suitable for running on mobile terminals with limited computational resources.
Accordingly, in one embodiment, the first component detection model and the first damage detection model are implemented using such a lightweight network structure.
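As an illustration of running a lightweight detector of this kind on a frame, the sketch below uses torchvision's SSDLite detector with a MobileNetV3 backbone as a stand-in for the first component / first damage detection model; the patent does not prescribe this library, model, weights, or the frame path, so all of those are assumptions:

```python
import torch
from torchvision.models.detection import ssdlite320_mobilenet_v3_large
from torchvision.transforms.functional import to_tensor
from PIL import Image

# A lightweight (MobileNetV3-based) detector, used here only as a stand-in
# for the pre-trained first component / damage detection model.
model = ssdlite320_mobilenet_v3_large(weights="DEFAULT")
model.eval()

frame = to_tensor(Image.open("frame_000.jpg").convert("RGB"))  # hypothetical frame path
with torch.no_grad():
    detections = model([frame])[0]

# Each detection gives a box (x1, y1, x2, y2), a predicted class id and a score,
# which is the kind of "detection frame + predicted category" information the
# per-frame M-dimensional vector is built from.
for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.5:
        print(label.item(), [round(v) for v in box.tolist()], round(score.item(), 2))
```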
By performing component detection on the image frame with the first component detection model, the component detection information of the image frame can be obtained. In general, the component detection information may include component detection frame information for framing vehicle components in the corresponding image frame, and the component category predicted for each framed component. More specifically, the component detection frame information may include the position of the component detection frame, for example expressed in (x, y, w, h) form, and the picture convolution information corresponding to the component detection frame extracted from the image frame.
Similarly, when the first damage detection model is used to perform damage detection on the image frame, the damage detection information of the image frame is obtained; it may include damage detection frame information for framing damage objects in the corresponding image frame, and the damage category predicted for each framed damage.
FIG. 3a shows an example of component detection information obtained for a certain image frame. It can be seen that FIG. 3a includes several component detection frames, each of which frames one component. The first component detection model may also output the predicted component category corresponding to each component detection frame; for example, the number in the upper-left corner of each rectangular frame represents a component category. For example, the numeral 101 in FIG. 3a represents the front right door, 102 represents the rear right door, 103 represents a door handle, and so on.
FIG. 3b shows an example of damage detection information obtained for a certain image frame. It can be seen that FIG. 3b includes a series of rectangular frames, i.e., the damage detection frames output by the first damage detection model, each of which frames one damage. The first damage detection model also outputs the predicted damage category corresponding to each damage detection frame; for example, the numbers in the upper-left corner of each rectangular frame represent damage categories. For example, in FIG. 3b the numeral 12 represents the damage category "scratch", and other numerals may represent other damage categories, such as 10 for deformation, 11 for tearing, and 13 for chipping (glass parts).
In one embodiment, the first component detection model is further used to perform image segmentation on the detected vehicle components to obtain a contour segmentation result for each component in the image frame.
As known to those skilled in the art, image segmentation divides an image into regions that belong / do not belong to a specific target object; its output may appear as a mask covering the region of the specific target object. Various image segmentation models have been proposed based on various network structures and segmentation algorithms, such as CRF (conditional random field)-based segmentation models, the Mask R-CNN model, and so on. Component segmentation, as a specific application of image segmentation, may be used to divide a picture of a vehicle into regions that belong / do not belong to a particular component, and may be implemented using any existing segmentation algorithm.
In one embodiment, the first component detection model is trained both to recognize components (i.e., position prediction and category prediction) and to segment components. For example, a Mask R-CNN-based model may be used as the first component detection model; such a model recognizes components and segments components through two network branches after the base convolutional layers.
In another embodiment, the first component detection model comprises a first sub-model for component recognition and a second sub-model for component segmentation. The first sub-model outputs the component recognition result, and the second sub-model outputs the component segmentation result.
In the case where the first component detection model is also used for component segmentation, the obtained component detection information further includes the component segmentation result, which may be embodied as the contour or coverage region of each component.
As above, component detection information can be obtained by performing component detection on an image frame with the first component detection model, and damage detection information can be obtained by performing damage detection on the image frame with the first damage detection model. An M-dimensional vector corresponding to the image frame may then be formed based on the component detection information and the damage detection information.
For example, in one example, a 60-dimensional vector is formed for each extracted image frame, in which the elements of the first 30 dimensions represent the component detection information and the elements of the last 30 dimensions represent the damage detection information.
According to one embodiment, in addition to target detection (including component detection and damage detection) on the image frames, other aspects of feature analysis and extraction are performed on the image frames, and the resulting features are also included in the M-dimensional vectors described above.
In one embodiment, video continuity features are obtained for the extracted image frames; these features reflect the changes between image frames and hence the stability and continuity of the video, and can also be used for tracking targets across image frames.
In one example, for the current image frame, the optical-flow change feature of the image frame relative to the previous image frame may be acquired as its continuity feature. The optical-flow change can be computed with existing optical flow models.
In one example, for the current image frame, the image similarity between the image frame and the previous image frame may be obtained as a continuity feature. In a specific example, image similarity may be measured by the SSIM (Structural Similarity Index Measure). Specifically, the SSIM index between the current image frame and the previous image frame may be calculated based on the mean gray value and gray variance of the pixels in the current image frame, the mean gray value and gray variance of the pixels in the previous image frame, and the covariance of the pixels of the two frames. The maximum value of the SSIM index is 1; the larger the SSIM index, the higher the structural similarity of the images.
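For illustration, the SSIM between two consecutive frames can be computed with scikit-image as sketched below; the library choice and the frame file names are assumptions, not part of this specification:

```python
import cv2
from skimage.metrics import structural_similarity

# Load two consecutive frames (hypothetical paths) and convert to grayscale
prev_gray = cv2.cvtColor(cv2.imread("frame_010.jpg"), cv2.COLOR_BGR2GRAY)
curr_gray = cv2.cvtColor(cv2.imread("frame_011.jpg"), cv2.COLOR_BGR2GRAY)

# SSIM is at most 1; values close to 1 mean the two frames are structurally similar,
# which can serve as the similarity-based video continuity feature of the current frame.
ssim_value = structural_similarity(prev_gray, curr_gray)
print(round(ssim_value, 3))
```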
In one example, for the current image frame, the offsets of several feature points in the image frame relative to the previous image frame may be obtained, and a continuity feature is determined based on these offsets. Specifically, a feature point of an image is a point with distinct characteristics that effectively reflects the essential features of the image and can identify a target object in the image (for example, the upper-left corner of the front left headlight). Feature points may be determined by methods such as SIFT (Scale-Invariant Feature Transform), LBP (Local Binary Pattern), and the like. The change between two adjacent image frames can then be evaluated from the offsets of the feature points. Typically, the offset of the feature points can be described by a projection matrix. For example, assuming the feature point set of the current image frame is Y and the feature point set of the previous image frame is X, a transformation matrix w may be solved such that the result of f(X) = Xw is as close to Y as possible; the solved transformation matrix w may be used as the projection matrix from the previous image frame to the current image frame. Further, the projection matrix may be used as a continuity feature of the current image frame.
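The least-squares solution of w in f(X) = Xw can be sketched as follows; representing the points in homogeneous coordinates and using NumPy's lstsq are illustrative choices (an affine approximation), not requirements of this specification:

```python
import numpy as np

def projection_matrix(prev_pts: np.ndarray, curr_pts: np.ndarray) -> np.ndarray:
    """Solve for w such that [prev_pts, 1] @ w approximates curr_pts in the
    least-squares sense; w then serves as the projection matrix from the
    previous frame to the current frame (an affine approximation)."""
    n = prev_pts.shape[0]
    X = np.hstack([prev_pts, np.ones((n, 1))])   # homogeneous coordinates, shape (n, 3)
    Y = curr_pts                                  # shape (n, 2)
    w, *_ = np.linalg.lstsq(X, Y, rcond=None)     # shape (3, 2)
    return w

# Matched feature points in the previous and current frames (illustrative values)
X_pts = np.array([[10.0, 20.0], [100.0, 40.0], [60.0, 120.0], [30.0, 80.0]])
Y_pts = X_pts + np.array([2.0, -1.0])             # current frame shifted slightly
w = projection_matrix(X_pts, Y_pts)
# Flattened, w can be placed into the video-continuity slots of the M-dimensional vector.
```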
It is understood that the continuity features in the above examples may be used alone or in combination, without limitation. In other embodiments, the change characteristics between image frames may also be determined as continuity features in other ways.
It should be noted that, when the current image frame is the first image frame of the video stream, its continuity feature may be determined by comparing the frame with itself as the previous frame, or may be set directly to a predetermined value, for example setting every element of the projection matrix to 1, or the optical-flow output to 0, and so on.
According to one embodiment, image frame quality features are also obtained for the extracted image frames; they reflect the shooting quality of an image frame, i.e., its usefulness for target recognition. Generally, the application on the mobile terminal may include a shooting guidance model for guiding the user's shooting, e.g., guidance on distance (move closer to or farther from the damaged vehicle), guidance on angle, and the like. The shooting guidance model generates image frame quality features in the course of this analysis. The image frame quality features may include: a feature indicating whether the image frame is blurred, a feature indicating whether the image frame contains a target, a feature indicating whether the image frame is sufficiently illuminated, a feature indicating whether the image frame is shot at a predetermined angle, and the like. One or more of these features may also be included in the aforementioned M-dimensional vector of each image frame.
For example, in a specific example, an 80-dimensional vector is formed for each extracted image frame, in which the elements of dimensions 1-10 represent image frame quality features, the elements of dimensions 11-20 represent video continuity features, the elements of dimensions 21-50 represent component detection information, and the elements of dimensions 51-80 represent damage detection information.
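A minimal sketch of assembling such an 80-dimensional per-frame vector from the four feature groups is shown below; the 10/10/30/30 split follows the example above, while the function name and placeholder values are assumptions:

```python
import numpy as np

def build_frame_vector(quality_feats, continuity_feats, component_feats, damage_feats):
    """Concatenate the four per-frame feature groups into one 80-dim vector:
    dims 1-10 quality, 11-20 continuity, 21-50 component detection, 51-80 damage detection."""
    assert len(quality_feats) == 10 and len(continuity_feats) == 10
    assert len(component_feats) == 30 and len(damage_feats) == 30
    return np.concatenate([quality_feats, continuity_feats, component_feats, damage_feats])

# Illustrative placeholder values for a single frame
vec = build_frame_vector(np.zeros(10), np.zeros(10), np.zeros(30), np.zeros(30))
print(vec.shape)  # (80,)
```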
In this way, by performing target detection and feature extraction on the N image frames extracted from the video stream, an M-dimensional vector is generated for each image frame, giving N M-dimensional vectors in total.
In one embodiment, the N M-dimensional vectors are arranged in the time order of the N image frames, giving an N x M matrix as the feature matrix of the video stream.
In one embodiment, the N M-dimensional vectors corresponding to the N image frames are further preprocessed to form the feature matrix of the video stream. The preprocessing may include a normalization operation that brings the feature matrix to a fixed dimension.
It is understood that the dimension of the feature matrix is usually predetermined, whereas the N image frames are often extracted from the video stream at a certain time interval while the length of the video stream varies and its total length may not be known in advance. Therefore, directly combining the M-dimensional vectors of the actually extracted image frames does not always meet the dimensional requirement of the feature matrix. In one embodiment, assuming the dimension of the feature matrix is preset to S frames by M, if the number N of image frames extracted from the video stream is less than S, the N M-dimensional vectors corresponding to the N image frames may be brought to an S by M matrix by padding, interpolation, pooling, or similar operations. In one example, when the number of extracted image frames is greater than S, some image frames may be discarded, so that the feature matrix finally satisfies the predetermined dimension.
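One simple way to bring the stack of per-frame vectors to the fixed S x M shape is sketched below; zero-padding and uniform subsampling are illustrative choices among the padding / interpolation / pooling / discarding options mentioned above:

```python
import numpy as np

def normalize_feature_matrix(frame_vectors: np.ndarray, target_frames: int) -> np.ndarray:
    """Pad or subsample an (N, M) stack of frame vectors to shape (S, M)."""
    n, m = frame_vectors.shape
    if n == target_frames:
        return frame_vectors
    if n < target_frames:
        pad = np.zeros((target_frames - n, m))          # zero-pad the missing frame slots
        return np.vstack([frame_vectors, pad])
    # n > target_frames: keep S frames sampled uniformly over time
    idx = np.linspace(0, n - 1, target_frames).round().astype(int)
    return frame_vectors[idx]

matrix = normalize_feature_matrix(np.random.rand(37, 80), target_frames=50)
print(matrix.shape)  # (50, 80)
```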
As described above, through the preliminary processing of the image frames in the video stream, a feature matrix of the video stream is generated; the feature matrix includes at least N M-dimensional vectors that correspond to the N image frames of the video stream and are arranged in their time order, and each M-dimensional vector includes at least, for the corresponding image frame, the component detection information obtained by the pre-trained first component detection model and the damage detection information obtained by the pre-trained first damage detection model.
It is to be understood that, in the above embodiment, the preliminary processing of the image frames and the generation of the feature matrix are performed by the mobile terminal. In such a case, the server only needs to receive the feature matrix from the mobile terminal in step 21. This approach is suitable when the mobile terminal has an application or tool for damage assessment and a certain amount of computing power. It is also very advantageous for network transmission, since the amount of data transmitted for the feature matrix is much smaller than for the video stream itself.
In another embodiment, after shooting the video stream, the mobile terminal transmits the video stream to the server, and the server processes the image frames to generate the feature matrix. In this case, in step 21 the server acquires the captured video stream from the mobile terminal, extracts image frames from it, and performs target detection and feature extraction on the extracted image frames to generate the M-dimensional vectors. Specifically, for each image frame, component detection may be performed with the first component detection model to obtain component detection information, and damage detection may be performed with the first damage detection model to obtain damage detection information; an M-dimensional vector corresponding to each image frame is then formed based on at least the component detection information and the damage detection information. The feature matrix is then generated from the M-dimensional vectors of the N image frames. This process is similar to the one executed in the mobile terminal and is not repeated here.
In addition to obtaining the feature matrix of the video stream, in step 22 the server obtains K key frames of the video stream, where K is greater than or equal to 1 and less than the number N of image frames.
In one embodiment, the K key frames of the video stream are determined by the mobile terminal and sent to the server. In that case, the server only needs to receive the K key frames from the mobile terminal in step 22.
Alternatively, in another embodiment, the server determines the key frames of the video stream in step 22.
Whether on the mobile terminal or the server, the key frames of the video stream can be determined using various existing key frame determination approaches.
For example, in one embodiment, image frames of higher overall quality may be determined as key frames according to the quality features of the image frames; in another embodiment, image frames that change more strongly relative to the previous frame may be determined as key frames according to the continuity features of the image frames.
In one embodiment, the feature matrix generated in step 21 may further be input into a key frame prediction model based on a convolutional neural network (CNN), and the key frames are determined according to the output of the model. This takes advantage of the fact that a CNN is well suited to receive a two-dimensional matrix as feature input. Such a key frame prediction model may be pre-trained on video streams whose key frames have been labeled as training samples. After training, the model can predict the key frames of a video stream based on the input video stream feature matrix.
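For illustration, a minimal sketch of such a CNN-based key frame prediction model, treating the S x M feature matrix as a single-channel 2-D input and scoring each of the S frame slots, might look as follows; PyTorch, the layer sizes, and the class name are assumptions, not the model described in this specification:

```python
import torch
import torch.nn as nn

class KeyFramePredictor(nn.Module):
    """Scores each of the S frame slots of an (S, M) feature matrix;
    the top-K scores indicate the predicted key frames."""
    def __init__(self, s_frames: int = 50, m_dims: int = 80):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * m_dims, 1)   # one score per frame slot

    def forward(self, feature_matrix: torch.Tensor) -> torch.Tensor:
        # feature_matrix: (batch, S, M) -> add a channel dimension
        x = self.conv(feature_matrix.unsqueeze(1))        # (batch, 32, S, M)
        x = x.permute(0, 2, 1, 3).flatten(start_dim=2)    # (batch, S, 32*M)
        return self.head(x).squeeze(-1)                   # (batch, S) frame scores

model = KeyFramePredictor()
scores = model(torch.randn(1, 50, 80))
k = 3
key_frame_indices = scores.topk(k, dim=1).indices         # indices of the K predicted key frames
```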
Depending on the way the N image frames are extracted and the way the key frames are determined, the determined key frames may be included among the N image frames or may be different from them.
Next, in step 23, target detection is performed again on the images of the key frames acquired in step 22. Specifically, component detection may be performed on the key frame image with a pre-trained second component detection model to obtain component detection information, and damage detection may be performed with a pre-trained second damage detection model to obtain damage detection information; a key frame vector is then generated based on that component detection information and damage detection information.
It is to be understood that the second component detection model here may differ from the first component detection model employed in the preliminary processing of the image frames used to generate the feature matrix. In general, the second component detection model is a more accurate and more complex model than the first component detection model, so that the key frames of the video stream can be detected more accurately. In particular, when the feature matrix is generated by the mobile terminal, the first component detection model is typically based on a lightweight network architecture, suited to the limited computing power and resources of the mobile terminal, whereas the second component detection model can be a detection model with higher computing requirements that is suitable for the server, so that more complex operations are performed on the image features and a more accurate result is obtained. Similarly, the second damage detection model may be more complex and more accurate than the first damage detection model, so as to perform more accurate damage detection on the key frames of the video stream.
In one embodiment, the second component detection model is also used for image segmentation of the components. In that case, the component detection information obtained from the second component detection model further includes the image segmentation information of the components contained in the key frame image, i.e., the contour information of the components.
In this way, the second component detection model and the second damage detection model perform target detection again on the key frame image, and a key frame vector can be formed based on the resulting component detection information and damage detection information. For K key frames, K key frame vectors may be formed.
In one specific example, the key frame vector is a 70-dimensional vector, in which the elements of dimensions 1-35 represent component detection information and the elements of dimensions 36-70 represent damage detection information.
Next, in step 24, at least one candidate damaged component is determined for subsequent analysis.
In one embodiment, every component of the vehicle is treated as a candidate damaged component. For example, if the vehicle is divided into 100 components in advance, all 100 components can be regarded as candidate damaged components. This approach has the advantage that nothing is omitted, but it involves more redundant computation and places a heavier burden on subsequent processing.
As described above, when generating both the feature matrix of the video stream and the key frame vectors, component detection is performed on the image frames, and the resulting component detection information includes the component category predicted for each component. In one embodiment, the component categories present in the component detection information are taken as the candidate damaged components.
More specifically, a first candidate component set may be determined based on the component detection information in the N M-dimensional vectors. It is to be understood that the component detection information of each M-dimensional vector may include several component detection frames and the corresponding predicted component categories, and the union of the predicted component categories in the N M-dimensional vectors may be used as the first candidate component set.
Similarly, a second candidate component set, i.e., the union of the predicted component categories in the key frame vectors, may be determined based on the component detection information in the K key frame vectors. The union of the first candidate component set and the second candidate component set is then taken as the candidate damaged components. In other words, if a component category appears in the feature matrix of the video stream or in the key frame vectors, the component of that category has been detected in the N image frames of the video stream or in the key frames, and the component of that category can therefore be regarded as a candidate damaged component.
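Illustratively, the union of predicted component categories across the N frame vectors and the K key frame vectors can be gathered as in the sketch below (the category ids mirror the example of FIG. 3a; the function itself is an assumed illustration, not the patented implementation):

```python
def candidate_damaged_components(frame_categories: list[set[int]],
                                 keyframe_categories: list[set[int]]) -> set[int]:
    """Union of all component categories predicted in the N frame vectors
    and the K key frame vectors, used as the candidate damaged components."""
    first_set = set().union(*frame_categories) if frame_categories else set()
    second_set = set().union(*keyframe_categories) if keyframe_categories else set()
    return first_set | second_set

# e.g. 101 = front right door, 102 = rear right door, 103 = door handle
frames = [{101, 103}, {101, 102}, {102}]
keyframes = [{101, 102}]
print(candidate_damaged_components(frames, keyframes))  # {101, 102, 103}
```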
In the above manner, a plurality of candidate damaged components may be obtained. The following description takes an arbitrary one of them (referred to as the first component for simplicity) as an example.
for any first component, such as the right back gate shown in fig. 3a, the feature matrix of the video stream and the respective key frame vectors are fused to generate a composite feature vector for that first component, step 25. To generate the synthetic feature vector, the fusion operation may include intra-frame fusion and inter-frame fusion. In intra-frame fusion, fusing component detection information and damage detection information in a vector corresponding to each frame to obtain frame comprehensive characteristics of a first component in the frame; and then combining the time sequence information, and fusing the frame comprehensive characteristics of the first component corresponding to each frame through inter-frame fusion to obtain the comprehensive characteristic vector of the first component. The following describes the procedures of intra-frame fusion and inter-frame fusion, respectively.
Intra-frame fusion aims to obtain a component-level damage feature, also called a frame synthesis feature, about a first component in a certain frame. In one embodiment, the frame composite characteristics of the first part may include, a part characteristic of the first part in the frame, and a composite damage characteristic associated with the first part. For example, a vector corresponding to the frame comprehensive feature of the first component may be denoted as V, and then the vector V may be denoted as V ═ C, S, that is, a concatenation of a C part and an S part, where the C part is the component feature of the first component, and the S part is the comprehensive damage feature related to the first component.
the specific steps of intra-frame fusion are described below in connection with any one of the N image frames, hereinafter referred to as the first image frame.
it is assumed that the first image frame corresponds to a first M-dimensional vector containing the part detection information and the damage detection information for the first image frame as described above. In one embodiment, first detection information related to the first part may be extracted from the part detection information in the first M-dimensional vector, and based on the first detection information, a part feature of the first part in the first image frame, i.e., the above-mentioned part C, may be determined as a part of the frame integration feature V.
As described above, the component detection information includes several component detection frames and the predicted component categories. In some embodiments, the component detection information further comprises segmentation information of the several components. Accordingly, the first detection information extracted for the first component may include information of the component detection frame corresponding to the first component, for example the position and size of the component detection frame and the corresponding picture convolution information, and may further include segmentation information or contour information of the first component. Such first detection information may be regarded as the component feature of the first component. Alternatively, the extracted first detection information may be further integrated to form the component feature of the first component.
For example, in the specific example mentioned above, each image frame corresponds to an 80-dimensional vector, in which the elements of dimensions 21-50 represent the part detection information. The first detection information related to the first part may be extracted from these elements, thereby obtaining the part feature C of the first part.
as described above, in some embodiments, video continuity features, such as optical flow variation features between image frames, similarity features between image frames, projection matrices between image frames, and the like, are also included in the M-dimensional vectors corresponding to each image frame. According to a specific example, in the aforementioned 80-dimensional vector, elements of 11-20 dimensions represent video continuity features. In such a case, the video continuity features can be leveraged for reasoning and transformation so that component features of subsequent frames can be derived based on component features of previous frames.
For example, assume that the feature of the component of the first image frame is obtained in the manner described previously. In one embodiment, the N image frames further include a second image frame following the first image frame. The second image frame corresponds to a second M-dimensional vector including video continuity features. For such a second image frame, a video continuity feature may be extracted from the corresponding second M-dimensional vector, and then, based on the above-mentioned first detection information extracted for the first image frame and the video continuity feature, second detection information of the first component in the second image frame is determined; then, the part feature of the first part in the second image frame may be derived based on the second detection information.
More specifically, in one example, the first detection information includes position information of the first component, and the video continuity characteristic includes a projection matrix, so that the position of the first component in the second image frame and thus the component characteristic of the first component in the second image frame can be directly obtained based on the position information of the first component in the first image frame and the projection matrix of the second image frame relative to the first image frame.
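A minimal sketch of this propagation is shown below, assuming the projection matrix is a 3x3 homography and boxes are given as (x1, y1, x2, y2); both conventions are illustrative rather than taken from the embodiment.

```python
import numpy as np

def propagate_box(box_xyxy, projection):
    """Map a component detection box (x1, y1, x2, y2) from a previous frame
    into the current frame with the 3x3 projection matrix between the two
    frames, then re-fit an axis-aligned box around the projected corners."""
    x1, y1, x2, y2 = box_xyxy
    corners = np.array([[x1, y1, 1.0], [x2, y1, 1.0],
                        [x2, y2, 1.0], [x1, y2, 1.0]]).T   # 3 x 4 homogeneous
    warped = projection @ corners
    warped = warped[:2] / warped[2]                        # back to Cartesian
    xs, ys = warped
    return (xs.min(), ys.min(), xs.max(), ys.max())
```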
In the above manner, the component feature C of the first component can be obtained for any one of the N image frames as a part of the frame integration feature of the first component.
furthermore, a composite damage feature S associated with the first component may also be acquired for each image frame and included in the frame composite features of the first component. The process of obtaining the composite damage signature of the first component is described below.
FIG. 4 illustrates a flowchart of steps to obtain composite damage signatures for a first component that are part of the intra-frame fusion operation of step 25 of FIG. 2, according to one embodiment. Fig. 4 is described with reference to an arbitrary first image frame as an example. As described above, the first image frame corresponds to a first M-dimensional vector including the part detection information and the damage detection information, wherein the damage detection information includes information of a plurality of damage detection frames framing a plurality of damage objects from the first image frame. On this basis, as shown in fig. 4, the comprehensive damage characteristic of the first component may be acquired for the first image frame in the following manner.
At step 41, at least one damage detection frame belonging to the first component is determined based on the component detection information in the first M-dimensional vector and information of the plurality of damage detection frames included in the damage detection information.
as described above, the component detection information includes each component detection frame and its predicted component category. Then, a first part detection frame corresponding to the first part can be extracted therefrom. Then, at least one damage detection frame belonging to the first component is determined based on the positional relationship between the plurality of damage detection frames and the first component detection frame. The above positional relationship may include a center distance, an intersection ratio (IoU), an inclusion relationship, and the like between the damage detection frame and the first component detection frame. In a specific example, if a damage detection box is included in the first part detection box, the damage detection box is considered to belong to the first part; in another example, if the intersection ratio of a damage detection frame to a first part detection frame is greater than a preset ratio threshold, the damage detection frame is considered to belong to the first part; in yet another example, a damage detection frame is considered to belong to the first part if the center distance of the damage detection frame from the first part detection frame is less than a preset distance threshold. Thus, it is possible to determine whether each damage detection frame belongs to the first component, and thereby determine at least one damage detection frame that belongs to the first component.
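The following sketch shows one of the positional criteria mentioned above (the intersection ratio) with an illustrative threshold; the box format and the threshold value are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def damage_boxes_of_component(component_box, damage_boxes, iou_thresh=0.5):
    """Keep the damage detection boxes whose intersection ratio with the
    first component detection box exceeds a preset threshold; containment
    or center-distance criteria could be substituted in the same way."""
    return [d for d in damage_boxes if iou(d, component_box) > iou_thresh]
```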
In one embodiment, the part detection information in the first M-dimensional vector further includes part segmentation information. In such a case, the segmentation information of the first part may first be extracted from the part segmentation information, thereby determining the area covered by the first part, referred to as the first area. Then, based on the position information of the plurality of damage detection frames, it is determined whether each of them falls within the first area. Whether a damage detection box falls into the first region may be determined by various specific criteria. For example, in one example, if the center of the damage detection box is located in the first region, the damage detection box is considered to fall into the first region; in another example, if more than a predetermined proportion (e.g., 50%) of the total area of the damage detection box belongs to the first region, the damage detection box is considered to fall into the first region. Based on the above determination, the damage detection frames falling into the first region are determined as the above-described at least one damage detection frame.
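A sketch of the segmentation-based criteria is given below, assuming the first area is available as a 2D boolean mask (e.g., a NumPy array); the 50% area ratio follows the example above, while the mask representation is an assumption.

```python
def damage_box_in_region(mask, box, area_ratio=0.5):
    """Decide whether a damage detection box falls into the first area,
    given a 2D boolean mask of the region covered by the component.
    True if the box center lies in the region, or if more than
    `area_ratio` of the box area is covered by the region."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    if mask[cy, cx]:                                  # center-point criterion
        return True
    patch = mask[y1:y2, x1:x2]
    box_area = max((x2 - x1) * (y2 - y1), 1)
    return patch.sum() / box_area > area_ratio        # area-overlap criterion
```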
In this way, at least one damage detection frame belonging to the first component is determined. Then, in step 42, the damage features of the at least one damage detection frame are obtained. For simplicity, an arbitrary one of the at least one damage detection frame will be referred to as a first damage detection frame, and its corresponding damage feature will be referred to as a first damage feature.
in one embodiment, the obtaining of the first damage characteristic corresponding to the first damage detection frame includes obtaining a picture convolution characteristic related to the first damage detection frame. In one example, the damage detection information of the first M-dimensional vector includes the picture convolution features of each damage detection frame, and in such a case, the picture convolution features corresponding to the first damage detection frame can be extracted from the M-dimensional vector. In another example, richer picture convolution features may be extracted again from the corresponding image frame based on the location information of the first impairment detection box.
in one embodiment, the obtaining the first damage characteristic corresponding to the first damage detection frame further includes determining an associated characteristic of the first damage detection frame as a part of the damage characteristic based on an association relationship between the first damage detection frame and other damage detection frames. In different embodiments, the association relationship may include an association relationship of positions of the damage detection frame, an association relationship of predicted damage categories, an association relationship of frame contents reflected by the picture convolution feature, and the like. More specifically, the position correlation between two damage detection frames may include a center distance, an intersection ratio, an area ratio, an inclusion relationship, and the like between the damage detection frames. The predicted damage category correlation may include differences in predicted damage categories, differences in confidence of predictions, and the like. The frame content association relationship may be determined by an operation between convolution vectors formed by convolution features of respective pictures of the two damage detection frames, for example, calculating a distance or similarity between the two convolution vectors.
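The sketch below illustrates how such association features might be assembled for one damage detection frame relative to the others; the dictionary keys, the choice of three relations, and the max pooling over the pairwise values are illustrative assumptions.

```python
import numpy as np

def association_features(first, others):
    """Associated feature of one damage detection (a dict with keys 'box',
    'category', 'conv') with respect to the other detections in the same
    frame: center distance (position relation), category agreement
    (predicted-category relation), and cosine similarity of the picture
    convolution features (frame-content relation), pooled by max."""
    feats = []
    fx = 0.5 * (first["box"][0] + first["box"][2])
    fy = 0.5 * (first["box"][1] + first["box"][3])
    for o in others:
        ox = 0.5 * (o["box"][0] + o["box"][2])
        oy = 0.5 * (o["box"][1] + o["box"][3])
        center_dist = np.hypot(fx - ox, fy - oy)
        same_cat = float(first["category"] == o["category"])
        cos_sim = float(np.dot(first["conv"], o["conv"]) /
                        (np.linalg.norm(first["conv"]) *
                         np.linalg.norm(o["conv"]) + 1e-9))
        feats.append([center_dist, same_cat, cos_sim])
    return np.max(np.asarray(feats), axis=0) if feats else np.zeros(3)
```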
As described above, the first damage characteristic corresponding to the first damage detection frame is obtained, which includes the picture convolution characteristic of the first damage detection frame itself, and may further include the associated characteristic with other damage detection frames. The feature acquisition is performed for each damage detection frame belonging to the first component, and thus the damage feature of each of at least one damage detection frame belonging to the first component is obtained.
Next, in step 43, a first fusion operation is performed on the damage features of the at least one damage detection frame to obtain a comprehensive damage feature. In different embodiments, the first fusing operation may be a maximum operation, a minimum operation, an average operation, a summation operation, a median operation, or the like, or may be a combination of these operations.
through the above steps 41 to 43, the comprehensive damage characteristic of the first component, i.e., the aforementioned S section, is acquired.
In the above, the component feature C and the comprehensive damage feature S of the first component are acquired for the first image frame, respectively. On the basis, the obtained component feature C and the comprehensive damage feature S of the first component are spliced or combined together, so that the frame comprehensive feature of the first component in the first image frame can be obtained, namely, the first image frame is subjected to intra-frame fusion with respect to the first component. The obtained frame comprehensive characteristics of the first part are the part-level damage characteristics of the first part in the first image frame.
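A minimal sketch of the intra-frame fusion described above is given below, using the element-wise maximum as the first fusion operation (one of the options listed) and a simple concatenation of C and S; the vector layouts and the zero-fill for undamaged components are assumptions.

```python
import numpy as np

def intra_frame_fusion(component_feature_C, damage_features, damage_dim):
    """Frame comprehensive feature V = (C, S) of one component in one frame:
    S is obtained by element-wise maximum over the damage features of the
    damage detection frames belonging to the component, and V is the
    concatenation of C and S."""
    if damage_features:
        S = np.max(np.stack([np.asarray(f) for f in damage_features]), axis=0)
    else:
        S = np.zeros(damage_dim)     # no damage detected on this component
    return np.concatenate([np.asarray(component_feature_C), S])
```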
Similarly, for each image frame in the N image frames, the intra-frame fusion may be performed based on the corresponding M-dimensional vector, so as to obtain a frame comprehensive feature of the first component corresponding to the frame, which is denoted as a first vector. In this way, N first vectors may be obtained for the N image frames, corresponding to the frame synthesis features of the first component obtained for the N image frames, respectively.
for each of the K keyframes, the intra-frame fusion may be performed based on the component detection information and the damage detection information in the keyframe vector, so as to obtain a frame comprehensive feature of the first component of the keyframe, which is denoted as a second vector. In this way, K second vectors may be derived for K keyframes, corresponding to the frame synthesis features of the first component derived for K image frames, respectively.
Since the dimensions of the key frame vector may be different from the aforementioned M-dimensional vector, the dimensions of the first vector and the second vector may also be different, but the concept and process of intra-frame fusion are similar.
Next, inter-frame fusion is performed on the frame integrated features of the first component obtained above for each frame (including the N image frames and the K key frames), thereby obtaining an integrated feature vector of the first component.
According to one embodiment, to perform the inter-frame fusion, the N first vectors are first combined to obtain a first combined vector; the K second vectors are subjected to a second combination to obtain a second combined vector; and the first combined vector and the second combined vector are then synthesized to obtain the comprehensive feature vector of the first component.
In one embodiment, in the first combination, N first vectors are spliced according to the time sequence of the corresponding N image frames to obtain the first combination vector.
In another embodiment, the operation of the first combining comprises: determining weight factors for the N first vectors; and performing a weighted combination of the N first vectors according to the weight factors to obtain the first combined vector. The weighted combination may include a weighted sum, a weighted average, and the like.
each first vector may be given a weighting factor based on different criteria.
In one embodiment, the weight factor of the first vector corresponding to each image frame is determined according to the temporal distance between the image frame and the key frames. It will be appreciated that the video stream is captured at a certain basic time interval t, for example 25 basic frames per second, i.e., one frame every 40 ms. Accordingly, the temporal distance between two frames may be expressed as the difference in their capture times, for example 40 ms, or as the number of basic frames between them, for example 10 frames apart.
Specifically, in one example, for each image frame in the N image frames, a temporally closest key frame is determined from the K key frames, and a weighting factor of a first vector corresponding to the image frame is determined according to a temporal distance between each image frame and its closest key frame, so that the temporal distance is negatively correlated with the weighting factor.
for example, for a first image frame of any of the N image frames, K temporal distances of the first image frame from K keyframes may be determined in sequence. If the first image frame is a key frame, the time sequence distance between the first image frame and the key frame is 0, and the key frame is the closest key frame. If the first image frame does not belong to any key frame, the key frame with the smallest temporal distance can be selected from the determined K temporal distances from the K key frames as the closest key frame. If there are two equal shortest distances among the K temporal distances (e.g., the case where the first image frame is located in the middle of two key frames), one of the corresponding key frames may be taken as the closest key frame.
then, according to the time sequence distance between the first image frame and the nearest key frame, the weight factor of the first vector corresponding to the first image frame is determined, so that the smaller the time sequence distance is, the larger the weight factor is. That is, the closer to the keyframe, the greater the corresponding weighting factor.
In one example, a weight factor is determined for each image frame according to a transformation function f, i.e., w = f(d), where d is the temporal distance between the image frame and its closest key frame, w is the weight factor, and f can be set such that w is negatively correlated with d. This results in a continuous distribution of weight factors.
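A sketch of such a transformation function is shown below, measuring the temporal distance in basic frames and using a linear decrease; the constants w_max and slope are illustrative.

```python
def temporal_weight(frame_index, keyframe_indices, w_max=2.0, slope=0.02):
    """Weight factor w = f(d): largest at a key frame and decreasing
    linearly with the temporal distance d (in basic frames) to the
    nearest key frame; w_max and slope are illustrative constants."""
    d = min(abs(frame_index - k) for k in keyframe_indices)
    return max(w_max - slope * d, 0.1)   # keep a small positive floor
```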
Fig. 5a shows a schematic view of a weight factor distribution of an image frame according to an embodiment. In the example of fig. 5a, a linear transformation function f is used to obtain the weighting factors for each of the N image frames. It can be seen that the weighting factor reaches a maximum at the key frame, and symmetrically decreases linearly as the image frame moves away from the key frame.
in one example, a plurality of range sections are preset, and corresponding weight factors are determined according to the range section in which the time sequence distance d between each image frame and the nearest key frame falls. For example, the weighting factor for image frames within 20ms of the temporal distance d from the nearest key frame may be set to 1.8, the weighting factor for image frames between 20-40ms of the temporal distance d from the key frame may be set to 1.6, and so on.
Fig. 5b shows a schematic view of a weight factor distribution of image frames according to another embodiment. In the example of fig. 5b, the weight factor of each of the N image frames is obtained using the preset range sections. It can be seen that the weight factors step down symmetrically as the image frame moves away from the key frame.
In further examples, other specific forms may be adopted in which the weight factor corresponding to an image frame is determined according to its temporal distance from the key frame, such that the temporal distance is negatively correlated with the weight factor.
In one embodiment, the weight factor of the corresponding first vector may also be determined based on the quality characteristics of the image frame itself. As described above, the image frame quality feature, which may indicate whether the image frame is blurred, whether the image frame contains a target, whether the illumination is sufficient, whether the photographing angle is a predetermined angle, or the like, may be included in the M-dimensional vector corresponding to each image frame. In this case, the comprehensive quality of each image frame may be determined according to the image frame quality characteristics in the M-dimensional vector corresponding to the image frame, and the weight factor of the first vector corresponding to the image frame may be determined according to the comprehensive quality. The weighting factor may be positively correlated to the overall quality, i.e., the higher the overall quality, the more effective an image frame is for target detection, and the higher the weighting factor of the first vector corresponding to the image frame.
In one embodiment, the timing-based determined weight factor and the quality-feature-based determined weight factor may also be combined, for example, a product thereof is calculated as the weight factor of the first vector corresponding to the image frame.
On the basis of determining the weighting factors, the N first vectors may be weighted combined, thus obtaining a first combined vector.
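The sketch below combines the temporal-distance weight (reusing the temporal_weight sketch above) with a quality-based weight by taking their product, and forms the first combined vector as a weighted average of the N first vectors; how the quality score is derived from the quality features is an assumption.

```python
import numpy as np

def first_combined_vector(first_vectors, frame_indices, keyframe_indices,
                          quality_scores):
    """Weighted average of the N first vectors. Each weight is the product
    of a temporal-distance weight (see temporal_weight above) and a quality
    score in [0, 1] assumed to be derived from the frame quality features."""
    weights = np.array([temporal_weight(i, keyframe_indices) * q
                        for i, q in zip(frame_indices, quality_scores)])
    weights = weights / (weights.sum() + 1e-9)          # normalize the weights
    stacked = np.stack([np.asarray(v) for v in first_vectors])   # N x D
    return (weights[:, None] * stacked).sum(axis=0)     # weighted combination
```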
On the other hand, the K second vectors are subjected to the second combination to obtain a second combined vector. The second combination may include one or more of the following operations: splicing the K second vectors, calculating an average vector, performing pooling or interpolation processing after splicing, and the like.
On the basis of the first combined vector and the second combined vector thus obtained, the two are synthesized to obtain the comprehensive feature vector of the first component. The synthesis of the first combined vector and the second combined vector may include one or more of the following: splicing the two combined vectors, further extracting features from them and then splicing, performing further processing after splicing, and the like.
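As one concrete instance of the options listed above, the sketch below averages the K second vectors and splices the result with the first combined vector; the choice of averaging followed by concatenation is illustrative.

```python
import numpy as np

def comprehensive_feature_vector(first_combined, second_vectors):
    """Average the K second vectors (one option for the second combination)
    and splice the result with the first combined vector to obtain the
    comprehensive feature vector of the component."""
    second_combined = np.mean(
        np.stack([np.asarray(v) for v in second_vectors]), axis=0)
    return np.concatenate([np.asarray(first_combined), second_combined])
```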
in this way, the integrated feature vector of the first component is obtained by fusing the N M-dimensional vectors in the feature matrix of the video stream and the K key frame vectors, so that the overall damage features of the first component in the N image frames and the K key frames of the video stream can be fully reflected.
referring back to fig. 2, next, in step 26, the comprehensive feature vector of the first component is input into a pre-trained decision tree model, and the damage condition of the first component is determined according to the output of the decision tree model.
In this step, various specific decision tree algorithms may be used as the decision tree model, such as the gradient boosting decision tree (GBDT), the classification and regression tree (CART), and the like.
In one embodiment, the decision tree model is a binary classification model obtained through pre-training with two-class labeled samples. In the two-class labeled samples, an annotator marks whether each component in a given sample video stream is damaged (damaged being one class, undamaged the other). The pre-training process may include obtaining, in the manner of steps 21 to 25 described above, a comprehensive feature vector for a component based on a given sample video stream, and then predicting with the decision tree model whether the component is damaged based on the comprehensive feature vector. The prediction result is then compared with the labeling result for the component, and the model parameters of the decision tree model are adjusted according to the comparison so that the prediction tends to fit the labels. The decision tree binary classification model is obtained through such training.
In such a case, after the comprehensive feature vector of the first component to be analyzed is input into the decision tree binary classification model, the model outputs a binary classification result of damaged or not damaged, so that the result of whether the first component is damaged can be obtained in step 26.
In another embodiment, the decision tree model is a multi-classification model obtained through pre-training with multi-class labeled samples, in which an annotator labels the damage category (for example, scratch, deformation, fragmentation, and the like) of each component in a given sample video stream. The pre-training process may include obtaining, in the manner of steps 21 to 25 described above, a comprehensive feature vector for a component based on a given sample video stream, and then predicting the damage category of the component with the decision tree model based on the comprehensive feature vector. The prediction result is then compared with the labeling result for the component, and the model parameters are adjusted according to the comparison so that the prediction tends to fit the labels. The decision tree multi-classification model is obtained through such training.
In such a case, after the comprehensive feature vector of the first component to be analyzed is input into the decision tree multi-classification model, the model outputs a predicted damage classification category, and then the damage type, i.e. the damaged condition, of the first component can be obtained according to the classification category in step 26.
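As an illustration, the sketch below uses scikit-learn's GradientBoostingClassifier as one possible GBDT implementation; the embodiment does not prescribe a particular library, and the label encoding shown is an assumption.

```python
from sklearn.ensemble import GradientBoostingClassifier

def train_damage_classifier(X_train, y_train):
    """Train a GBDT multi-classifier on comprehensive feature vectors built
    from labeled sample video streams. y_train holds annotated damage
    categories, e.g. 0 = undamaged, 1 = scratch, 2 = deformation,
    3 = fragmentation (the encoding is illustrative)."""
    model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    model.fit(X_train, y_train)
    return model

def predict_damage(model, comprehensive_vector):
    """Predict the damage category of one candidate component from its
    comprehensive feature vector."""
    return model.predict([comprehensive_vector])[0]
```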
It will be appreciated that the first component described above is an arbitrary one of the candidate damaged components. For each candidate damaged component, the above process may be performed: its comprehensive feature vector is obtained at step 25, and its damage prediction result is obtained based on the comprehensive feature vector at step 26. In this way, the damage condition of each candidate damaged component is obtained, yielding the overall damage assessment result.
In a specific example, after each candidate damaged component is predicted with the multi-classification decision tree model, the following damage assessment result may be obtained: right rear door: scratch; rear bumper: deformation; tail light: cracking.
In one embodiment, such a damage assessment result is returned to the mobile terminal.
On the basis of the damage assessment result including the damage condition of each component, in one embodiment, a repair scheme for each component can further be determined according to the damage assessment result.
It can be understood that, according to damage assessment requirements, a mapping table can be preset by staff, recording the repair schemes for various types of components under various damage categories. For example, for a metal component, when the damage category is scratch, the corresponding repair scheme is repainting, and when the damage category is deformation, the corresponding repair scheme is sheet metal work; for a glass component, when the damage category is scratch, the corresponding repair scheme is glass replacement, and so on.
Thus, for the first component "right rear door" exemplified above, assuming the damage category is determined to be scratch, the type of the component is first determined according to the component category "right rear door" (for example, a metal component), and then, according to the damage category "scratch", the corresponding repair scheme is determined to be repainting.
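A sketch of such a preset mapping table and lookup is given below; the table entries follow the examples in the description, while the component-to-material mapping and the data structures are assumptions.

```python
# Preset mapping from (component material type, damage category) to a repair
# scheme, following the examples in the description. The material type
# assigned to each component category is an illustrative assumption.
REPAIR_TABLE = {
    ("metal", "scratch"): "repaint",
    ("metal", "deformation"): "sheet metal work",
    ("glass", "scratch"): "replace glass",
}

COMPONENT_TYPE = {
    "right rear door": "metal",
    "rear bumper": "metal",
    "tail light": "glass",
}

def repair_scheme(component_category, damage_category):
    """Look up the repair scheme for a damaged component; returns None when
    no rule has been preset for the combination."""
    material = COMPONENT_TYPE.get(component_category)
    return REPAIR_TABLE.get((material, damage_category))

# Example: repair_scheme("right rear door", "scratch") -> "repaint"
```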
Therefore, the repair scheme can be determined for each damaged part, and the damage assessment result and the repair scheme are transmitted to the mobile terminal together, so that more comprehensive intelligent damage assessment is realized.
According to another aspect, an apparatus for vehicle damage assessment is provided, which may be deployed in a server, and the server may be implemented by any device, platform or device cluster having computing and processing capabilities. Fig. 6 shows a schematic block diagram of a vehicle damage assessment device according to one embodiment. As shown in fig. 6, the apparatus 600 includes:
a feature matrix obtaining unit 61 configured to obtain a feature matrix of a video stream, the video stream being captured for a damaged vehicle, the feature matrix including at least N M-dimensional vectors that correspond to N image frames in the video stream, respectively, and are arranged in time series of the N image frames, each M-dimensional vector including at least, for the corresponding image frame, component detection information obtained by a pre-trained first component detection model, and damage detection information obtained by the pre-trained first damage detection model;
a key frame acquiring unit 62 configured to acquire K key frames in the video stream;
A generating unit 63 configured to generate, for the K keyframes, corresponding K keyframe vectors, each keyframe vector including, for a corresponding keyframe image, component detection information obtained by a pre-trained second component detection model, and damage detection information obtained by a pre-trained second damage detection model;
An alternative component determination unit 64 configured to determine at least one alternative damaged component, including a first component;
A fusion unit 65 configured to perform intra-frame fusion on the component detection information and the damage detection information in the vector corresponding to each frame to obtain a frame comprehensive feature of the first component, and perform inter-frame fusion on the frame comprehensive feature of the first component obtained for each frame to generate a comprehensive feature vector for the first component;
And a damage determining unit 66 configured to input the comprehensive feature vector into a pre-trained decision tree model, and determine a damage condition of the first component according to an output of the decision tree model.
In one embodiment, the feature matrix obtaining unit 61 is configured to receive the feature matrix from a mobile terminal.
in another embodiment, the feature matrix obtaining unit 61 is configured to:
acquiring the video stream;
for each image frame in the N image frames, carrying out component detection through the first component detection model to obtain component detection information, and carrying out damage detection through the first damage detection model to obtain damage detection information;
Forming an M-dimensional vector corresponding to each image frame based on at least the part detection information and the damage detection information,
And generating the feature matrix according to the respective M-dimensional vectors of the N image frames.
In one embodiment, the key frame obtaining unit 62 is configured to receive the K key frames from the mobile terminal.
in another embodiment, the key frame acquisition unit 62 is configured to: inputting the feature matrix into a key frame prediction model based on a Convolutional Neural Network (CNN), and determining the K key frames according to the output of the key frame prediction model.
according to one embodiment, the second component detection model is different from the first component detection model, and the second damage detection model is different from the first damage detection model.
According to one embodiment, the alternative component determination unit 64 is configured to:
Determining a first set of candidate components based on component detection information in the N M-dimensional vectors;
Determining a second set of candidate components based on component detection information in the K keyframe vectors;
and taking the part which is the union of the first candidate part set and the second candidate part set as the at least one candidate damaged part.
According to one embodiment, the N image frames comprise a first image frame corresponding to a first M-dimensional vector; the fusion unit 65 is configured to:
Extracting first detection information related to the first part from the part detection information in the first M-dimensional vector;
Determining a component feature of the first component in the first image frame as a part of a frame integration feature of the first component corresponding to the first image frame based on the first detection information.
according to an embodiment, the N image frames further comprise a second image frame following the first image frame, corresponding to a second M-dimensional vector; each M-dimensional vector further comprises video continuity features; in such a case, the fusion unit 65 is further configured to:
extracting video continuity features from the second M-dimensional vector;
determining second detection information of the first component in the second image frame based on the first detection information and the video continuity characteristic;
determining a component feature of the first component in the second image frame based on the second detection information as part of a frame integration feature of the first component corresponding to the second image frame.
Further, in one embodiment, the video continuity features include at least one of: an optical flow change feature between image frames, a similarity feature between image frames, a transformation feature determined based on a projection matrix between image frames.
according to one embodiment, the N image frames comprise a first image frame corresponding to a first M-dimensional vector; the damage detection information in the first M-dimensional vector includes information of a plurality of damage detection frames framing a plurality of damaged objects from the first image frame; the fusion unit 65 is configured to:
Determining at least one damage detection frame belonging to the first component according to the component detection information in the first M-dimensional vector and the information of the plurality of damage detection frames;
Acquiring damage characteristics of the at least one damage detection frame;
and performing a first fusion operation on the damage features of the at least one damage detection frame to obtain a comprehensive damage feature, wherein the comprehensive damage feature is used as a part of the frame comprehensive feature of the first component corresponding to the first image frame.
further, in an embodiment, the component detection information in the first M-dimensional vector includes information of a plurality of component detection boxes for selecting a plurality of components from the first image frame, and the fusion unit 65 is configured to determine at least one damage detection box belonging to the first component by:
Determining a first component detection frame corresponding to the first component;
And determining at least one damage detection frame belonging to the first component according to the position relation between the plurality of damage detection frames and the first component detection frame.
in one embodiment, the component detection information in the first M-dimensional vector comprises component segmentation information; in such a case, the fusion unit 65 is configured to determine at least one damage detection box belonging to the first component by:
Determining a first area covered by the first component according to the component segmentation information;
Determining whether the damage detection frames fall into the first area or not according to the position information of the damage detection frames;
Determining a damage detection frame falling into the first region as the at least one damage detection frame.
According to one embodiment, the at least one damage detection frame comprises a first damage detection frame; the fusion unit 65 is configured to obtain a first damage feature corresponding to the first damage detection frame, where the first damage feature includes a picture convolution feature related to the first damage detection frame.
Further, according to an embodiment, the damage detection information further includes a predicted damage category corresponding to each of the plurality of damage detection frames; the fusion unit 65 is further configured to determine a first association feature as a part of the first damage feature according to an association relationship between the first damage detection frame and the other damage detection frames of the plurality of damage detection frames, where the association relationship includes at least one or more of the following: a positional association between damage detection frames, an association between predicted damage categories, and an association of frame contents reflected by the picture convolution features.
Still further, the first fusion operation performed by the fusion unit 65 on the damage features of the at least one damage detection frame includes one or more of the following: a maximum operation, a minimum operation, an averaging operation, a summation operation, and a median operation.
According to an embodiment, the fusion unit 65 is configured to perform the following inter-frame fusion:
Performing first combination on N first vectors to obtain first combined vectors, wherein the N first vectors respectively correspond to frame synthesis features of the first component obtained for the N image frames;
performing second combination on the K second vectors to obtain a second combined vector, wherein the K second vectors correspond to the frame comprehensive characteristics of the first component obtained aiming at the K key frames;
and synthesizing the first combination vector and the second combination vector to obtain the synthesized feature vector.
Further, in an embodiment, the fusion unit 65 may splice the N first vectors according to a time sequence of the corresponding N image frames.
In another embodiment, the fusion unit 65 may be configured to determine weighting factors for the N first vectors; and carrying out weighted combination on the N first vectors according to the weight factors.
Further, in an embodiment, the fusion unit 65 may determine the weighting factors of the N first vectors by:
for each image frame in the N image frames, determining a temporally closest key frame from the K key frames;
And determining the weight factor of the first vector corresponding to each image frame according to the time sequence distance of the image frame and the nearest key frame thereof, so that the time sequence distance is in negative correlation with the weight factor.
In another embodiment, each of the M-dimensional vectors further includes an image frame quality feature; the fusion unit 65 may determine the weighting factors of the N first vectors by: and for each image frame in the N image frames, determining the weight factor of a first vector corresponding to the image frame according to the image frame quality characteristic in the M-dimensional vector corresponding to the image frame.
In a particular implementation, the image frame quality features include at least one of: a feature indicating whether the image frame is blurred, a feature indicating whether the image frame contains a target, a feature indicating whether the illumination is sufficient, and a feature indicating whether the shooting angle is a predetermined angle.
According to one embodiment, the decision tree model is a binary classification model; the damage determination unit 66 is configured to determine whether the first component is damaged according to the binary classification output of the decision tree model.
According to another embodiment, the decision tree model is a multi-classification model; the damage determination unit 66 is configured to determine a damage category of the first component according to the classification category output by the decision tree model.
According to an embodiment, the apparatus 600 further comprises a repair determination unit (not shown) configured to determine a repair plan for the first component according to the component category of the first component and the damage condition of the first component.
By means of the above method and apparatus, intelligent damage assessment is performed based on a video stream captured of the damaged vehicle.
according to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.
those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
the above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (28)

1. A computer-implemented vehicle damage assessment method, comprising:
Acquiring a feature matrix of a video stream, wherein the video stream is shot for a damaged vehicle, the feature matrix at least comprises N M-dimensional vectors which respectively correspond to N image frames in the video stream and are arranged according to a time sequence of the N image frames, and each M-dimensional vector at least comprises part detection information obtained by a pre-trained first part detection model and damage detection information obtained by the pre-trained first damage detection model for the corresponding image frame;
Acquiring K key frames in the video stream;
Generating corresponding K key frame vectors aiming at the K key frames, wherein each key frame vector comprises component detection information obtained by a pre-trained second component detection model aiming at a corresponding key frame image and damage detection information obtained by the pre-trained second damage detection model;
Determining at least one candidate damaged component, including a first component;
For each frame in the N image frames and the K key frames, performing intra-frame fusion on component detection information and damage detection information in a corresponding vector of the frame to obtain frame comprehensive characteristics of the first component, and performing inter-frame fusion on the frame comprehensive characteristics of the first component obtained for each frame to generate a comprehensive characteristic vector for the first component;
And inputting the comprehensive characteristic vector into a pre-trained decision tree model, and determining the damage condition of the first component according to the output of the decision tree model.
2. the method of claim 1, wherein obtaining a feature matrix for a video stream comprises receiving the feature matrix from a mobile terminal.
3. the method of claim 1, wherein obtaining a feature matrix for a video stream comprises:
acquiring the video stream;
For each image frame in the N image frames, carrying out component detection through the first component detection model to obtain component detection information, and carrying out damage detection through the first damage detection model to obtain damage detection information;
forming an M-dimensional vector corresponding to each image frame based on at least the part detection information and the damage detection information,
And generating the feature matrix according to the respective M-dimensional vectors of the N image frames.
4. The method of claim 1, wherein obtaining K key frames in the video stream comprises: and receiving the K key frames from the mobile terminal.
5. the method of claim 1, wherein obtaining K key frames in the video stream comprises: inputting the feature matrix into a key frame prediction model based on a Convolutional Neural Network (CNN), and determining the K key frames according to the output of the key frame prediction model.
6. the method of claim 1, wherein the second component detection model is different from the first component detection model, and the second damage detection model is different from the first damage detection model.
7. the method of claim 1, wherein determining at least one candidate damaged component comprises:
Determining a first set of candidate components based on component detection information in the N M-dimensional vectors;
Determining a second set of candidate components based on component detection information in the K keyframe vectors;
And taking the part which is the union of the first candidate part set and the second candidate part set as the at least one candidate damaged part.
8. the method of claim 1, wherein the N image frames comprise a first image frame corresponding to a first M-dimensional vector;
the obtaining of the frame synthesis feature of the first component includes:
extracting first detection information related to the first part from the part detection information in the first M-dimensional vector;
Determining a component feature of the first component in the first image frame as a part of a frame integration feature of the first component corresponding to the first image frame based on the first detection information.
9. The method of claim 8, wherein the N image frames further include a second image frame following the first image frame, corresponding to a second M-dimensional vector; each M-dimensional vector further comprises video continuity features;
The obtaining the frame synthesis feature of the first component further comprises:
Extracting video continuity features from the second M-dimensional vector;
Determining second detection information of the first component in the second image frame based on the first detection information and the video continuity characteristic;
determining a component feature of the first component in the second image frame based on the second detection information as part of a frame integration feature of the first component corresponding to the second image frame.
10. The method of claim 9, wherein the video continuity features comprise at least one of: an optical flow change feature between image frames, a similarity feature between image frames, a transformation feature determined based on a projection matrix between image frames.
11. The method of claim 1, wherein the N image frames comprise a first image frame corresponding to a first M-dimensional vector; the damage detection information in the first M-dimensional vector includes information of a plurality of damage detection frames framing a plurality of damaged objects from the first image frame;
The obtaining of the frame synthesis feature of the first component includes:
Determining at least one damage detection frame belonging to the first component according to the component detection information in the first M-dimensional vector and the information of the plurality of damage detection frames;
acquiring damage characteristics of the at least one damage detection frame;
And performing a first fusion operation on the damage features of the at least one damage detection frame to obtain a comprehensive damage feature, wherein the comprehensive damage feature is used as a part of the frame comprehensive feature of the first component corresponding to the first image frame.
12. the method of claim 11, wherein the part detection information in the first M-dimensional vector comprises information of a plurality of part detection boxes framing a plurality of parts from the first image frame;
said determining at least one damage detection box belonging to the first component comprises:
Determining a first component detection frame corresponding to the first component;
and determining at least one damage detection frame belonging to the first component according to the position relation between the plurality of damage detection frames and the first component detection frame.
13. the method of claim 11, wherein the part detection information in the first M-dimensional vector comprises part segmentation information;
Said determining at least one damage detection box belonging to the first component comprises:
determining a first area covered by the first component according to the component segmentation information;
determining whether the damage detection frames fall into the first area or not according to the position information of the damage detection frames;
determining a damage detection frame falling into the first region as the at least one damage detection frame.
14. The method of claim 11, wherein the at least one damage detection frame comprises a first damage detection frame;
The obtaining of the damage characteristic of the at least one damage detection frame includes obtaining a first damage characteristic corresponding to the first damage detection frame, where the first damage characteristic includes a picture convolution characteristic related to the first damage detection frame.
15. The method of claim 14, wherein the damage detection information further comprises a predicted damage category corresponding to each of the plurality of damage detection frames;
The obtaining of the first damage characteristic corresponding to the first damage detection frame further includes determining, according to an association relationship between the first damage detection frame and the other damage detection frames of the plurality of damage detection frames, a first association characteristic as a part of the first damage characteristic, where the association relationship includes at least one or more of the following: a positional association between damage detection frames, an association between predicted damage categories, and an association of frame contents reflected by the picture convolution features.
16. The method of claim 11, wherein the first fusion operation comprises one or more of the following: a maximum operation, a minimum operation, an averaging operation, a summation operation, and a median operation.
17. The method of claim 1, wherein generating a composite feature vector for the first component by inter-frame fusing the frame composite features of the first component obtained for each frame comprises:
Performing first combination on N first vectors to obtain first combined vectors, wherein the N first vectors respectively correspond to frame synthesis features of the first component obtained for the N image frames;
Performing second combination on the K second vectors to obtain a second combined vector, wherein the K second vectors correspond to the frame comprehensive characteristics of the first component obtained aiming at the K key frames;
And synthesizing the first combination vector and the second combination vector to obtain the synthesized feature vector.
18. The method of claim 17, wherein the first combining the N first vectors comprises:
and splicing the N first vectors according to the time sequence of the corresponding N image frames.
19. The method of claim 17, wherein the first combining the N first vectors comprises:
determining weight factors of the N first vectors;
And carrying out weighted combination on the N first vectors according to the weight factors.
20. The method of claim 19, wherein said determining weight factors for said N first vectors comprises:
For each image frame in the N image frames, determining a temporally closest key frame from the K key frames;
And determining the weight factor of the first vector corresponding to each image frame according to the time sequence distance of the image frame and the nearest key frame thereof, so that the time sequence distance is in negative correlation with the weight factor.
21. The method of claim 19, wherein each M-dimensional vector further comprises an image frame quality feature;
The determining the weighting factors of the N first vectors comprises:
And for each image frame in the N image frames, determining the weight factor of a first vector corresponding to the image frame according to the image frame quality characteristic in the M-dimensional vector corresponding to the image frame.
22. The method of claim 21, wherein the image frame quality features comprise at least one of: a feature indicating whether the image frame is blurred, a feature indicating whether the image frame contains a target, a feature indicating whether the illumination is sufficient, and a feature indicating whether the shooting angle is a predetermined angle.
23. The method of claim 1, wherein the decision tree model is a binary model; the determining the damage condition of the first component according to the output of the decision tree model comprises:
and determining whether the first component is damaged according to the binary classification output of the decision tree model.
24. the method of claim 1, wherein the decision tree model is a multi-classification model; the determining the damage condition of the first component according to the output of the decision tree model comprises:
and determining the damage category of the first component according to the classification category output by the decision tree model.
25. The method of claim 1, further comprising determining a repair plan for the first component based on the component category of the first component and the damage condition of the first component.
26. A vehicle damage assessment device comprising:
A feature matrix obtaining unit configured to obtain a feature matrix of a video stream, the video stream being captured for a damaged vehicle, the feature matrix including at least N M-dimensional vectors that correspond to N image frames in the video stream, respectively, and are arranged according to a time sequence of the N image frames, each M-dimensional vector including at least, for the corresponding image frame, component detection information obtained by a pre-trained first component detection model, and damage detection information obtained by the pre-trained first damage detection model;
a key frame acquisition unit configured to acquire K key frames in the video stream;
A generating unit configured to generate, for the K keyframes, corresponding K keyframe vectors, each keyframe vector including, for a corresponding keyframe image, component detection information obtained by a pre-trained second component detection model, and damage detection information obtained by a pre-trained second damage detection model;
An alternative component determination unit configured to determine at least one alternative damaged component, including a first component;
A fusion unit configured to perform intra-frame fusion on the component detection information and the damage detection information in the vector corresponding to each frame to obtain a frame comprehensive feature of the first component, and perform inter-frame fusion on the frame comprehensive feature of the first component obtained for each frame to generate a comprehensive feature vector for the first component;
And the damage determining unit is configured to input the comprehensive characteristic vector into a pre-trained decision tree model and determine the damage condition of the first component according to the output of the decision tree model.
27. A computer-readable storage medium, having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-25.
28. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, performs the method of any one of claims 1-25.
CN201910315062.4A 2019-04-18 2019-04-18 Vehicle loss assessment method and device executed by computer and based on video stream Active CN110570318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910315062.4A CN110570318B (en) 2019-04-18 2019-04-18 Vehicle loss assessment method and device executed by computer and based on video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910315062.4A CN110570318B (en) 2019-04-18 2019-04-18 Vehicle loss assessment method and device executed by computer and based on video stream

Publications (2)

Publication Number Publication Date
CN110570318A true CN110570318A (en) 2019-12-13
CN110570318B CN110570318B (en) 2023-01-31

Family

ID=68772875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910315062.4A Active CN110570318B (en) 2019-04-18 2019-04-18 Vehicle loss assessment method and device executed by computer and based on video stream

Country Status (1)

Country Link
CN (1) CN110570318B (en)

Also Published As

Publication number Publication date
CN110570318B (en) 2023-01-31

Similar Documents

Publication Publication Date Title
CN110569701B (en) Computer-implemented vehicle damage assessment method and device
US11144889B2 (en) Automatic assessment of damage and repair costs in vehicles
CN110647853A (en) Computer-implemented vehicle damage assessment method and device
CN110569702B (en) Video stream processing method and device
JP6943338B2 (en) Image processing equipment, systems, methods and programs
CN112233097B (en) Road scene other vehicle detection system and method based on space-time domain multi-dimensional fusion
US20210081698A1 (en) Systems and methods for physical object analysis
US9842266B2 (en) Method for detecting driver cell phone usage from side-view images
US7567704B2 (en) Method and apparatus for identifying physical features in video
CN110569700B (en) Method and device for optimizing damage identification result
CN110569696A (en) Neural network system, method and apparatus for vehicle component identification
Yin et al. Likelihood map fusion for visual object tracking
CN110264444B (en) Damage detection method and device based on weak segmentation
CN105224947B (en) classifier training method and system
CN104601964A (en) Non-overlap vision field trans-camera indoor pedestrian target tracking method and non-overlap vision field trans-camera indoor pedestrian target tracking system
CN113962274B (en) Abnormity identification method and device, electronic equipment and storage medium
CN112381132A (en) Target object tracking method and system based on fusion of multiple cameras
CN110570318B (en) Vehicle loss assessment method and device executed by computer and based on video stream
CN114241053A (en) FairMOT multi-class tracking method based on improved attention mechanism
Toprak et al. Conditional weighted ensemble of transferred models for camera based onboard pedestrian detection in railway driver support systems
CN116704273A (en) Self-adaptive infrared and visible light dual-mode fusion detection method
CN115620393A (en) Fine-grained pedestrian behavior recognition method and system oriented to automatic driving
Yabo et al. Vehicle classification and speed estimation using computer vision techniques
CN115690714A (en) Multi-scale road target detection method based on area focusing
Badal et al. Online multi-object tracking: multiple instance based target appearance model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, Cayman Islands

Applicant after: Advantageous New Technologies Co., Ltd.

Address before: Fourth Floor, Capital Building, P.O. Box 847, Grand Cayman, Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200924

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, Cayman Islands

Applicant after: Advanced New Technologies Co., Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, Cayman Islands

Applicant before: Advantageous New Technologies Co., Ltd.

GR01 Patent grant