Background
Vehicle appearance part identification plays an important role in many automobile businesses; it is required in vehicle pick-up and return processes such as automobile insurance claims, time-sharing leasing and daily car rental. Two implementation modes are currently common: in the first, field workers survey the vehicle to be recognized and identify its appearance parts on site; in the second, users take pictures (or videos) that are then processed by a deep learning model, such as the multi-task vehicle part recognition model, method and system based on deep learning proposed in the prior art.
Existing systems have the following problems:
1. Using a classification model or a detection model alone cannot accurately identify all the assembly component information of the vehicle appearance. In reality, appearance components and their sub-components vary widely in size, and close-up images of different parts look similar, so a single image of damage taken at a longer distance typically covers the appearance components only incompletely; such recognition is clearly weaker than the human eye. Associating regions via image feature matching or logic matching (such as forced spatial position correspondence) usually fails or produces errors, has low accuracy, and is not robust to different scenes (such as night). Logic matching may also require the user to shoot carefully, deliberately adjusting angle and distance to make matching succeed, which significantly degrades the acquisition experience.
2. In practical automobile business scenarios such as insurance claims, vehicle inspection and time-sharing leasing, vehicles must be shot at both long and short range, and accurately identifying vehicle appearances at different distances is a very challenging visual task. Existing systems are only suitable for long-distance shooting; accurately identifying vehicle components at short range remains a difficult problem that hinders the deployment of complete vehicle component recognition systems.
3. In reality, the loss assessment pictures of a claim case are read in order from far to near, a relatively continuous, progressive process. The human eye can accurately identify appearance parts in a vehicle picture taken at medium or long distance, but this ability degrades as the distance keeps decreasing, mainly because vehicle detail structures resemble one another: when a shot contains only detail structures, even an experienced person can only judge that several possible vehicle positions correspond to them. This is consistent with cognitive knowledge, so for pictures taken at especially close range, which involve only one or two appearance parts, the recognition ability of back-office staff or of a deep learning model is insufficient. Moreover, most mature deep learning image recognition techniques operate on single pictures, and from the single-picture perspective there is a practical contradiction: at longer distances the appearance part where the damage lies is easy to recognize but the damage details are hard to see, while at shorter distances the damage details are easy to see but the appearance part where the damage lies, and its sub-component information, are hard to recognize.
Disclosure of Invention
To address these problems, the invention provides a video-based vehicle appearance component deep learning segmentation method and system. The start frame image is segmented and colored through semantic segmentation, and a semi-supervised video object segmentation model performs transductive inference based on the colored segmentation of the start frame, realizing pixel-level object tracking. This solves the problem of segmenting and identifying image components at different distances, removes the need to associate component regions of different images through image feature matching or logical relations, and improves the precision and robustness of video segmentation.
In order to achieve the above object, the present invention provides a video-based vehicle appearance component deep learning segmentation method, including: acquiring a recorded video of the vehicle appearance components, and determining, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; starting from the start frame, storing one frame of the video every preset number of frames into a preset image buffer; performing semantic segmentation on the start frame image, and coloring it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; and inputting the colored semantic segmentation picture and the remaining images in the image buffer into a trained semi-supervised video object segmentation model for inference and segmentation, and outputting segmentation images corresponding to all the images in the image buffer.
In the above technical solution, preferably, the training method of the semi-supervised video object segmentation model includes: acquiring a recorded video of the vehicle appearance components, and determining, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; starting from the start frame, storing one frame of the video every preset number of frames into a preset image buffer; performing semantic segmentation on the start frame image, and coloring it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; segmenting the remaining images in the image buffer and labeling the segmented images; and training the semi-supervised video object segmentation model by taking the colored semantic segmentation picture and the remaining images in the image buffer as input, and the corresponding segmented and labeled images of the image buffer as output.
In the above technical solution, preferably, performing semantic segmentation on the start frame image and coloring it based on the semantic segmentation mask labels to form the colored semantic segmentation picture specifically includes: performing semantic segmentation on the start frame image with a semantic segmentation algorithm; and coloring the segmented image through a preset conversion function according to the mask labels of the semantically segmented image, the colored image serving as the colored semantic segmentation picture.
In the above technical solution, preferably, one frame of the video is taken every 3 frames, starting from the start frame, and stored in the image buffer.
The invention also provides a video-based vehicle appearance component deep learning segmentation system, to which the video-based vehicle appearance component deep learning segmentation method of any one of the above technical solutions is applied, including: a video acquisition module, configured to acquire a recorded video of the vehicle appearance components and determine, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; a frame sampling module, configured to take one frame of the video every preset number of frames, starting from the start frame, and store it in a preset image buffer; a segmentation coloring module, configured to perform semantic segmentation on the start frame image and color it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; and a video segmentation module, configured to input the colored semantic segmentation picture and the remaining images in the image buffer into a trained semi-supervised video object segmentation model for inference and segmentation, and output segmentation images corresponding to all the images in the image buffer.
In the above technical solution, preferably, the training method of the semi-supervised video object segmentation model includes: acquiring a recorded video of the vehicle appearance components, and determining, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; starting from the start frame, storing one frame of the video every preset number of frames into a preset image buffer; performing semantic segmentation on the start frame image, and coloring it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; segmenting the remaining images in the image buffer and labeling the segmented images; and training the semi-supervised video object segmentation model by taking the colored semantic segmentation picture and the remaining images in the image buffer as input, and the corresponding segmented and labeled images of the image buffer as output.
In the above technical solution, preferably, the segmentation coloring module is specifically configured to: perform semantic segmentation on the start frame image with a semantic segmentation algorithm; and color the segmented image through a preset conversion function according to the mask labels of the semantically segmented image, the colored image serving as the colored semantic segmentation picture.
In the above technical solution, preferably, the frame sampling module takes one frame of the video every 3 frames, starting from the start frame, and stores it in the image buffer.
Compared with the prior art, the invention has the following beneficial effects: the start frame image is segmented and colored through semantic segmentation, and a semi-supervised video object segmentation model performs transductive inference based on the colored segmentation of the start frame, realizing pixel-level object tracking. This solves the problem of segmenting and identifying picture components at different distances, removes the need to associate component regions of different pictures through image feature matching or logical relations, and improves the precision and robustness of video segmentation.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings; obviously, the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention fall within the scope of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
As shown in fig. 1, the video-based vehicle appearance component deep learning segmentation method provided by the invention includes the following steps: acquiring a recorded video of the vehicle appearance components, and determining, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; starting from the start frame, storing one frame of the video every preset number of frames into a preset image buffer; performing semantic segmentation on the start frame image, and coloring it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; and inputting the colored semantic segmentation picture and the remaining images in the image buffer into a trained semi-supervised video object segmentation model for inference and segmentation, and outputting segmentation images corresponding to all the images in the image buffer.
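Purely as an illustration of the flow of fig. 1, the following minimal Python sketch strings the four steps together. The models are passed in as callables because the disclosure does not fix a concrete implementation; every parameter name here is a hypothetical placeholder.

```python
from typing import Callable, List, Sequence

# Minimal sketch of the method of fig. 1. All callables are hypothetical
# placeholders, assumed to be already trained as described below.
def segment_vehicle_video(
    frames: Sequence,          # decoded frames of the recorded video
    is_start_frame: Callable,  # classifier: appearance parts recognizable?
    segment_start: Callable,   # single-image semantic segmentation
    colorize: Callable,        # mask labels -> colored picture
    vos_infer: Callable,       # semi-supervised VOS inference
    interval: int = 3,         # sample one frame every `interval` frames
) -> List:
    # step 1: the first recognizable frame becomes the start frame
    start = next(i for i, f in enumerate(frames) if is_start_frame(f))
    # step 2: preset image buffer, one frame per interval
    buffer = list(frames[start::interval])
    # step 3: colored semantic segmentation picture of the start frame
    colored = colorize(segment_start(buffer[0]))
    # step 4: the colored reference guides segmentation of the rest
    return vos_infer(colored, buffer[1:])
```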
In this embodiment, the start frame image is segmented and colored through semantic segmentation, and a semi-supervised video object segmentation model performs transductive inference based on the colored segmentation of the start frame, realizing pixel-level object tracking. This solves the problem of segmenting and identifying image components at different distances, removes the need to associate component regions of different images through image feature matching or logical relations, and improves the precision and robustness of video segmentation.
Specifically, taking the segmentation of a loss assessment video as an example, the category "recognizable image of vehicle appearance parts" is detected in the recorded loss assessment video; this category can be obtained with a deep learning classification model, and the frame where it first appears serves as the start image for semi-supervised video object segmentation. An image buffer is established in advance as the data storage space for the video loss assessment. Assuming a video frame rate of 30 frames/s, preferably one frame is placed into the image buffer every N = 3 frames, where N = 3 was determined experimentally: if N is too large, the pixel offset between consecutive images in the buffer becomes large and exceeds the learning capability of the deep learning network. Semantic segmentation of a single picture is performed on the 1st frame (the start frame) of the image buffer, which is colored based on the output semantic segmentation mask labels and thereby converted into a colored semantic segmentation picture. The pictures in the image buffer and the colored semantic segmentation picture of the 1st frame are input into the semi-supervised video object segmentation model, and inference yields video segmentation images for all the pictures in the image buffer, thus providing pixel-level identification of the vehicle appearance parts and their sub-part accessories from far to near.
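As an illustration of the buffering step only, the following hedged Python sketch collects frames under the assumptions stated above (30 frames/s video, one frame kept every N = 3 frames); the classification model and its category string are hypothetical placeholders, not the disclosed implementation.

```python
class FrameBuffer:
    """Keeps one frame every N frames once the start frame is detected."""

    def __init__(self, classify, n=3):
        self.classify = classify   # hypothetical deep learning classifier
        self.n = n                 # N = 3, chosen experimentally above
        self.frames = []           # the preset image buffer
        self._count = None         # None until the start frame appears

    def feed(self, frame):
        """Feed decoded frames one by one, e.g. from a 30 fps video."""
        if self._count is None:
            # trigger on "recognizable image of vehicle appearance parts"
            if self.classify(frame) != "recognizable vehicle appearance parts":
                return
            self._count = 0
        if self._count % self.n == 0:   # keep one frame per N-frame interval
            self.frames.append(frame)
        self._count += 1
```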
As shown in fig. 2, specifically, an image semantic segmentation algorithm such as DeepLab, PSPNet, SegNet, FCN, DIS or IDW-CNN is trained and then used to segment the appearance components in the vehicle picture, yielding the segmentation result. The principle is illustrated below by the example of DeepLab (see the illustrative sketch after the list):
1) a deep convolutional neural network, such as VGG-16 or ResNet-101, is run in a fully convolutional manner and uses atrous (dilated) convolution to reduce the degree of signal down-sampling (from 32x to 8x);
2) in a bilinear interpolation stage, the resolution of the feature map is increased back to that of the original image;
3) a conditional random field is used to refine the segmentation result and better capture object edges.
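The following PyTorch fragment is a minimal sketch of steps 1)-2) of this idea, not the full published DeepLab architecture: atrous convolution enlarges the receptive field without further down-sampling, and bilinear interpolation restores the input resolution. The channel counts and sizes are illustrative assumptions, and the CRF refinement of step 3) is a separate post-processing stage omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtrousHead(nn.Module):
    """Sketch of a DeepLab-style segmentation head on 1/8-scale features."""

    def __init__(self, in_ch, num_classes, dilation=2):
        super().__init__()
        # dilation > 1 widens the receptive field with no extra down-sampling
        self.conv = nn.Conv2d(in_ch, num_classes, kernel_size=3,
                              padding=dilation, dilation=dilation)

    def forward(self, feat, out_size):
        logits = self.conv(feat)   # per-pixel class scores at 1/8 scale
        # step 2): bilinear interpolation back to the input resolution
        return F.interpolate(logits, size=out_size, mode="bilinear",
                             align_corners=False)

head = AtrousHead(in_ch=2048, num_classes=21)   # e.g. ResNet-101 features
feat = torch.randn(1, 2048, 32, 32)             # 1/8 scale of a 256x256 image
out = head(feat, out_size=(256, 256))           # (1, 21, 256, 256) logits
```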
The point of the segmentation is to give the specific pixel information of each part while distinguishing the various vehicle appearance parts. Semi-supervised video object segmentation is given only the correct segmentation mask of the 1st frame of a video and then segments the labeled objects at the pixel level in each subsequent frame; this is in fact a pixel-level object tracking problem. Commonly used methods include STM, CFBI, VOT, FTMU and TVOS. Taking TVOS as an example, it adopts a label propagation approach that is simple, high-performing and efficient: it performs transductive inference from the current frame, the historical frames and their image labels to infer the image label of the current frame, giving it a certain short-term memory capability.
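The following is a simplified sketch of label propagation in the spirit of TVOS: every pixel of the current frame takes a similarity-weighted average of the labels of pixels from historical frames. The embedding network and the spatio-temporal locality weighting of the actual method are omitted, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def propagate_labels(cur_emb, hist_emb, hist_labels, temperature=0.05):
    """cur_emb:     (P, D) pixel embeddings of the current frame
       hist_emb:    (Q, D) pixel embeddings pooled from historical frames
       hist_labels: (Q, C) one-hot (or soft) part labels of those pixels
       returns:     (P, C) propagated soft labels for the current frame"""
    sim = cur_emb @ hist_emb.t() / temperature  # (P, Q) pairwise similarity
    weights = F.softmax(sim, dim=1)             # attention over history pixels
    return weights @ hist_labels                # short-term memory via history

# toy usage with random data
cur = F.normalize(torch.randn(100, 64), dim=1)
hist = F.normalize(torch.randn(500, 64), dim=1)
labels = F.one_hot(torch.randint(0, 5, (500,)), 5).float()
soft = propagate_labels(cur, hist, labels)      # (100, 5) soft part labels
```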
In the above embodiment, preferably, the training method of the semi-supervised video object segmentation model includes: acquiring a recorded video of the vehicle appearance components, and determining, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; starting from the start frame, storing one frame of the video every preset number of frames into a preset image buffer; performing semantic segmentation on the start frame image, and coloring it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; segmenting the remaining images in the image buffer and labeling the segmented images; and training the semi-supervised video object segmentation model by taking the colored semantic segmentation picture and the remaining images in the image buffer as input, and the corresponding segmented and labeled images of the image buffer as output.
In this embodiment, the images in the image buffer and the colored start frame image are used as input, the pre-labeled segmentation images are used as output, and the semi-supervised video object segmentation model is trained until convergence. Once a new colored semantic segmentation picture of a start frame and the video images of subsequent frames are input, the segmentation images of those subsequent frames can be obtained through inference, realizing pixel-level object tracking. With the trained semi-supervised video object segmentation model, vehicle appearance part identification is more robust and more accurate.
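As an illustration only, a training loop matching this description could look like the sketch below: the colored start frame plus the remaining buffered frames are the input, and the pre-labeled segmentation masks are the target. The model interface, data loader and tensor shapes are assumptions, not the disclosed implementation.

```python
import torch

def train_vos(vos_model, loader, num_epochs=50, lr=1e-4):
    """Hypothetical training loop for the semi-supervised VOS model."""
    optimizer = torch.optim.Adam(vos_model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()  # per-pixel classification loss
    for _ in range(num_epochs):              # iterate until convergence
        for colored_start, frames, gt_masks in loader:
            # infer masks for the buffered frames from the colored reference
            logits = vos_model(colored_start, frames)        # (B, T, C, H, W)
            loss = criterion(logits.flatten(0, 1),           # (B*T, C, H, W)
                             gt_masks.flatten(0, 1).long())  # (B*T, H, W)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```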
In the foregoing embodiment, preferably, performing semantic segmentation on the start frame image and coloring it based on the semantic segmentation mask labels to form the colored semantic segmentation picture specifically includes: performing semantic segmentation on the start frame image with a semantic segmentation algorithm; and coloring the segmented image through a preset conversion function according to the mask labels of the semantically segmented image, the colored image serving as the colored semantic segmentation picture.
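A minimal sketch of such a preset conversion function is given below: a fixed palette maps each part-class index in the mask to a distinct color, yielding the colored semantic segmentation picture used as the VOS reference. The part names and color values here are arbitrary examples, not the disclosed palette.

```python
import numpy as np

# Illustrative palette: one distinct color per part-class index.
PALETTE = np.array([
    [0, 0, 0],        # 0: background
    [220, 20, 60],    # 1: e.g. front bumper (hypothetical)
    [0, 128, 255],    # 2: e.g. left front door (hypothetical)
    [255, 215, 0],    # 3: e.g. hood (hypothetical)
], dtype=np.uint8)

def colorize_mask(mask: np.ndarray) -> np.ndarray:
    """mask: (H, W) integer part-class labels -> (H, W, 3) RGB picture."""
    return PALETTE[mask]
```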
As shown in fig. 3, the present invention further provides a video-based vehicle appearance component deep learning segmentation system that applies the video-based vehicle appearance component deep learning segmentation method proposed in any one of the above embodiments, including: the video acquisition module 11, configured to acquire a recorded video of the vehicle appearance components and determine, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; the frame sampling module 12, configured to take one frame of the video every preset number of frames, starting from the start frame, and store it in a preset image buffer; the segmentation coloring module 13, configured to perform semantic segmentation on the start frame image and color it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; and the video segmentation module 14, configured to input the colored semantic segmentation picture and the remaining images in the image buffer into the trained semi-supervised video object segmentation model for inference and segmentation, and output segmentation images corresponding to all the images in the image buffer.
In this embodiment, the video-based vehicle appearance component deep learning segmentation system applies the vehicle appearance component deep learning segmentation method of the above embodiments: the start frame image is segmented and colored through semantic segmentation, and a semi-supervised video object segmentation model performs transductive inference based on the colored segmentation of the start frame, realizing pixel-level object tracking. This solves the problem of segmenting and identifying image components at different distances, removes the need to associate component regions of different images through image feature matching or logical relations, and improves the precision and robustness of video segmentation.
In the above embodiment, preferably, the training method of the semi-supervised video object segmentation model includes: acquiring a recorded video of the vehicle appearance components, and determining, as the start frame, the first position in the video at which the vehicle appearance components are recognizable; starting from the start frame, storing one frame of the video every preset number of frames into a preset image buffer; performing semantic segmentation on the start frame image, and coloring it based on the mask labels of the semantic segmentation image to form a colored semantic segmentation picture; segmenting the remaining images in the image buffer and labeling the segmented images; and training the semi-supervised video object segmentation model by taking the colored semantic segmentation picture and the remaining images in the image buffer as input, and the corresponding segmented and labeled images of the image buffer as output.
In this embodiment, the images in the image buffer and the colored start frame image are used as input, the pre-labeled segmentation images are used as output, and the semi-supervised video object segmentation model is trained until convergence. Once a new colored semantic segmentation picture of a start frame and the video images of subsequent frames are input, the segmentation images of those subsequent frames can be obtained through inference, realizing pixel-level object tracking. With the trained semi-supervised video object segmentation model, vehicle appearance part identification is more robust and more accurate.
In the above embodiment, preferably, the segmentation coloring module is specifically configured to: perform semantic segmentation on the start frame image with a semantic segmentation algorithm; and color the segmented image through a preset conversion function according to the mask labels of the semantically segmented image, the colored image serving as the colored semantic segmentation picture.
In the above embodiment, preferably, the frame sampling module takes one frame of the video every 3 frames, starting from the start frame, and stores it in the image buffer.
The above is only a preferred embodiment of the present invention and is not intended to limit it; various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.