CN110782469A - Video frame image segmentation method and device, electronic equipment and storage medium


Info

Publication number
CN110782469A
CN110782469A (Application No. CN201911025928.4A)
Authority
CN
China
Prior art keywords
video frame
semantic segmentation
processed
image
previous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911025928.4A
Other languages
Chinese (zh)
Inventor
郭益林
赵松涛
宋丛礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201911025928.4A priority Critical patent/CN110782469A/en
Publication of CN110782469A publication Critical patent/CN110782469A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/12 - Edge-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G06T5/75 - Unsharp masking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a video frame image segmentation method and device, an electronic device, and a storage medium, wherein the method comprises the following steps: obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame; performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image; and inputting the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed. According to the technical scheme of the application, the semantic segmentation of the video frame to be processed is guided by the previous frame image, and the boundary of the previous frame's semantic segmentation result is blurred, so that errors in the previous frame's segmentation result are prevented from affecting the next frame image, and the accuracy of the semantic segmentation process is improved.

Description

Video frame image segmentation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for segmenting a video frame image, an electronic device, and a storage medium.
Background
Currently, semantic segmentation is a common algorithm widely applied in short video applications and photographic image editing applications, for example in short video editing and image editing. Through semantic segmentation, human body matting, hair segmentation, scene segmentation, and the like can be performed, providing region information for background replacement or special effect production. When performing semantic segmentation on video frame images, the semantic segmentation result of the previous frame image can be used to guide the semantic segmentation of the next frame image, thereby improving segmentation precision.
However, although the image change between two consecutive video frames is small, directly using the semantic segmentation result of the previous frame to guide the segmentation of the next frame is risky: if the previous frame's segmentation is incorrect or contains errors, the segmentation of the next frame may be misguided, resulting in lag in the semantic segmentation and accumulation of errors.
Disclosure of Invention
An object of the embodiments of the present application is to provide a video frame image segmentation method and apparatus, an electronic device, and a storage medium, so as to solve the problem that errors in the segmentation of a previous frame image mislead the semantic segmentation of the next frame image. The specific technical scheme is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video frame image segmentation method, including:
obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame;
performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image;
and inputting the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is annotated with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a target blurred boundary image;
step C, inputting the target blurred boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if the network model has converged, a trained network model is obtained, and if not, the parameters of the network model are adjusted and training continues until the network model converges.
Optionally, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image includes:
expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image.
Optionally, expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image includes:
expanding the boundary of the semantic segmentation result of the previous video frame inwards by the preset pixel width to obtain the blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame outwards by the preset pixel width to obtain the blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame inwards by a first pixel width and outwards by a second pixel width to obtain the blurred boundary image, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
Optionally, before the video frame to be processed, the blurred boundary image, and the previous video frame are input into the pre-trained network model to obtain the semantic segmentation result of the video frame to be processed, the method further includes:
performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, wherein the data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation.
Optionally, after the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further includes:
determining a foreground target and the position of the foreground target in the video frame to be processed according to the semantic segmentation result;
and adding the foreground target to a corresponding position in a preset background image according to the position of the foreground target to obtain a replaced video frame.
Optionally, after the foreground object is added to a corresponding position in a preset background image according to the position of the foreground object and a replaced video frame is obtained, the method further includes:
and generating the target video by using the plurality of replaced video frames.
According to a second aspect of the embodiments of the present disclosure, there is provided a video frame image segmentation apparatus including:
a result acquisition module configured to acquire a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame;
a blurring processing module configured to perform boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image;
and a network model module configured to input the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is annotated with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a target blurred boundary image;
step C, inputting the target blurred boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if the network model has converged, a trained network model is obtained, and if not, the parameters of the network model are adjusted and training continues until the network model converges.
Optionally, the blurring processing module includes:
a boundary expansion submodule configured to expand the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image.
Optionally, the boundary expansion submodule includes:
an inward expansion unit configured to expand the boundary of the semantic segmentation result of the previous video frame inwards by the preset pixel width to obtain a blurred boundary image;
an outward expansion unit configured to expand the boundary of the semantic segmentation result of the previous video frame outwards by the preset pixel width to obtain a blurred boundary image;
and an inward-and-outward expansion unit configured to expand the boundary of the semantic segmentation result of the previous video frame inwards by a first pixel width and outwards by a second pixel width to obtain a blurred boundary image, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
Optionally, the apparatus further comprises:
a preprocessing module configured to perform data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, wherein the data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation.
Optionally, the apparatus further comprises:
the video replacement module is used for determining a foreground target and the position of the foreground target in the video frame to be processed according to the semantic segmentation result;
and the background adding module is used for adding the foreground target to the corresponding position in the preset background image according to the position of the foreground target to obtain the replaced video frame.
Optionally, the apparatus further comprises:
and the video generation module is used for generating the target video by utilizing the plurality of replaced video frames.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any of the video frame image segmentation methods described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium,
the instructions in the storage medium, when executed by a processor of the electronic device, enable the electronic device to perform any of the video frame image segmentation methods described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed by a computer, enables the computer to perform any one of the video frame image segmentation methods described above.
The embodiments of the present application provide a video frame image segmentation method and apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame; performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image; and inputting the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed. According to the technical scheme of the application, the semantic segmentation of the video frame to be processed is guided by the previous frame image, and the boundary of the previous frame's semantic segmentation result is blurred, so that errors in the previous frame's segmentation result are prevented from affecting the next frame image, and the accuracy of the semantic segmentation process is improved. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow diagram illustrating a method of video frame image segmentation in accordance with an exemplary embodiment;
FIG. 2 is yet another flow diagram illustrating a method of video frame image segmentation in accordance with an exemplary embodiment;
FIG. 3 is a block diagram illustrating a video frame image segmentation apparatus according to an exemplary embodiment;
FIG. 4 is yet another block diagram illustrating a video frame image segmentation apparatus in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram of a storage medium shown in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The video frame image segmentation method of the embodiments of the present disclosure is directed to video on an intelligent terminal device and can therefore be executed by the intelligent terminal device; specifically, the intelligent terminal device may be a mobile phone, a computer, or a server.
Semantic segmentation: refers to a deep learning algorithm that associates a label or category with each pixel of a picture to identify the set of pixels that constitute a distinguishable category.
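As a minimal illustration of this definition, a label map can be produced by assigning each pixel the class with the highest score. The scores and class names below are hypothetical, not taken from the patent:

```python
import numpy as np

# Hypothetical per-pixel class scores for a 2x2 image and 3 classes
# (e.g. background, person, hair); shape (H, W, C).
scores = np.array([
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.3, 0.6], [0.8, 0.1, 0.1]],
])

# Semantic segmentation associates each pixel with the class of highest
# score, yielding a per-pixel label map.
label_map = scores.argmax(axis=-1)
```

Pixels with the same label then form the region corresponding to that semantic category.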
Fig. 1 is a flowchart illustrating a video frame image segmentation method according to an exemplary embodiment, where as shown in fig. 1, the video frame image segmentation method is applied to an intelligent terminal, and includes the following steps:
in step 101, a semantic segmentation result of a previous video frame and a previous video frame of a video frame to be processed is obtained.
The video frame to be processed may be a video frame in a video, where semantic segmentation results of images of each video frame in the video are obtained in advance. The semantic segmentation result of the previous frame of image may include different semantics obtained by segmenting the previous frame of image according to the semantics and corresponding regions corresponding to the different semantics.
The semantic segmentation is a basic common algorithm in short video application or photographic picture editing application, such as human body matting, hair segmentation, scene segmentation and the like, provides accurate regions of human bodies, hairs, objects and the like, and provides region information for background replacement or special effect production through the semantic segmentation.
In step 102, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image.
The boundary blurring processing performed on the semantic segmentation result of the previous video frame may be applied to the division boundaries between the regions corresponding to the different semantics obtained by semantically segmenting the previous frame image, for example by expansion processing.
Optionally, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image includes: expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image.
Optionally, this includes: expanding the boundary inwards by the preset pixel width; or expanding the boundary outwards by the preset pixel width; or expanding the boundary inwards by a first pixel width and outwards by a second pixel width, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
For example, the boundary is expanded inwards and outwards by a designated number of pixels, resulting in a gray stripe of the designated width. The designated width may be a manually set pixel width, or a pixel width estimated by a network model from the video frame to be processed; this is not limited in the present application.
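The inward/outward expansion described above can be sketched with simple morphological operations. The following NumPy sketch is illustrative only: the 4-neighbourhood dilation helper and the 0.5 "gray" value for the uncertain band are assumptions, not the patent's specified implementation:

```python
import numpy as np

def binary_dilate(mask, r):
    """Dilate a binary mask by r pixels using shifted copies (4-neighbourhood)."""
    out = mask.copy()
    for _ in range(r):
        shifted = out.copy()
        shifted[1:, :] |= out[:-1, :]
        shifted[:-1, :] |= out[1:, :]
        shifted[:, 1:] |= out[:, :-1]
        shifted[:, :-1] |= out[:, 1:]
        out = shifted
    return out

def blur_boundary(mask, inward, outward):
    """Replace the segmentation boundary with a gray (0.5) band: `inward`
    pixels eroded into the foreground plus `outward` pixels dilated into
    the background."""
    dilated = binary_dilate(mask, outward)
    eroded = ~binary_dilate(~mask, inward)   # erosion = dilation of complement
    blurred = mask.astype(float)
    blurred[dilated & ~eroded] = 0.5         # uncertain band along the boundary
    return blurred

# A 7x7 segmentation mask with a 3x3 foreground square in the centre.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
blurred = blur_boundary(mask, inward=1, outward=1)
```

Only the single central pixel keeps the confident foreground value 1.0; the ring around the original boundary becomes the gray stripe, and pixels far outside stay 0.0.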
In step 103, the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
The video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model. The network model may take various forms; the type of model is not limited in the present application.
In this way, inputting the video frame to be processed, the blurred boundary image, and the previous video frame into the pre-trained network model yields the semantic segmentation result of the video frame to be processed. With this semantic segmentation result, human body matting, hair segmentation, scene segmentation, and the like can be performed, providing region information for background replacement or special effect production.
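The patent does not specify the network architecture or how the three inputs are combined; one common convention (an assumption here, not the claimed design) is to concatenate them along the channel axis before feeding them to a convolutional model:

```python
import numpy as np

H, W = 64, 64
frame = np.random.rand(H, W, 3)             # video frame to be processed (RGB)
prev_frame = np.random.rand(H, W, 3)        # previous video frame (RGB)
blurred_boundary = np.random.rand(H, W, 1)  # blurred boundary guidance map

# Stack along the channel axis to form a single 7-channel network input.
net_input = np.concatenate([frame, prev_frame, blurred_boundary], axis=-1)
```

The model then consumes the 7-channel tensor as one input, so the guidance information is available at every pixel.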
Optionally, before the video frame to be processed, the blurred boundary image, and the previous video frame are input into the pre-trained network model, the method further includes: performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, wherein the data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation. The preprocessing method is not limited in this application.
In a video, the change between two consecutive frames is often small, and directly inputting the two frames makes it difficult for the network model to learn the change between them. Increasing the variation between the frames before inputting them makes it easier for the network model to capture the change between the two frame images.
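The displacement transformation mentioned above can be sketched as follows; the wrap-around shift via `np.roll` and the shift range are illustrative assumptions, not the patent's prescribed preprocessing:

```python
import numpy as np

def random_shift(frame, max_shift=4, rng=None):
    """Shift a frame by a random number of pixels along each axis,
    enlarging the apparent change between consecutive frames."""
    rng = rng if rng is not None else np.random.default_rng(0)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around; a real pipeline might pad or crop instead.
    return np.roll(frame, shift=(dy, dx), axis=(0, 1)), (int(dy), int(dx))

frame = np.arange(36, dtype=float).reshape(6, 6)
shifted, (dy, dx) = random_shift(frame)
```

Scale, rotation, and thin-plate spline transformations would be applied in the same place in the pipeline, each producing a frame pair with a larger, learnable difference.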
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is annotated with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a target blurred boundary image;
step C, inputting the target blurred boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if the network model has converged, a trained network model is obtained, and if not, the parameters of the network model are adjusted and training continues until the network model converges.
Optionally, after the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further includes: determining a foreground target and the position of the foreground target in the video frame to be processed according to the semantic segmentation result; and adding the foreground target to a corresponding position in a preset background image according to the position of the foreground target to obtain a replaced video frame.
Optionally, after the foreground object is added to a corresponding position in a preset background image according to the position of the foreground object and a replaced video frame is obtained, the method further includes: and generating the target video by using the plurality of replaced video frames.
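Given the foreground mask and its position, the background replacement step can be sketched as a standard alpha composite. The uniform test images below are illustrative; the patent does not prescribe this exact formula:

```python
import numpy as np

def replace_background(frame, mask, background):
    """Composite the segmented foreground onto a preset background image.
    `mask` is the foreground region from the semantic segmentation result."""
    m = mask[..., None]                      # broadcast over colour channels
    return m * frame + (1.0 - m) * background

frame = np.full((4, 4, 3), 200.0)            # frame containing the foreground
background = np.zeros((4, 4, 3))             # preset background (black)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                         # foreground position from segmentation

replaced = replace_background(frame, mask, background)
```

Because the mask places the foreground at its original coordinates, the foreground target lands at the corresponding position in the background image; repeating this per frame yields the replaced video frames used to generate the target video.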
The video frame image segmentation method provided by the embodiments of the present application guides the semantic segmentation of the video frame to be processed using the previous frame image, and by blurring the boundary of the previous frame's semantic segmentation result, prevents errors in that result from affecting the next frame image, thereby improving the fault tolerance of the semantic segmentation process.
Fig. 2 is a flowchart illustrating a video frame image segmentation method according to an exemplary embodiment, where as shown in fig. 2, the video frame image segmentation method is applied to an intelligent terminal, and includes the following steps:
in step 101, a semantic segmentation result of a previous video frame and a previous video frame of a video frame to be processed is obtained.
The video frame to be processed may be any one of a plurality of types of videos, for example, a long video or a short video, and the video format of the video may be a plurality of types of video formats, which is not limited in this application.
In step 201, the boundary of the semantic segmentation result of the previous video frame is expanded by a preset pixel width to obtain a blurred boundary image.
Optionally, expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image includes: expanding the boundary inwards by the preset pixel width; or expanding the boundary outwards by the preset pixel width; or expanding the boundary inwards by a first pixel width and outwards by a second pixel width, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
For example, the boundary is expanded inwards and outwards by a designated number of pixels, resulting in a gray stripe of the designated width. The designated width may be a manually set pixel width, or a pixel width estimated by a network model from the video frame to be processed; this is not limited in the present application.
The blurred boundary image of the preset width is obtained by expanding the boundary of the previous video frame's semantic segmentation result inwards and outwards by the preset pixel width. The wider boundary band reduces the influence of the previous frame's segmentation result on the next frame image, preventing the accumulation of errors.
In step 202, data preprocessing is performed on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame.
The data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation. For example, the video frame to be processed may be rotated by a preset angle to increase the difference between it and the previous video frame.
Data preprocessing increases the variability between the video frame to be processed and the previous video frame. Because the change between two consecutive frames in a video is often small, directly inputting the two frames makes it difficult for the network model to learn the change between them. Inputting the frames after increasing their variation makes it easier for the network model to capture the change between the two frame images, and thus to use the previous video frame to guide the semantic segmentation of the video frame to be processed. This enhances the robustness of the deep learning model to larger changes, improves the fault tolerance of the network model, and prevents errors in the previous frame from affecting the next frame.
In step 103, the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
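The patent does not state how the three inputs are combined before entering the network; concatenating them along the channel axis is one common convention, sketched here with placeholder shapes:

```python
import numpy as np

H, W = 4, 4  # toy spatial size for the sketch
frame_t = np.random.rand(H, W, 3)     # video frame to be processed (RGB)
frame_prev = np.random.rand(H, W, 3)  # previous video frame (RGB)
boundary = np.random.rand(H, W, 1)    # single-channel blurred boundary image

# Stack all three sources along the channel axis so the network sees
# them jointly: 3 + 1 + 3 = 7 input channels in this layout.
net_input = np.concatenate([frame_t, boundary, frame_prev], axis=-1)
print(net_input.shape)
```

The segmentation network's first convolution would then simply accept 7 input channels instead of 3.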
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
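Steps A through E amount to a standard supervised training loop with a convergence test. The sketch below substitutes a toy linear model and synthetic data for the segmentation network, purely to make the control flow of steps C, D, and E concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the data of steps A-B: feature vectors x (frame plus
# boundary data, flattened) and the labelled standard segmentation y.
x = rng.normal(size=(32, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = x @ true_w

w = np.zeros(5)                            # "network" parameters (toy linear model)
for step in range(1000):
    pred = x @ w                           # step C: forward pass through the model
    err = float(np.mean((pred - y) ** 2))  # step D: error vs. the standard result
    if err < 1e-6:                         # step E: convergence check
        break
    grad = 2.0 * x.T @ (pred - y) / len(y)
    w -= 0.05 * grad                       # step E: adjust parameters, keep training

print(err < 1e-6)                          # True once training has converged
```

A real implementation would replace the linear model with the segmentation network and the mean-squared error with a segmentation loss (e.g. per-pixel cross-entropy), but the loop structure is the same.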
Therefore, with the video frame image segmentation method provided by the embodiments of the application, the semantic segmentation of the video frame to be processed can be guided by the previous frame image; the recognition accuracy of the network model can be improved by enlarging the change between the previous frame image and the video frame to be processed; and blurring the boundary of the previous frame's semantic segmentation result prevents its errors from affecting the next frame image, improving the fault tolerance of the semantic segmentation process.
The embodiment of the present application further provides a method for training a network model, including:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
Optionally, before the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further includes: and performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary and the previous video frame, wherein the difference between the video frame to be processed and the previous video frame after the data preprocessing is increased.
Optionally, the data preprocessing includes one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
By amplifying the change between the two adjacent frames, the robustness of the deep learning model to larger changes can be enhanced, the fault tolerance of the network model can be improved, and the influence of errors in the previous frame on the next frame can be avoided.
Optionally, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image includes: expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image.
Optionally, expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image includes: expanding the boundary inwards by the preset pixel width; or, expanding the boundary outwards by the preset pixel width; or, expanding the boundary inwards by a first pixel width and outwards by a second pixel width, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
Expanding the boundary of the previous frame's semantic segmentation result both inwards and outwards yields a blurred target boundary image of the preset width. This reduces the influence of the previous frame's semantic segmentation result on the target video frame, improving the fault tolerance of the network model and preventing errors in the previous frame from affecting the next frame.
Through the network model training method of the embodiments of the application, a network model for semantic segmentation of video frames can be obtained. This network model not only uses the semantic segmentation result of the previous frame to guide the semantic segmentation of the next frame, but also has improved fault tolerance, so that errors in the previous frame do not affect the next frame.
Fig. 3 is a block diagram illustrating a video frame image segmentation apparatus according to an example embodiment. Referring to fig. 3, the apparatus includes a result obtaining module 131, a fuzzy processing module 132, and a network model module 133.
The result obtaining module 131 is configured to obtain a previous video frame of the video frame to be processed and a semantic segmentation result of the previous video frame.
The blurring processing module 132 is configured to perform boundary blurring processing on the semantic segmentation result of the previous video frame, so as to obtain a blurred boundary image.
The network model module 133 is configured to input the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model, so as to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the blur processing module 132 includes:
and the boundary expansion submodule is used for expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image.
Optionally, the apparatus further comprises:
and the preprocessing module is used for preprocessing data of the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary and the previous video frame, wherein the difference between the video frame to be processed and the previous video frame after data preprocessing is increased.
Optionally, the data preprocessing includes one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
Optionally, the step of training the network model in advance includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
The video frame image segmentation device provided by the embodiments of the application can guide the semantic segmentation of a video frame to be processed using its previous frame image, and, by blurring the boundary of the previous frame's semantic segmentation result, can prevent errors in that result from affecting the next frame image, improving the fault tolerance of the semantic segmentation process.
Fig. 4 is yet another block diagram illustrating a video frame image segmentation apparatus according to an example embodiment. Referring to fig. 4, the apparatus includes a result obtaining module 131, a boundary expansion submodule 141, a preprocessing module 142, and a network model module 133.
The result obtaining module 131 is configured to obtain a previous video frame of the video frame to be processed and a semantic segmentation result of the previous video frame.
The boundary expansion submodule 141 is configured to expand the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image.
The preprocessing module 142 is configured to perform data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, so that the difference between the preprocessed video frame to be processed and the previous video frame is increased.
Optionally, the data preprocessing includes one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
The network model module 133 is configured to input the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model, so as to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the step of training the network model in advance includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
The video frame image segmentation device provided by the embodiments of the application can guide the semantic segmentation of a video frame to be processed using its previous frame image, can improve the recognition accuracy of the network model by enlarging the change between the previous frame image and the video frame to be processed, and, by blurring the boundary of the previous frame's semantic segmentation result, can prevent errors in that result from affecting the next frame image, improving the fault tolerance of the semantic segmentation process.
FIG. 5 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment.
In an exemplary embodiment, the electronic device includes a processor 501 and a memory 503, and further includes a communication interface 502 and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other through the communication bus 504.
The processor is configured to implement any of the video frame image segmentation methods described above when executing the computer program stored in the memory.
FIG. 6 is a schematic diagram of a storage medium shown in accordance with an exemplary embodiment. For example, the apparatus 600 may be provided as a server. Referring to fig. 6, the apparatus 600 includes a processing component 622 that further includes one or more processors and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by the processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform the video frame image segmentation method described above.
The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input-output interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
There is also provided, in accordance with an embodiment of the present disclosure, a computer program product, which, when executed by a computer, enables the computer to perform any one of the video frame image segmentation methods described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for segmenting video frame images is characterized by comprising the following steps:
obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame;
performing boundary fuzzy processing on the semantic segmentation result of the previous video frame to obtain a fuzzy boundary image;
and inputting the video frame to be processed, the blurred boundary image and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
2. The method of claim 1, wherein the training method of the pre-trained network model comprises:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating an error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
3. The method according to claim 1, wherein the performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image comprises:
and expanding the boundary of the semantic segmentation result of the previous video frame to a preset pixel width to obtain a blurred boundary image.
4. The method according to claim 3, wherein the expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image comprises:
expanding the boundary of the semantic segmentation result of the previous video frame inwards by the preset pixel width to obtain a blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame outwards by the preset pixel width to obtain a blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame inwards by a first pixel width and outwards by a second pixel width to obtain a blurred boundary image, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
5. The method according to claim 1, wherein before the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain the semantic segmentation result of the video frame to be processed, the method further comprises:
and performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary and the previous video frame, wherein the difference between the video frame to be processed and the previous video frame after data preprocessing is increased.
6. The method of claim 5, wherein the data preprocessing comprises one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
7. The method according to claim 1, wherein after the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further comprises:
determining a foreground target in the video frame to be processed and the position of the foreground target according to the semantic segmentation result;
and adding the foreground target to a corresponding position in a preset background image according to the position of the foreground target to obtain a replaced video frame.
8. A video frame image segmentation apparatus, comprising:
the result acquisition module is used for acquiring a previous video frame of the video frames to be processed and a semantic segmentation result of the previous video frame;
the fuzzy processing module is used for carrying out boundary fuzzy processing on the semantic segmentation result of the previous video frame to obtain a fuzzy boundary image;
and the network model module is used for inputting the video frame to be processed, the blurred boundary image and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video frame image segmentation method of any one of claims 1 to 7.
10. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform a video frame image segmentation method according to any one of claims 1 to 7.
CN201911025928.4A 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium Pending CN110782469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911025928.4A CN110782469A (en) 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911025928.4A CN110782469A (en) 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110782469A true CN110782469A (en) 2020-02-11

Family

ID=69386618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911025928.4A Pending CN110782469A (en) 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110782469A (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000018128A1 (en) * 1998-09-24 2000-03-30 The Trustees Of Columbia University In The City Of New York System and method for semantic video object segmentation
US6731799B1 (en) * 2000-06-01 2004-05-04 University Of Washington Object segmentation with background extraction and moving boundary techniques
US20090010546A1 (en) * 2005-12-30 2009-01-08 Telecom Italia S P.A. Edge-Guided Morphological Closing in Segmentation of Video Sequences
US20090196349A1 (en) * 2008-02-01 2009-08-06 Young-O Park Method for estimating contour of video object
CN109697724A (en) * 2017-10-24 2019-04-30 北京京东尚科信息技术有限公司 Video Image Segmentation method and device, storage medium, electronic equipment
CN108520223A (en) * 2018-04-02 2018-09-11 广州华多网络科技有限公司 Dividing method, segmenting device, storage medium and the terminal device of video image
CN108596940A (en) * 2018-04-12 2018-09-28 北京京东尚科信息技术有限公司 A kind of methods of video segmentation and device
CN109685060A (en) * 2018-11-09 2019-04-26 科大讯飞股份有限公司 Image processing method and device
CN109886238A (en) * 2019-03-01 2019-06-14 湖北无垠智探科技发展有限公司 Unmanned plane Image Change Detection algorithm based on semantic segmentation
CN109902748A (en) * 2019-03-04 2019-06-18 中国计量大学 A kind of image, semantic dividing method based on the full convolutional neural networks of fusion of multi-layer information
CN110110682A (en) * 2019-05-14 2019-08-09 西安电子科技大学 The semantic stereo reconstruction method of remote sensing images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
光电的一只菜鸟 (CSDN blog): "OpenCV image processing study (54) — border padding with copyMakeBorder", https://blog.csdn.net/qq_35789421/article/details/89404224 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313788A (en) * 2020-02-26 2021-08-27 北京小米移动软件有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112613516A (en) * 2020-12-11 2021-04-06 北京影谱科技股份有限公司 Semantic segmentation method for aerial video data
CN112651880A (en) * 2020-12-25 2021-04-13 北京市商汤科技开发有限公司 Video data processing method and device, electronic equipment and storage medium
CN112800850A (en) * 2020-12-31 2021-05-14 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
CN112866797A (en) * 2020-12-31 2021-05-28 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
CN112800850B (en) * 2020-12-31 2024-09-17 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
CN113706555A (en) * 2021-08-12 2021-11-26 北京达佳互联信息技术有限公司 Video frame processing method and device, electronic equipment and storage medium
CN114187451A (en) * 2021-12-17 2022-03-15 广州文远知行科技有限公司 Semantic segmentation method, device and equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN110782469A (en) Video frame image segmentation method and device, electronic equipment and storage medium
US11200404B2 (en) Feature point positioning method, storage medium, and computer device
CN109598744B (en) Video tracking method, device, equipment and storage medium
CN108830780B (en) Image processing method and device, electronic device and storage medium
US20150279021A1 (en) Video object tracking in traffic monitoring
CN109145771B (en) Face snapshot method and device
US20240212161A1 (en) Foreground data generation method and method for applying same, related apparatus, and system
US20230252664A1 (en) Image Registration Method and Apparatus, Electronic Apparatus, and Storage Medium
CN110348393B (en) Vehicle feature extraction model training method, vehicle identification method and equipment
CN109447022B (en) Lens type identification method and device
CN115278089B (en) Face fuzzy image focusing correction method, device, equipment and storage medium
CN110473227A (en) Method for tracking target, device, equipment and storage medium
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
US8428369B2 (en) Information processing apparatus, information processing method, and program
CN114155172A (en) Image processing method and system
CN113409353A (en) Motion foreground detection method and device, terminal equipment and storage medium
CN110580462B (en) Natural scene text detection method and system based on non-local network
CN112218005A (en) Video editing method based on artificial intelligence
CN117459661A (en) Video processing method, device, equipment and machine-readable storage medium
CN112084855A (en) Outlier elimination method for video stream based on improved RANSAC method
CN110826564A (en) Small target semantic segmentation method and system in complex scene image
US20200226763A1 (en) Object Detection Method and Computing System Thereof
CN116188535A (en) Video tracking method, device, equipment and storage medium based on optical flow estimation
CN111815689B (en) Semi-automatic labeling method, equipment, medium and device
US11816842B2 (en) Image processing method, apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200211