CN110782469A - Video frame image segmentation method and device, electronic equipment and storage medium


Info

Publication number
CN110782469A
CN110782469A (Application No. CN201911025928.4A)
Authority
CN
China
Prior art keywords
video frame
semantic segmentation
processed
image
previous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911025928.4A
Other languages
Chinese (zh)
Inventor
郭益林
赵松涛
宋丛礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201911025928.4A priority Critical patent/CN110782469A/en
Publication of CN110782469A publication Critical patent/CN110782469A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/12 - Edge-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G06T5/75 - Unsharp masking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a video frame image segmentation method and device, an electronic device, and a storage medium, wherein the method comprises the following steps: obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame; performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image; and inputting the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed. According to the technical scheme of the application, the semantic segmentation of the video frame to be processed is guided by the previous frame image, and the boundary of the previous frame's semantic segmentation result is blurred, so that errors in the previous frame's segmentation result are prevented from affecting the next frame image, and the accuracy of the semantic segmentation process is improved.

Description

Video frame image segmentation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for segmenting a video frame image, an electronic device, and a storage medium.
Background
Currently, semantic segmentation is a common algorithm widely applied in short video applications and photographic image editing applications, for example in short video editing and image editing. Through semantic segmentation, human body matting, hair segmentation, scene segmentation, and the like can be performed, providing region information for background replacement or special effect production. When performing semantic segmentation on video frame images, the semantic segmentation result of the previous frame image can be used to guide the semantic segmentation of the next frame image, thereby improving segmentation precision.
However, although the image change between two consecutive video frames is small, directly using the semantic segmentation result of the previous frame to guide the segmentation of the next frame is risky: if the previous frame's segmentation is incorrect or contains errors, the segmentation of the next frame may be misguided, resulting in lag in the semantic segmentation and accumulation of errors.
Disclosure of Invention
An object of the embodiments of the present application is to provide a video frame image segmentation method and apparatus, an electronic device, and a storage medium, so as to solve the problem that errors in the segmentation of a previous frame image mislead the semantic segmentation of the next frame image. The specific technical scheme is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video frame image segmentation method, including:
obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame;
performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image;
and inputting the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is annotated with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a target blurred boundary image;
step C, inputting the target blurred boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if the network model has converged, a trained network model is obtained, and if not, the parameters of the network model are adjusted and training continues until the network model converges.
Optionally, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image includes:
expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image.
Optionally, expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image includes:
expanding the boundary of the semantic segmentation result of the previous video frame inwards by the preset pixel width to obtain the blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame outwards by the preset pixel width to obtain the blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame inwards by a first pixel width and outwards by a second pixel width to obtain the blurred boundary image, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
Optionally, before the video frame to be processed, the blurred boundary image, and the previous video frame are input into the pre-trained network model to obtain the semantic segmentation result of the video frame to be processed, the method further includes:
performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, wherein the data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation.
Optionally, after the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further includes:
determining a foreground target and the position of the foreground target in the video frame to be processed according to the semantic segmentation result;
and adding the foreground target to a corresponding position in a preset background image according to the position of the foreground target to obtain a replaced video frame.
Optionally, after the foreground object is added to a corresponding position in a preset background image according to the position of the foreground object and a replaced video frame is obtained, the method further includes:
and generating the target video by using the plurality of replaced video frames.
According to a second aspect of the embodiments of the present disclosure, there is provided a video frame image segmentation apparatus including:
a result acquisition module configured to acquire a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame;
a blurring processing module configured to perform boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image;
and a network model module configured to input the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is annotated with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a target blurred boundary image;
step C, inputting the target blurred boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if the network model has converged, a trained network model is obtained, and if not, the parameters of the network model are adjusted and training continues until the network model converges.
Optionally, the blurring processing module includes:
a boundary expansion submodule configured to expand the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image.
Optionally, the boundary expansion submodule includes:
an inward expansion unit configured to expand the boundary of the semantic segmentation result of the previous video frame inwards by the preset pixel width to obtain a blurred boundary image;
an outward expansion unit configured to expand the boundary of the semantic segmentation result of the previous video frame outwards by the preset pixel width to obtain a blurred boundary image;
and an inward-and-outward expansion unit configured to expand the boundary of the semantic segmentation result of the previous video frame inwards by a first pixel width and outwards by a second pixel width to obtain a blurred boundary image, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
Optionally, the apparatus further comprises:
a preprocessing module configured to perform data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, wherein the data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation.
Optionally, the apparatus further comprises:
the video replacement module is used for determining a foreground target and the position of the foreground target in the video frame to be processed according to the semantic segmentation result;
and the background adding module is used for adding the foreground target to the corresponding position in the preset background image according to the position of the foreground target to obtain the replaced video frame.
Optionally, the apparatus further comprises:
and the video generation module is used for generating the target video by utilizing the plurality of replaced video frames.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any of the video frame image segmentation methods described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium,
the instructions in the storage medium, when executed by a processor of the electronic device, enable the electronic device to perform any of the video frame image segmentation methods described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product which, when executed by a computer, enables the computer to perform any one of the video frame image segmentation methods described above.
The embodiments of the present application provide a video frame image segmentation method and apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame; performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image; and inputting the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed. According to the technical scheme of the application, the semantic segmentation of the video frame to be processed is guided by the previous frame image, and the boundary of the previous frame's semantic segmentation result is blurred, so that errors in the previous frame's segmentation result are prevented from affecting the next frame image, and the accuracy of the semantic segmentation process is improved. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow diagram illustrating a method of video frame image segmentation in accordance with an exemplary embodiment;
FIG. 2 is yet another flow diagram illustrating a method of video frame image segmentation in accordance with an exemplary embodiment;
FIG. 3 is a block diagram illustrating a video frame image segmentation apparatus according to an exemplary embodiment;
FIG. 4 is yet another block diagram illustrating a video frame image segmentation apparatus in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram of a storage medium shown in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The video frame image segmentation method of the embodiments of the present disclosure is directed to video on an intelligent terminal device and can therefore be executed by the intelligent terminal device; specifically, the intelligent terminal device may be a mobile phone, a computer, or a server.
Semantic segmentation: refers to a deep learning algorithm that associates a label or category with each pixel of a picture to identify the set of pixels that constitute a distinguishable category.
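As a minimal illustration of this definition, a label map can be produced by assigning each pixel the class with the highest score. The scores and class names below are hypothetical, not taken from the patent:

```python
import numpy as np

# Hypothetical per-pixel class scores for a 2x2 image and 3 classes
# (e.g. background, person, hair); shape (H, W, C).
scores = np.array([
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.3, 0.6], [0.8, 0.1, 0.1]],
])

# Semantic segmentation associates each pixel with the class of highest
# score, yielding a per-pixel label map.
label_map = scores.argmax(axis=-1)
```

Pixels with the same label then form the region corresponding to that semantic category.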
Fig. 1 is a flowchart illustrating a video frame image segmentation method according to an exemplary embodiment, where as shown in fig. 1, the video frame image segmentation method is applied to an intelligent terminal, and includes the following steps:
in step 101, a semantic segmentation result of a previous video frame and a previous video frame of a video frame to be processed is obtained.
The video frame to be processed may be a video frame in a video, where semantic segmentation results of images of each video frame in the video are obtained in advance. The semantic segmentation result of the previous frame of image may include different semantics obtained by segmenting the previous frame of image according to the semantics and corresponding regions corresponding to the different semantics.
The semantic segmentation is a basic common algorithm in short video application or photographic picture editing application, such as human body matting, hair segmentation, scene segmentation and the like, provides accurate regions of human bodies, hairs, objects and the like, and provides region information for background replacement or special effect production through the semantic segmentation.
In step 102, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image.
The boundary blurring processing performed on the semantic segmentation result of the previous video frame may be applied to the division boundaries between the regions corresponding to the different semantics obtained by semantically segmenting the previous frame image, for example by expansion processing.
Optionally, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image includes: expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image.
Optionally, this includes: expanding the boundary inwards by the preset pixel width; or expanding the boundary outwards by the preset pixel width; or expanding the boundary inwards by a first pixel width and outwards by a second pixel width, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
For example, the boundary is expanded inwards and outwards by a designated number of pixels, resulting in a gray stripe of the designated width. The designated width may be a manually set pixel width, or a pixel width estimated by a network model from the video frame to be processed; this is not limited in the present application.
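The inward/outward expansion described above can be sketched with simple morphological operations. The following NumPy sketch is illustrative only: the 4-neighbourhood dilation helper and the 0.5 "gray" value for the uncertain band are assumptions, not the patent's specified implementation:

```python
import numpy as np

def binary_dilate(mask, r):
    """Dilate a binary mask by r pixels using shifted copies (4-neighbourhood)."""
    out = mask.copy()
    for _ in range(r):
        shifted = out.copy()
        shifted[1:, :] |= out[:-1, :]
        shifted[:-1, :] |= out[1:, :]
        shifted[:, 1:] |= out[:, :-1]
        shifted[:, :-1] |= out[:, 1:]
        out = shifted
    return out

def blur_boundary(mask, inward, outward):
    """Replace the segmentation boundary with a gray (0.5) band: `inward`
    pixels eroded into the foreground plus `outward` pixels dilated into
    the background."""
    dilated = binary_dilate(mask, outward)
    eroded = ~binary_dilate(~mask, inward)   # erosion = dilation of complement
    blurred = mask.astype(float)
    blurred[dilated & ~eroded] = 0.5         # uncertain band along the boundary
    return blurred

# A 7x7 segmentation mask with a 3x3 foreground square in the centre.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
blurred = blur_boundary(mask, inward=1, outward=1)
```

Only the single central pixel keeps the confident foreground value 1.0; the ring around the original boundary becomes the gray stripe, and pixels far outside stay 0.0.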
In step 103, the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
The video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model. The network model may take various forms; the type of model is not limited in the present application.
In this way, inputting the video frame to be processed, the blurred boundary image, and the previous video frame into the pre-trained network model yields the semantic segmentation result of the video frame to be processed. With this semantic segmentation result, human body matting, hair segmentation, scene segmentation, and the like can be performed, providing region information for background replacement or special effect production.
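The patent does not specify the network architecture or how the three inputs are combined; one common convention (an assumption here, not the claimed design) is to concatenate them along the channel axis before feeding them to a convolutional model:

```python
import numpy as np

H, W = 64, 64
frame = np.random.rand(H, W, 3)             # video frame to be processed (RGB)
prev_frame = np.random.rand(H, W, 3)        # previous video frame (RGB)
blurred_boundary = np.random.rand(H, W, 1)  # blurred boundary guidance map

# Stack along the channel axis to form a single 7-channel network input.
net_input = np.concatenate([frame, prev_frame, blurred_boundary], axis=-1)
```

The model then consumes the 7-channel tensor as one input, so the guidance information is available at every pixel.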
Optionally, before the video frame to be processed, the blurred boundary image, and the previous video frame are input into the pre-trained network model, the method further includes: performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, wherein the data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation. The preprocessing method is not limited in this application.
In a video, the change between two consecutive frames is often small, and directly inputting the two frames makes it difficult for the network model to learn the change between them. Increasing the variation between the frames before inputting them makes it easier for the network model to capture the change between the two frame images.
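The displacement transformation mentioned above can be sketched as follows; the wrap-around shift via `np.roll` and the shift range are illustrative assumptions, not the patent's prescribed preprocessing:

```python
import numpy as np

def random_shift(frame, max_shift=4, rng=None):
    """Shift a frame by a random number of pixels along each axis,
    enlarging the apparent change between consecutive frames."""
    rng = rng if rng is not None else np.random.default_rng(0)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around; a real pipeline might pad or crop instead.
    return np.roll(frame, shift=(dy, dx), axis=(0, 1)), (int(dy), int(dx))

frame = np.arange(36, dtype=float).reshape(6, 6)
shifted, (dy, dx) = random_shift(frame)
```

Scale, rotation, and thin-plate spline transformations would be applied in the same place in the pipeline, each producing a frame pair with a larger, learnable difference.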
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is annotated with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a target blurred boundary image;
step C, inputting the target blurred boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if the network model has converged, a trained network model is obtained, and if not, the parameters of the network model are adjusted and training continues until the network model converges.
Optionally, after the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further includes: determining a foreground target and the position of the foreground target in the video frame to be processed according to the semantic segmentation result; and adding the foreground target to a corresponding position in a preset background image according to the position of the foreground target to obtain a replaced video frame.
Optionally, after the foreground object is added to a corresponding position in a preset background image according to the position of the foreground object and a replaced video frame is obtained, the method further includes: and generating the target video by using the plurality of replaced video frames.
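Given the foreground mask and its position, the background replacement step can be sketched as a standard alpha composite. The uniform test images below are illustrative; the patent does not prescribe this exact formula:

```python
import numpy as np

def replace_background(frame, mask, background):
    """Composite the segmented foreground onto a preset background image.
    `mask` is the foreground region from the semantic segmentation result."""
    m = mask[..., None]                      # broadcast over colour channels
    return m * frame + (1.0 - m) * background

frame = np.full((4, 4, 3), 200.0)            # frame containing the foreground
background = np.zeros((4, 4, 3))             # preset background (black)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                         # foreground position from segmentation

replaced = replace_background(frame, mask, background)
```

Because the mask places the foreground at its original coordinates, the foreground target lands at the corresponding position in the background image; repeating this per frame yields the replaced video frames used to generate the target video.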
The video frame image segmentation method provided by the embodiments of the present application guides the semantic segmentation of the video frame to be processed using the previous frame image, and by blurring the boundary of the previous frame's semantic segmentation result, prevents errors in that result from affecting the next frame image, thereby improving the fault tolerance of the semantic segmentation process.
Fig. 2 is a flowchart illustrating a video frame image segmentation method according to an exemplary embodiment, where as shown in fig. 2, the video frame image segmentation method is applied to an intelligent terminal, and includes the following steps:
in step 101, a semantic segmentation result of a previous video frame and a previous video frame of a video frame to be processed is obtained.
The video frame to be processed may be any one of a plurality of types of videos, for example, a long video or a short video, and the video format of the video may be a plurality of types of video formats, which is not limited in this application.
In step 201, the boundary of the semantic segmentation result of the previous video frame is expanded by a preset pixel width to obtain a blurred boundary image.
Optionally, expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image includes: expanding the boundary inwards by the preset pixel width; or expanding the boundary outwards by the preset pixel width; or expanding the boundary inwards by a first pixel width and outwards by a second pixel width, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
For example, the boundary is expanded inwards and outwards by a designated number of pixels, resulting in a gray stripe of the designated width. The designated width may be a manually set pixel width, or a pixel width estimated by a network model from the video frame to be processed; this is not limited in the present application.
The blurred boundary image of the preset width is obtained by expanding the boundary of the previous video frame's semantic segmentation result inwards and outwards by the preset pixel width. The wider boundary band reduces the influence of the previous frame's segmentation result on the next frame image, preventing the accumulation of errors.
In step 202, data preprocessing is performed on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame.
The data preprocessing increases the difference between the video frame to be processed and the previous video frame.
Optionally, the data preprocessing includes one or more of: displacement transformation, scale transformation, rotation transformation, and thin-plate spline transformation. For example, the video frame to be processed may be rotated by a preset angle to increase the difference between it and the previous video frame.
Data preprocessing increases the variability between the video frame to be processed and the previous video frame. Because the change between two consecutive frames in a video is often small, directly inputting the two frames makes it difficult for the network model to learn the change between them. Inputting the frames after increasing their variation makes it easier for the network model to capture the change between the two frame images, and thus to use the previous video frame to guide the semantic segmentation of the video frame to be processed. This enhances the robustness of the deep learning model to larger changes, improves the fault tolerance of the network model, and prevents errors in the previous frame from affecting the next frame.
In step 103, the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
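The patent does not state how the three inputs are combined before entering the network; concatenating them along the channel axis is one common convention, sketched here with placeholder shapes:

```python
import numpy as np

H, W = 4, 4  # toy spatial size for the sketch
frame_t = np.random.rand(H, W, 3)     # video frame to be processed (RGB)
frame_prev = np.random.rand(H, W, 3)  # previous video frame (RGB)
boundary = np.random.rand(H, W, 1)    # single-channel blurred boundary image

# Stack all three sources along the channel axis so the network sees
# them jointly: 3 + 1 + 3 = 7 input channels in this layout.
net_input = np.concatenate([frame_t, boundary, frame_prev], axis=-1)
print(net_input.shape)
```

The segmentation network's first convolution would then simply accept 7 input channels instead of 3.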
Optionally, the training method of the pre-trained network model includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
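Steps A through E amount to a standard supervised training loop with a convergence test. The sketch below substitutes a toy linear model and synthetic data for the segmentation network, purely to make the control flow of steps C, D, and E concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the data of steps A-B: feature vectors x (frame plus
# boundary data, flattened) and the labelled standard segmentation y.
x = rng.normal(size=(32, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = x @ true_w

w = np.zeros(5)                            # "network" parameters (toy linear model)
for step in range(1000):
    pred = x @ w                           # step C: forward pass through the model
    err = float(np.mean((pred - y) ** 2))  # step D: error vs. the standard result
    if err < 1e-6:                         # step E: convergence check
        break
    grad = 2.0 * x.T @ (pred - y) / len(y)
    w -= 0.05 * grad                       # step E: adjust parameters, keep training

print(err < 1e-6)                          # True once training has converged
```

A real implementation would replace the linear model with the segmentation network and the mean-squared error with a segmentation loss (e.g. per-pixel cross-entropy), but the loop structure is the same.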
Therefore, with the video frame image segmentation method provided by the embodiments of the application, the semantic segmentation of the video frame to be processed can be guided by the previous frame image; the recognition accuracy of the network model can be improved by enlarging the change between the previous frame image and the video frame to be processed; and blurring the boundary of the previous frame's semantic segmentation result prevents its errors from affecting the next frame image, improving the fault tolerance of the semantic segmentation process.
The embodiment of the present application further provides a method for training a network model, including:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
Optionally, before the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further includes: and performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary and the previous video frame, wherein the difference between the video frame to be processed and the previous video frame after the data preprocessing is increased.
Optionally, the data preprocessing includes one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
By amplifying the change between the two adjacent frames, the robustness of the deep learning model to larger changes can be enhanced, the fault tolerance of the network model can be improved, and the influence of errors in the previous frame on the next frame can be avoided.
Optionally, performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image includes: expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain the blurred boundary image.
Optionally, expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image includes: expanding the boundary inwards by the preset pixel width; or, expanding the boundary outwards by the preset pixel width; or, expanding the boundary inwards by a first pixel width and outwards by a second pixel width, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
Expanding the boundary of the previous frame's semantic segmentation result both inwards and outwards yields a blurred target boundary image of the preset width. This reduces the influence of the previous frame's semantic segmentation result on the target video frame, improving the fault tolerance of the network model and preventing errors in the previous frame from affecting the next frame.
Through the network model training method of the embodiments of the application, a network model for semantic segmentation of video frames can be obtained. This network model not only uses the semantic segmentation result of the previous frame to guide the semantic segmentation of the next frame, but also has improved fault tolerance, so that errors in the previous frame do not affect the next frame.
Fig. 3 is a block diagram illustrating a video frame image segmentation apparatus according to an example embodiment. Referring to fig. 3, the apparatus includes a result obtaining module 131, a fuzzy processing module 132, and a network model module 133.
The result obtaining module 131 is configured to obtain a previous video frame of the video frame to be processed and a semantic segmentation result of the previous video frame.
The blurring processing module 132 is configured to perform boundary blurring processing on the semantic segmentation result of the previous video frame, so as to obtain a blurred boundary image.
The network model module 133 is configured to input the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model, so as to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the blur processing module 132 includes:
and the boundary expansion submodule is used for expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image.
Optionally, the apparatus further comprises:
and the preprocessing module is used for preprocessing data of the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary and the previous video frame, wherein the difference between the video frame to be processed and the previous video frame after data preprocessing is increased.
Optionally, the data preprocessing includes one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
Optionally, the step of training the network model in advance includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
The video frame image segmentation device provided by the embodiments of the application can guide the semantic segmentation of a video frame to be processed using its previous frame image, and, by blurring the boundary of the previous frame's semantic segmentation result, can prevent errors in that result from affecting the next frame image, improving the fault tolerance of the semantic segmentation process.
Fig. 4 is yet another block diagram illustrating a video frame image segmentation apparatus according to an example embodiment. Referring to fig. 4, the apparatus includes a result obtaining module 131, a boundary expansion submodule 141, a preprocessing module 142, and a network model module 133.
The result obtaining module 131 is configured to obtain a previous video frame of the video frame to be processed and a semantic segmentation result of the previous video frame.
The boundary expansion submodule 141 is configured to expand the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image.
The preprocessing module 142 is configured to perform data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary image, and the previous video frame, so that the difference between the preprocessed video frame to be processed and the previous video frame is increased.
Optionally, the data preprocessing includes one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
The network model module 133 is configured to input the video frame to be processed, the blurred boundary image, and the previous video frame into a pre-trained network model, so as to obtain a semantic segmentation result of the video frame to be processed.
Optionally, the step of training the network model in advance includes:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating the error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
The video frame image segmentation device provided by the embodiments of the application can guide the semantic segmentation of a video frame to be processed using its previous frame image, can improve the recognition accuracy of the network model by enlarging the change between the previous frame image and the video frame to be processed, and, by blurring the boundary of the previous frame's semantic segmentation result, can prevent errors in that result from affecting the next frame image, improving the fault tolerance of the semantic segmentation process.
FIG. 5 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment.
In an exemplary embodiment, the electronic device includes a processor 501 and a memory 503, and further includes a communication interface 502 and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other through the communication bus 504.
The processor is configured to implement any of the video frame image segmentation methods described above when executing the computer program stored in the memory.
FIG. 6 is a schematic diagram of a storage medium shown in accordance with an exemplary embodiment. For example, the apparatus 600 may be provided as a server. Referring to fig. 6, the apparatus 600 includes a processing component 622 that further includes one or more processors and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by the processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform the video frame image segmentation method described above.
The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input-output interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
There is also provided, in accordance with an embodiment of the present disclosure, a computer program product, which, when executed by a computer, enables the computer to perform any one of the video frame image segmentation methods described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for segmenting video frame images is characterized by comprising the following steps:
obtaining a previous video frame of a video frame to be processed and a semantic segmentation result of the previous video frame;
performing boundary fuzzy processing on the semantic segmentation result of the previous video frame to obtain a fuzzy boundary image;
and inputting the video frame to be processed, the blurred boundary image and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
2. The method of claim 1, wherein the training method of the pre-trained network model comprises:
step A, obtaining a previous frame image of a video frame to be processed and a semantic segmentation result of the previous frame image, wherein the video frame to be processed is marked with a standard semantic segmentation result;
step B, performing boundary blurring processing on the semantic segmentation result of the previous frame image to obtain a blurred target boundary image;
step C, inputting the blurred target boundary image, the previous frame image, and the video frame to be processed into a network model to obtain a semantic segmentation result of the video frame to be processed;
step D, calculating an error between the semantic segmentation result of the video frame to be processed and the standard semantic segmentation result;
and step E, judging whether the network model has converged according to the error; if it has converged, the trained network model is obtained; if it has not converged, parameters of the network model are adjusted and training continues until the network model converges.
3. The method according to claim 1, wherein the performing boundary blurring processing on the semantic segmentation result of the previous video frame to obtain a blurred boundary image comprises:
and expanding the boundary of the semantic segmentation result of the previous video frame to a preset pixel width to obtain a blurred boundary image.
4. The method according to claim 3, wherein the expanding the boundary of the semantic segmentation result of the previous video frame by a preset pixel width to obtain a blurred boundary image comprises:
expanding the boundary of the semantic segmentation result of the previous video frame inwards by the preset pixel width to obtain a blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame outwards by the preset pixel width to obtain a blurred boundary image;
or, expanding the boundary of the semantic segmentation result of the previous video frame inwards by a first pixel width and outwards by a second pixel width to obtain a blurred boundary image, wherein the sum of the first pixel width and the second pixel width is the preset pixel width.
5. The method according to claim 1, wherein before the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain the semantic segmentation result of the video frame to be processed, the method further comprises:
and performing data preprocessing on the video frame to be processed and the previous video frame according to the video frame to be processed, the blurred boundary and the previous video frame, wherein the difference between the video frame to be processed and the previous video frame after data preprocessing is increased.
6. The method of claim 5, wherein the data preprocessing comprises one or more of: a displacement transformation, a scale transformation, a rotation transformation, and a thin-plate spline transformation.
7. The method according to claim 1, wherein after the video frame to be processed, the blurred boundary image, and the previous video frame are input into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed, the method further comprises:
determining a foreground target in the video frame to be processed and the position of the foreground target according to the semantic segmentation result;
and adding the foreground target to a corresponding position in a preset background image according to the position of the foreground target to obtain a replaced video frame.
8. A video frame image segmentation apparatus, comprising:
the result acquisition module is used for acquiring a previous video frame of the video frames to be processed and a semantic segmentation result of the previous video frame;
the fuzzy processing module is used for carrying out boundary fuzzy processing on the semantic segmentation result of the previous video frame to obtain a fuzzy boundary image;
and the network model module is used for inputting the video frame to be processed, the blurred boundary image and the previous video frame into a pre-trained network model to obtain a semantic segmentation result of the video frame to be processed.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video frame image segmentation method of any one of claims 1 to 7.
10. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform a video frame image segmentation method according to any one of claims 1 to 7.
CN201911025928.4A 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium Pending CN110782469A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911025928.4A CN110782469A (en) 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911025928.4A CN110782469A (en) 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110782469A true CN110782469A (en) 2020-02-11

Family

ID=69386618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911025928.4A Pending CN110782469A (en) 2019-10-25 2019-10-25 Video frame image segmentation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110782469A (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000018128A1 (en) * 1998-09-24 2000-03-30 The Trustees Of Columbia University In The City Of New York System and method for semantic video object segmentation
US6731799B1 (en) * 2000-06-01 2004-05-04 University Of Washington Object segmentation with background extraction and moving boundary techniques
US20090010546A1 (en) * 2005-12-30 2009-01-08 Telecom Italia S P.A. Edge-Guided Morphological Closing in Segmentation of Video Sequences
US20090196349A1 (en) * 2008-02-01 2009-08-06 Young-O Park Method for estimating contour of video object
CN109697724A (en) * 2017-10-24 2019-04-30 北京京东尚科信息技术有限公司 Video Image Segmentation method and device, storage medium, electronic equipment
CN108520223A (en) * 2018-04-02 2018-09-11 广州华多网络科技有限公司 Dividing method, segmenting device, storage medium and the terminal device of video image
CN108596940A (en) * 2018-04-12 2018-09-28 北京京东尚科信息技术有限公司 A kind of methods of video segmentation and device
CN109685060A (en) * 2018-11-09 2019-04-26 科大讯飞股份有限公司 Image processing method and device
CN109886238A (en) * 2019-03-01 2019-06-14 湖北无垠智探科技发展有限公司 Unmanned plane Image Change Detection algorithm based on semantic segmentation
CN109902748A (en) * 2019-03-04 2019-06-18 中国计量大学 A kind of image, semantic dividing method based on the full convolutional neural networks of fusion of multi-layer information
CN110110682A (en) * 2019-05-14 2019-08-09 西安电子科技大学 The semantic stereo reconstruction method of remote sensing images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
光电的一只菜鸟 (CSDN blog): "OpenCV image processing study (54) — border padding with copyMakeBorder", https://blog.csdn.net/qq_35789421/article/details/89404224 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313788A (en) * 2020-02-26 2021-08-27 北京小米移动软件有限公司 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112613516A (en) * 2020-12-11 2021-04-06 北京影谱科技股份有限公司 Semantic segmentation method for aerial video data
CN112651880A (en) * 2020-12-25 2021-04-13 北京市商汤科技开发有限公司 Video data processing method and device, electronic equipment and storage medium
CN112800850A (en) * 2020-12-31 2021-05-14 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
CN112866797A (en) * 2020-12-31 2021-05-28 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
CN112800850B (en) * 2020-12-31 2024-09-17 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
CN113706555A (en) * 2021-08-12 2021-11-26 北京达佳互联信息技术有限公司 Video frame processing method and device, electronic equipment and storage medium
CN114187451A (en) * 2021-12-17 2022-03-15 广州文远知行科技有限公司 Semantic segmentation method, device and equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN110782469A (en) Video frame image segmentation method and device, electronic equipment and storage medium
US11200404B2 (en) Feature point positioning method, storage medium, and computer device
CN109598744B (en) Video tracking method, device, equipment and storage medium
CN108830780B (en) Image processing method and device, electronic device and storage medium
US20150279021A1 (en) Video object tracking in traffic monitoring
CN109145771B (en) Face snapshot method and device
US20240212161A1 (en) Foreground data generation method and method for applying same, related apparatus, and system
US20230252664A1 (en) Image Registration Method and Apparatus, Electronic Apparatus, and Storage Medium
CN110348393B (en) Vehicle feature extraction model training method, vehicle identification method and equipment
CN109447022B (en) Lens type identification method and device
CN115278089B (en) Face fuzzy image focusing correction method, device, equipment and storage medium
CN110473227A (en) Method for tracking target, device, equipment and storage medium
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
US8428369B2 (en) Information processing apparatus, information processing method, and program
CN114155172A (en) Image processing method and system
CN113409353A (en) Motion foreground detection method and device, terminal equipment and storage medium
CN110580462B (en) Natural scene text detection method and system based on non-local network
CN112218005A (en) Video editing method based on artificial intelligence
CN117459661A (en) Video processing method, device, equipment and machine-readable storage medium
CN112084855A (en) Outlier elimination method for video stream based on improved RANSAC method
CN110826564A (en) Small target semantic segmentation method and system in complex scene image
US20200226763A1 (en) Object Detection Method and Computing System Thereof
CN116188535A (en) Video tracking method, device, equipment and storage medium based on optical flow estimation
CN111815689B (en) Semi-automatic labeling method, equipment, medium and device
US11816842B2 (en) Image processing method, apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200211