CN116912283A - Video segmentation method and device based on cascade residual convolution neural network - Google Patents

Video segmentation method and device based on cascade residual convolution neural network

Info

Publication number
CN116912283A
CN116912283A (application CN202310946494.1A)
Authority
CN
China
Prior art keywords
neural network
video
convolutional neural
segmented
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310946494.1A
Other languages
Chinese (zh)
Inventor
林师言
高航
高福星
祝洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310946494.1A priority Critical patent/CN116912283A/en
Publication of CN116912283A publication Critical patent/CN116912283A/en
Pending legal-status Critical Current


Classifications

    • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Learning methods
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/10024: Color image
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20104: Interactive definition of region of interest [ROI]
    • Y02T 10/40: Engine management systems

Abstract

The embodiment of the application provides a video segmentation method and device based on a cascade residual convolutional neural network, which can be used in the technical field of artificial intelligence. The method comprises the following steps: obtaining a video frame to be segmented; and performing video segmentation processing on the video frame to be segmented through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, wherein the cascade residual convolutional neural network model comprises an improved background convolutional neural network model and an improved segmentation convolutional neural network model. By performing video segmentation through these two improved models, the video segmentation effect can be guaranteed even under environmental fluctuation, errors in the segmented region are reduced, and segmentation precision is improved. During segmentation, resource consumption is reduced, video segmentation time is shortened, segmentation efficiency is improved, and the requirement on hardware equipment is lowered.

Description

Video segmentation method and device based on cascade residual convolution neural network
Technical Field
The application relates to the field of computer technology, in particular to the technical field of artificial intelligence, and specifically to a video segmentation method and device based on a cascade residual convolutional neural network.
Background
Currently, video segmentation techniques are often used to highlight the portions of a video image that fall within a region of interest; such portions are usually associated with moving objects, for example tracking pedestrians under surveillance or capturing faces in real time in front of a camera. In the related art, video frames are generally subjected to binary segmentation by a segmentation algorithm based on a gray threshold. However, if the background environment in the video changes suddenly, exhibits subtle dynamic background motion, or contains camouflage effects, the video segmentation result fluctuates strongly, so the segmented region ultimately has a large error and low segmentation precision. Moreover, the segmentation process consumes considerable resources, so the video segmentation time is long, the segmentation efficiency is low, and the requirement on hardware equipment is high.
Disclosure of Invention
The application aims to provide a video segmentation method based on a cascade residual convolutional neural network, which performs video segmentation through an improved background convolutional neural network model and an improved segmentation convolutional neural network model, so that under environmental fluctuation the video segmentation effect can be guaranteed, the error of the segmented region can be reduced, and the segmentation precision can be improved; during segmentation, resource consumption is reduced, video segmentation time is shortened, segmentation efficiency is improved, and the requirement on hardware equipment is lowered. Another object of the present application is to provide a video segmentation apparatus based on a cascade residual convolutional neural network. It is yet another object of the present application to provide a computer readable medium. It is a further object of the application to provide a computer device.
In order to achieve the above object, an aspect of the present application discloses a video segmentation method based on a cascade residual convolution neural network, comprising:
obtaining a video frame to be segmented;
and carrying out video segmentation processing on the video frame to be segmented through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, wherein the cascade residual convolutional neural network model comprises an improved background convolutional neural network model and an improved segmented convolutional neural network model.
Preferably, before the video segmentation processing is performed on the video frame to be segmented through the pre-constructed cascade residual convolutional neural network model to obtain the target segmented video frame, the method further comprises:
and if the video frame to be segmented comprises a color video frame, carrying out gray processing on the color video frame to obtain the video frame to be segmented after gray processing.
Preferably, the video segmentation processing is performed on the video frame to be segmented through a pre-constructed cascade residual convolution neural network model, so as to obtain a target segmented video frame, which comprises the following steps:
extracting the background of the video frame to be segmented to obtain a deterministic background image;
performing residual mapping on the video frames to be segmented according to the deterministic background image by using an improved background convolutional neural network model to obtain a residual mapping result;
and performing video segmentation on the video frame to be segmented and the residual mapping result through an improved segmentation convolutional neural network model to generate a target segmentation video frame.
Preferably, before residual mapping is performed on the video frame to be segmented according to the deterministic background image by using the improved background convolutional neural network model to obtain a residual mapping result, the method further comprises:
and carrying out gray scale normalization on the video frames to be segmented to obtain normalized video frames to be segmented.
Preferably, the improved background convolutional neural network model comprises a plurality of convolutional layers, at least two of which are connected by a residual connection.
Preferably, the improved segmentation convolutional neural network model comprises a plurality of convolutional layers, at least two of which are connected by a residual connection.
Preferably, the method further comprises:
acquiring a training video frame data set, wherein the training video frame data set comprises training video frame data and corresponding binary image labels;
and training the cascade residual convolutional neural network based on the training video frame data and the corresponding binary image labels to obtain a cascade residual convolutional neural network model.
Preferably, the target segmented video frame comprises a region of interest;
after video segmentation processing is carried out on the video frames to be segmented through a pre-constructed cascade residual convolution neural network model, the method further comprises the following steps:
the region of interest is marked by a preset marking means comprising one or any combination of highlighting, box marking and underlining.
The application also discloses a video segmentation device based on the cascade residual convolution neural network, which comprises:
the first acquisition unit is used for acquiring the video frames to be segmented;
the video segmentation unit is used for carrying out video segmentation processing on the video frames to be segmented through a pre-constructed cascade residual convolutional neural network model to obtain target segmented video frames, wherein the cascade residual convolutional neural network model comprises an improved background convolutional neural network model and an improved segmentation convolutional neural network model.
The application also discloses a computer readable medium having stored thereon a computer program which when executed by a processor implements a method as described above.
The application also discloses a computer device comprising a memory for storing information comprising program instructions and a processor for controlling the execution of the program instructions, the processor implementing the method as described above when executing the program.
The application also discloses a computer program product comprising a computer program/instruction which, when executed by a processor, implements a method as described above.
The method comprises the steps of obtaining a video frame to be segmented, and performing video segmentation processing on it through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, the model comprising an improved background convolutional neural network model and an improved segmentation convolutional neural network model. By performing video segmentation through these two improved models, the video segmentation effect can be guaranteed even under environmental fluctuation, errors in the segmented region are reduced, and segmentation precision is improved; during segmentation, resource consumption is reduced, video segmentation time is shortened, segmentation efficiency is improved, and the requirement on hardware equipment is lowered.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the application, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a video segmentation method based on a cascade residual convolutional neural network according to an embodiment of the present application;
FIG. 2 is a flowchart of another video segmentation method based on a cascade residual convolutional neural network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a structure of an improved BCNN model provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an improved SCNN model provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video segmentation device based on a cascade residual convolutional neural network according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following describes the embodiments of the present application clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
It should be noted that the video segmentation method and device based on the cascade residual convolution neural network disclosed by the application can be used in the technical field of artificial intelligence and can also be used in any field except the technical field of artificial intelligence, and the application field of the video segmentation method and device based on the cascade residual convolution neural network disclosed by the application is not limited.
To facilitate understanding of the technical scheme provided by the application, relevant background is explained below. Video segmentation refers to dividing an image or video sequence into regions according to a certain criterion, so as to separate meaningful entities, called video objects in digital video, from the video sequence. With the continued development of artificial intelligence, many computer vision technologies are being applied in real life. Video segmentation technology is developing rapidly and is widely used in traffic monitoring, human body tracking, motion recognition, efficient video surveillance, anomaly detection and the like. The present application addresses the large separation error and poor performance of existing video segmentation methods in complex scenes, such as sudden changes of the video background environment, greatly improves the video segmentation effect, and improves the accuracy of segmenting the region of interest. In addition, the cascade residual convolutional neural network (CRCNN) model is relatively simple, so the requirement on hardware performance is small while the segmentation effect remains excellent. The CRCNN model comprises two improved convolutional neural network (CNN) models: an improved background convolutional neural network (BCNN) model and an improved segmentation convolutional neural network (SCNN) model. A CNN is a feedforward neural network with convolution computation and a deep structure. Deep learning, a family of pattern analysis methods, uses such multi-layer networks to gradually transform an initial low-level feature representation into a high-level feature representation, allowing complex learning tasks such as classification to be completed with a simple model.
The implementation process of the video segmentation method based on the cascade residual convolutional neural network provided by the embodiment of the application is described below by taking a video segmentation device based on the cascade residual convolutional neural network as an execution subject. It can be understood that the implementation main body of the video segmentation method based on the cascade residual convolution neural network provided by the embodiment of the application includes, but is not limited to, a video segmentation device based on the cascade residual convolution neural network.
Fig. 1 is a flowchart of a video segmentation method based on a cascade residual convolutional neural network according to an embodiment of the present application, as shown in fig. 1, where the method includes:
step 101, obtaining a video frame to be segmented.
Step 102, carrying out video segmentation processing on the video frame to be segmented through a pre-constructed CRCNN model to obtain a target segmented video frame.
In the embodiment of the application, the CRCNN model comprises an improved BCNN model and an improved SCNN model.
In the embodiment of the application, the construction of the CRCNN model specifically comprises the following steps:
step a1, acquiring a training video frame data set, wherein the training video frame data set comprises training video frame data and corresponding binary image labels.
In the embodiment of the application, the binary image labels corresponding to the training video frame data are manually marked.
Specifically, if the training video frame data are color video frames, a gray-scale transformation is performed on the color video frames to obtain gray-scale video frames; frames are then extracted from the converted gray-scale video at fixed intervals to obtain extracted video frames; the background of the extracted video frames is computed by the inter-frame difference method, giving the deterministic background of the current scene; the gray-scale mean of the remaining video frames (those not extracted) is calculated; and this mean is subtracted from the remaining frames to obtain normalized gray-scale video frames.
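The preprocessing steps above can be sketched in a few lines of NumPy. The function name, the sampling interval, and the use of a per-pixel temporal median as one concrete realization of the background estimate are assumptions of this sketch, not details taken from the patent:

```python
import numpy as np

def preprocess_training_frames(frames, interval=10):
    """Hypothetical sketch of the training preprocessing described above.

    frames: uint8 array of shape (T, H, W, 3) -- a color video clip.
    Returns the deterministic background and the normalized grayscale frames.
    """
    # 1. Grayscale conversion with the usual luminance weights.
    gray = frames @ np.array([0.299, 0.587, 0.114])          # (T, H, W)

    # 2. Fixed-interval frame extraction for background estimation.
    sampled = gray[::interval]

    # 3. Deterministic background: here, the per-pixel temporal median of the
    #    sampled frames (one simple estimator; the patent uses the
    #    inter-frame difference method without spelling out the formula).
    background = np.median(sampled, axis=0)                  # (H, W)

    # 4. Mean subtraction on the remaining (non-extracted) frames.
    mask = np.ones(len(gray), dtype=bool)
    mask[::interval] = False
    rest = gray[mask]
    normalized = rest - rest.mean()
    return background, normalized
```

On a 20-frame clip with `interval=10`, two frames feed the background estimate and the other 18 are normalized to zero mean.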
And a2, training the CRCNN according to the training video frame data and the corresponding binary image label to obtain a CRCNN model.
Specifically, the normalized gray-scale video frames are taken as the input data set and the corresponding binary image labels as the output data set, and the CRCNN is trained until its segmentation accuracy reaches a preset accuracy threshold, yielding the CRCNN model.
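As a toy illustration of this stopping rule, the following sketch trains a per-pixel logistic model until its accuracy reaches a preset threshold. This is a drastic simplification standing in for the CRCNN; the data, model size, and learning rate are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "pixels": 200 samples with 4 features each, and binary mask labels.
X = rng.normal(size=(200, 4))
w_true = np.array([1.5, -2.0, 0.5, 1.0])
y = (X @ w_true > 0).astype(float)

w = np.zeros(4)
threshold, lr = 0.95, 0.5
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid output, as in the SCNN head
    w -= lr * X.T @ (p - y) / len(y)     # gradient step on binary cross entropy
    accuracy = ((p > 0.5) == y).mean()
    if accuracy >= threshold:            # stop once the preset accuracy is reached
        break
```

The loop mirrors the described training procedure: iterate until the segmentation accuracy on the training data meets the preset accuracy threshold, then keep the resulting model.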
In the technical scheme provided by the embodiment of the application, a video frame to be segmented is obtained and subjected to video segmentation processing through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, the model comprising an improved background convolutional neural network model and an improved segmentation convolutional neural network model. By performing video segmentation through these two improved models, the video segmentation effect can be guaranteed even under environmental fluctuation, errors in the segmented region are reduced, and segmentation precision is improved; during segmentation, resource consumption is reduced, video segmentation time is shortened, segmentation efficiency is improved, and the requirement on hardware equipment is lowered.
Fig. 2 is a flowchart of another video segmentation method based on a cascade residual convolutional neural network according to an embodiment of the present application, as shown in fig. 2, the method includes:
step 201, obtaining a video frame to be segmented.
In the embodiment of the application, each step is executed by a video segmentation device based on a cascade residual convolution neural network.
In the embodiment of the application, the video frame to be segmented is acquired through the video acquisition device, and as an alternative scheme, the video acquisition device is a camera.
It should be noted that the video frame to be segmented may also be acquired from the software layer, for example: and obtaining the video frames to be segmented from the third party platform.
Step 202, judging whether the video frame to be segmented comprises a color video frame, if so, executing step 203; if not, go to step 204.
In the embodiment of the present application, if the video frame to be segmented includes a color video frame, the frame needs gray-scale conversion to facilitate the subsequent segmentation processing, so step 203 is executed next; if it does not include a color video frame, the frame is already a gray-scale video frame, no gray-scale conversion is needed, and step 204 is executed.
Step 203, carrying out gray processing on the color video frames to obtain the video frames to be segmented after gray processing.
Specifically, gray level transformation is carried out on the color video frames to obtain the video frames to be segmented after gray level processing, so that the display effect of the video frames to be segmented is clearer, and the subsequent segmentation processing is facilitated.
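A minimal sketch of this gray-scale transformation, assuming the common ITU-R BT.601 luminance weights (the patent does not specify which weights are used):

```python
import numpy as np

def to_grayscale(color_frame):
    """Convert an (H, W, 3) RGB frame to an (H, W) grayscale frame using the
    standard BT.601 luminance weights -- a common choice, assumed here."""
    weights = np.array([0.299, 0.587, 0.114])
    return color_frame.astype(float) @ weights
```

Since the three weights sum to 1, a uniform white frame maps to a uniform gray-scale value of 255.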
Step 204, extracting the background of the video frame to be segmented to obtain a deterministic background image.
In the embodiment of the application, frames are extracted from the video to be segmented at fixed intervals to obtain extracted video frames, and background extraction is performed on the extracted video frames by the inter-frame difference method to obtain a deterministic background image of the current scene.
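One plausible reading of inter-frame-difference background extraction is sketched below: pixels whose frame-to-frame difference stays below a threshold are treated as background and averaged over time. The threshold value and the averaging rule are this sketch's choices; the patent does not give the exact formula:

```python
import numpy as np

def interframe_difference_background(frames, thresh=10.0):
    """Estimate a deterministic background from (T, H, W) grayscale frames
    via inter-frame differences (a hypothetical concrete realization)."""
    frames = frames.astype(float)
    diffs = np.abs(np.diff(frames, axis=0))  # (T-1, H, W) inter-frame differences
    still = diffs < thresh                   # True where the scene did not move
    # Average each pixel over the frames in which it was still; fall back to
    # the plain temporal mean where a pixel never held still.
    num = (frames[1:] * still).sum(axis=0)
    den = still.sum(axis=0)
    return np.where(den > 0, num / np.maximum(den, 1), frames.mean(axis=0))
```

On a static scene crossed by a moving object, the object is excluded wherever it causes a large inter-frame difference, and the estimate recovers the static background.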
Step 205, carrying out gray scale normalization on the video frame to be segmented to obtain a normalized video frame to be segmented.
Specifically, gray average value calculation is carried out on other video frames except the extracted video frames in the video frames to be segmented; and subtracting the gray average value from other video frames to obtain normalized video frames to be segmented.
And 206, performing residual mapping on the video frame to be segmented according to the deterministic background image through the improved BCNN model to obtain a residual mapping result.
Specifically, the deterministic background image and the video frame to be segmented are input into the improved BCNN model; the model is trained by minimizing the mean squared error between the deterministic background image and the approximate background image, and the forward-propagation result of the BCNN is output to obtain the residual mapping result.
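This step can be restated as a sketch, where `approx_bg` stands in for the BCNN's forward-propagation output (an assumption of this illustration):

```python
import numpy as np

def bcnn_losses_sketch(frame, deterministic_bg, approx_bg):
    """Illustrative restatement of the BCNN objective described above: the
    approximate background is fit by minimizing its mean squared error against
    the deterministic background, and the residual map is what remains of the
    frame once that background is explained away."""
    mse = np.mean((deterministic_bg - approx_bg) ** 2)  # training objective
    residual_map = frame - approx_bg                    # residual mapping result
    return mse, residual_map
```

When the approximate background matches the deterministic one exactly, the loss is zero and the residual map contains only the foreground content of the frame.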
In the embodiment of the application, the improved BCNN model comprises a plurality of convolutional layers, at least two of which are connected by residual connections. The improved BCNN model receives the deterministic background image and the video frame to be segmented, and is used to distinguish which parts of the video frame do not belong to the background. Fig. 3 is a schematic structural diagram of the improved BCNN model provided in an embodiment of the present application. As shown in Fig. 3, the improved BCNN model comprises an input layer, multiple convolutional layers (CONV) and an output layer. The input layer receives the deterministic background image and the video frame to be segmented. Different convolutional layers adopt different functions: a single convolutional layer uses the rectified linear unit (ReLU) activation function, 20 convolutional layers use the ReLU activation function together with batch normalization (BN), and a single convolutional layer uses a linear function. The output layer outputs the residual mapping result, namely the residual map information of the video frame to be segmented.
It should be noted that residual connections are adopted between the 20 convolutional layers in the improved BCNN model shown in Fig. 3 (indicated by the dotted arrows in Fig. 3), which further improves the performance of the improved BCNN model and its computational efficiency.
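The residual connection itself can be illustrated with a minimal NumPy block. For clarity, per-pixel 1x1 linear maps stand in for the patent's full convolutions; this is a simplification, not the patent's architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_conv_block(x, w1, w2):
    """Minimal residual connection in the spirit of Fig. 3: two 1x1
    "convolutions" (per-pixel (C, C) linear maps over the channel axis)
    whose output is added back to the input before the final ReLU."""
    h = relu(np.tensordot(x, w1, axes=([-1], [0])))   # conv + ReLU
    h = np.tensordot(h, w2, axes=([-1], [0]))         # conv (linear)
    return relu(h + x)                                # skip connection, then ReLU
```

The skip path is what makes the block a residual one: even if the learned weights contribute nothing (all zeros), the input still flows through, which is why deep stacks of such layers remain trainable.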
Step 207, video segmentation is performed on the video frame to be segmented and the residual mapping result through the improved SCNN model, so as to generate a target segmented video frame.
Specifically, the video frame to be segmented and the residual mapping result are input into the improved SCNN model and spliced together; the model is trained by minimizing the average binary cross entropy between the network output and the ground-truth binary detection mask, and the forward-propagation result of the SCNN is output to obtain the target segmented video frame, which comprises a region of interest.
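The splicing and the binary-cross-entropy objective can be restated as follows. Here `pred_mask` stands in for the SCNN's sigmoid output, and channel-wise stacking is one plausible reading of "splicing and combining" (both are assumptions of this sketch):

```python
import numpy as np

def scnn_input_and_loss(frame, residual_map, pred_mask, gt_mask, eps=1e-7):
    """Sketch of the SCNN step described above: concatenate the frame and its
    residual map channel-wise to form the network input, and score a predicted
    mask against the ground-truth binary mask with average binary cross entropy."""
    x = np.stack([frame, residual_map], axis=-1)   # channel-wise concatenation
    p = np.clip(pred_mask, eps, 1 - eps)           # keep log() finite
    bce = -np.mean(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))
    return x, bce
```

A perfect prediction drives the average binary cross entropy toward zero, which is the training target described above.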
It should be noted that the region of interest may be the foreground image segmented from the video frame to be segmented, or a region of interest pre-specified by the user.
In the embodiment of the application, the improved SCNN model comprises a plurality of convolutional layers, at least two of which are connected by residual connections. The improved SCNN model detects the foreground objects present in the video frame to be segmented: using the frame and its residual mapping result, the foreground objects are segmented out, and the resulting target segmented video frame is presented as a binary image in which white pixels correspond to the positions of foreground objects. Fig. 4 is a schematic structural diagram of the improved SCNN model provided in an embodiment of the present application. As shown in Fig. 4, the improved SCNN model comprises an input layer, multiple convolutional layers (CONV) and an output layer. The input layer receives the video frame to be segmented and the residual mapping result. Different convolutional layers adopt different functions: a single convolutional layer uses the ReLU activation function, 20 convolutional layers use the ReLU activation function together with batch normalization (BN), and a single convolutional layer uses the Sigmoid activation function. The output layer outputs the target segmented video frame, namely the binary image information of the video segmentation.
It should be noted that residual connections are adopted between the 20 convolutional layers in the improved SCNN model shown in Fig. 4 (indicated by the dotted arrows in Fig. 4), which further improves the performance of the improved SCNN model and its computational efficiency.
In the embodiment of the application, the CRCNN model can effectively distinguish the region of interest from the background region even in scenes with abrupt environmental changes, and can effectively segment video frames in scenarios such as ordinary video surveillance and satellite remote-sensing imagery, greatly improving the video segmentation effect with excellent segmentation performance. Moreover, the CRCNN model has about 7 times fewer parameters than prior-art network models, so it runs more efficiently and maintains good performance on hardware with limited capability.
Step 208, marking the region of interest by a preset marking mode, wherein the marking mode comprises one or any combination of highlighting, box marking and underlining.
In an embodiment of the present application, the object-segmented video frame includes a region of interest.
It should be noted that the marking mode is not limited to highlighting, box marking, underlining or any combination thereof; other marking modes are possible, and the embodiments of the present application place no limitation on this.
In the embodiment of the application, the region of interest is marked and displayed so as to feed back the segmented video frames to the user.
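Box marking, one of the preset marking modes above, can be sketched as drawing the bounding rectangle of the segmented mask onto the frame. The drawing primitive is this sketch's choice; the patent leaves it open:

```python
import numpy as np

def box_mark_roi(frame, mask, value=255):
    """Draw the bounding rectangle of a binary segmentation mask onto a
    grayscale frame (one simple realization of box marking)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return frame                     # nothing segmented, nothing to mark
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    out = frame.copy()
    out[top, left:right + 1] = value     # top edge
    out[bottom, left:right + 1] = value  # bottom edge
    out[top:bottom + 1, left] = value    # left edge
    out[top:bottom + 1, right] = value   # right edge
    return out
```

The marked frame can then be displayed or fed back to the user as described above.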
It is worth to be noted that, in the technical scheme of the application, the acquisition, storage, use, processing and the like of the data all conform to the relevant regulations of laws and regulations. The user information in the embodiment of the application is obtained through legal compliance approaches, and the user information is obtained, stored, used, processed and the like through the approval of the client.
In the technical scheme of the video segmentation method based on the cascade residual convolutional neural network provided by the embodiment of the application, a video frame to be segmented is obtained and subjected to video segmentation processing through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, the model comprising an improved background convolutional neural network model and an improved segmentation convolutional neural network model. By performing video segmentation through these two improved models, the video segmentation effect can be guaranteed even under environmental fluctuation, errors in the segmented region are reduced, and segmentation precision is improved; during segmentation, resource consumption is reduced, video segmentation time is shortened, segmentation efficiency is improved, and the requirement on hardware equipment is lowered.
Fig. 5 is a schematic structural diagram of a video segmentation apparatus based on a cascade residual convolutional neural network according to an embodiment of the present application; the apparatus is configured to perform the video segmentation method based on the cascade residual convolutional neural network. As shown in Fig. 5, the apparatus includes: a first acquisition unit 11 and a video segmentation unit 12.
The first acquisition unit 11 is configured to acquire a video frame to be segmented.
The video segmentation unit 12 is configured to perform video segmentation processing on a video frame to be segmented through a pre-constructed cascade residual convolutional neural network model, so as to obtain a target segmented video frame, where the cascade residual convolutional neural network model includes an improved background convolutional neural network model and an improved segmentation convolutional neural network model.
In the embodiment of the application, the device further comprises: a gray processing unit 13.
The gray processing unit 13 is configured to perform gray processing on the color video frame if the video frame to be segmented includes the color video frame, so as to obtain the video frame to be segmented after gray processing.
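As an illustrative sketch only (the patent does not specify the conversion), the gray processing step performed by the gray processing unit 13 can be expressed as a weighted sum over the color channels; the function name `to_grayscale` and the use of ITU-R BT.601 luma weights are assumptions:

```python
import numpy as np

def to_grayscale(frame_rgb: np.ndarray) -> np.ndarray:
    # Convert an (H, W, 3) RGB frame to a single-channel grayscale frame
    # using the ITU-R BT.601 luma weights (a common choice; the patent
    # does not prescribe a particular formula).
    weights = np.array([0.299, 0.587, 0.114])
    return (frame_rgb.astype(np.float64) @ weights).astype(np.uint8)
```

A color video would be processed by applying this conversion frame by frame before the frames enter the cascade residual convolutional neural network model.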
In the embodiment of the present application, the video segmentation unit 12 is specifically configured to perform background extraction on a video frame to be segmented to obtain a deterministic background image; performing residual mapping on the video frames to be segmented according to the deterministic background image by using an improved background convolutional neural network model to obtain a residual mapping result; and performing video segmentation on the video frame to be segmented and the residual mapping result through an improved segmentation convolutional neural network model to generate a target segmentation video frame.
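The three-stage flow performed by the video segmentation unit 12 — background extraction, residual mapping, and segmentation — can be sketched with simple numerical stand-ins. The temporal-median background, absolute-difference residual, and threshold mask below are illustrative placeholders for the improved background and segmentation convolutional neural network models, not the patent's actual implementation; all function names and the threshold value are hypothetical:

```python
import numpy as np

def extract_background(frames: np.ndarray) -> np.ndarray:
    # Deterministic background image: per-pixel temporal median over a
    # stack of T grayscale frames with shape (T, H, W).
    return np.median(frames, axis=0)

def residual_map(frame: np.ndarray, background: np.ndarray) -> np.ndarray:
    # Stand-in for the improved background CNN: absolute difference
    # between the current frame and the deterministic background image.
    return np.abs(frame.astype(np.float64) - background)

def segment_frame(residual: np.ndarray, threshold: float = 25.0) -> np.ndarray:
    # Stand-in for the improved segmentation CNN: threshold the residual
    # to produce a binary foreground mask (the target segmented frame).
    return (residual > threshold).astype(np.uint8)
```

In the patent's scheme the two CNN stages replace the hand-crafted residual and threshold steps, but the data flow — frame and background in, residual out; frame and residual in, segmentation out — is the same.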
In the embodiment of the application, the device further comprises: normalization unit 14.
The normalization unit 14 is configured to perform gray scale normalization on the video frame to be segmented, so as to obtain a normalized video frame to be segmented.
In the embodiment of the application, the device further comprises: a second acquisition unit 15 and a model training unit 16.
The second obtaining unit 15 is configured to obtain a training video frame data set, where the training video frame data set includes training video frame data and a corresponding binary image tag.
The model training unit 16 is configured to train the cascade residual convolutional neural network based on the training video frame data and the corresponding binary image labels, so as to obtain the cascade residual convolutional neural network model.
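Training against binary image labels typically minimises a per-pixel binary cross-entropy loss. The sketch below illustrates that objective with a toy per-pixel logistic model standing in for the cascade network; the loss and update rule are standard, but the function names and hyper-parameters are assumptions, not the patent's training procedure:

```python
import numpy as np

def bce_loss(pred: np.ndarray, label: np.ndarray, eps: float = 1e-7) -> float:
    # Binary cross-entropy between predicted foreground probabilities
    # and the binary image label (ground-truth mask).
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred)))

def train_pixel_model(pixels: np.ndarray, labels: np.ndarray,
                      lr: float = 0.5, epochs: int = 300):
    # Toy stand-in for the cascade network: a single-weight logistic model
    # p = sigmoid(w * x + b), fitted by gradient descent on the BCE loss.
    # A real training loop would update millions of CNN parameters the
    # same way, via backpropagation.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(w * pixels + b)))
        grad = p - labels            # derivative of BCE w.r.t. the logit
        w -= lr * float(np.mean(grad * pixels))
        b -= lr * float(np.mean(grad))
    return w, b
```

After training, the predicted probabilities are thresholded to produce the binary target segmented frame that is compared against the binary image label.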
In the embodiment of the application, the target segmentation video frame comprises an interested region; the apparatus further comprises: a marking unit 17.
The marking unit 17 is configured to mark the region of interest in a preset marking mode, where the marking mode includes one of highlighting, box marking and underline marking, or any combination thereof.
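The highlighting and box marking modes of the marking unit 17 can be sketched as pixel operations on the frame, given the region of interest as a binary mask. This is an illustrative sketch with hypothetical names and brightness values; underline marking, which applies to text overlays, is omitted:

```python
import numpy as np

def mark_region(frame: np.ndarray, mask: np.ndarray, mode: str = "box") -> np.ndarray:
    # Mark the region of interest (a binary mask) on a grayscale frame.
    # "highlight" brightens the masked pixels; "box" draws the mask's
    # bounding rectangle in white.
    out = frame.copy()
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return out  # nothing to mark
    if mode == "highlight":
        sel = mask.astype(bool)
        out[sel] = np.minimum(out[sel].astype(np.int64) + 80, 255).astype(out.dtype)
    elif mode == "box":
        y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
        out[y0, x0:x1 + 1] = 255   # top edge
        out[y1, x0:x1 + 1] = 255   # bottom edge
        out[y0:y1 + 1, x0] = 255   # left edge
        out[y0:y1 + 1, x1] = 255   # right edge
    return out
```

Combinations of marking modes would simply apply the corresponding operations in sequence to the same frame.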
In the scheme of the embodiment of the application, a video frame to be segmented is obtained, and video segmentation processing is performed on it through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, where the cascade residual convolutional neural network model includes an improved background convolutional neural network model and an improved segmentation convolutional neural network model. By performing video segmentation through these two models, the video segmentation effect can be guaranteed even under environmental fluctuation, errors in the segmented region are reduced, and segmentation precision is improved; during segmentation, resource consumption and segmentation time are reduced, segmentation efficiency is improved, and the demands on hardware equipment are lowered.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein the memory is used for storing information comprising program instructions, the processor is used for controlling the execution of the program instructions, and the program instructions are loaded and executed by the processor to realize the steps of the embodiment of the video segmentation method based on the cascade residual convolution neural network.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer device 600 suitable for use in implementing embodiments of the present application.
As shown in Fig. 6, the computer device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the computer device 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The technical scheme of the application obtains, stores, uses, processes and the like the data, which all meet the relevant regulations of national laws and regulations.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to mutually, and each embodiment mainly describes its differences from the other embodiments. In particular, for the system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for relevant points, reference may be made to the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (12)

1. A video segmentation method based on a cascade residual convolutional neural network, the method comprising:
obtaining a video frame to be segmented;
and carrying out video segmentation processing on the video frame to be segmented through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, wherein the cascade residual convolutional neural network model comprises an improved background convolutional neural network model and an improved segmentation convolutional neural network model.
2. The video segmentation method based on the cascade residual convolutional neural network according to claim 1, wherein before the video segmentation processing is performed on the video frame to be segmented through the pre-constructed cascade residual convolutional neural network model, the method further comprises:
and if the video frame to be segmented comprises a color video frame, carrying out gray processing on the color video frame to obtain the video frame to be segmented after gray processing.
3. The video segmentation method based on the cascade residual convolutional neural network according to claim 1, wherein the video segmentation processing is performed on the video frame to be segmented through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, and the method comprises the following steps:
extracting the background of the video frame to be segmented to obtain a deterministic background image;
performing residual mapping on the video frames to be segmented according to the deterministic background image by using the improved background convolutional neural network model to obtain a residual mapping result;
and performing video segmentation on the video frame to be segmented and the residual mapping result through the improved segmentation convolutional neural network model to generate a target segmentation video frame.
4. The video segmentation method based on the cascade residual convolutional neural network according to claim 3, wherein before the performing residual mapping on the video frame to be segmented according to the deterministic background image by the improved background convolutional neural network model to obtain a residual mapping result, further comprising:
and carrying out gray scale normalization on the video frame to be segmented to obtain a normalized video frame to be segmented.
5. The video segmentation method based on the cascade residual convolutional neural network according to claim 1, wherein the modified background convolutional neural network model comprises a plurality of convolutional layers, and at least two convolutional layers in the plurality of convolutional layers are connected by residual errors.
6. The video segmentation method based on the cascade residual convolutional neural network according to claim 1, wherein the modified segmented convolutional neural network model comprises a plurality of convolutional layers, and at least two convolutional layers in the plurality of convolutional layers are connected by residual errors.
7. The video segmentation method based on the cascade residual convolutional neural network according to claim 1, further comprising:
acquiring a training video frame data set, wherein the training video frame data set comprises training video frame data and corresponding binary image tags;
and training the cascade residual convolutional neural network based on the training video frame data and the corresponding binary image labels to obtain the cascade residual convolutional neural network model.
8. The video segmentation method based on the cascade residual convolutional neural network according to claim 1, wherein the target segmented video frame comprises a region of interest;
after the video segmentation processing is carried out on the video frame to be segmented through the pre-constructed cascade residual convolution neural network model to obtain the target segmented video frame, the method further comprises the following steps:
marking the region of interest by a preset marking mode, wherein the marking mode comprises one or any combination of highlighting marks, square marks and underline marks.
9. A video segmentation apparatus based on a cascade residual convolutional neural network, the apparatus comprising:
the first acquisition unit is used for acquiring the video frames to be segmented;
the video segmentation unit is used for carrying out video segmentation processing on the video frame to be segmented through a pre-constructed cascade residual convolutional neural network model to obtain a target segmented video frame, wherein the cascade residual convolutional neural network model comprises an improved background convolutional neural network model and an improved segmentation convolutional neural network model.
10. A computer readable medium having stored thereon a computer program, which when executed by a processor implements the video segmentation method based on a cascade residual convolutional neural network as claimed in any one of claims 1 to 8.
11. A computer device comprising a memory for storing information including program instructions and a processor for controlling execution of the program instructions, wherein the program instructions when loaded and executed by the processor implement the cascade residual convolutional neural network-based video segmentation method of any one of claims 1 to 8.
12. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the video segmentation method based on a cascade residual convolutional neural network according to any one of claims 1 to 8.
CN202310946494.1A 2023-07-28 2023-07-28 Video segmentation method and device based on cascade residual convolution neural network Pending CN116912283A (en)

Publications (1)

Publication Number Publication Date
CN116912283A true CN116912283A (en) 2023-10-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination