CN111145135B - Image descrambling processing method, device, equipment and storage medium

Image descrambling processing method, device, equipment and storage medium

Info

Publication number
CN111145135B
CN111145135B (application number CN201911401988.1A)
Authority
CN
China
Prior art keywords
image
target area
processing
mask
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911401988.1A
Other languages
Chinese (zh)
Other versions
CN111145135A (en)
Inventor
余自强
罗雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911401988.1A
Publication of CN111145135A
Application granted
Publication of CN111145135B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration by the use of local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G06T5/77
    • G06T5/90
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

The invention provides an image descrambling method, device, equipment and storage medium. The method comprises the following steps: determining a target area where interference content is located in any image of an image sequence, and determining a reference area where material content is located in that image, wherein the material content is used to replace the interference content; matching each image in the image sequence based on the target area to obtain a mask corresponding to the target area in each image; performing morphological processing on the mask corresponding to the target area in each image to obtain an adjusted mask for each image; and fusing the reference region with the adjusted mask of each image to obtain fused images which correspond to the images one by one and from which the interference content has been removed. With the method and the device, the interference content in an image sequence can be removed automatically, and descrambling efficiency is improved.

Description

Image descrambling processing method, device, equipment and storage medium
Technical Field
The present invention relates to artificial intelligence technologies, and in particular, to an image descrambling method and apparatus, an electronic device, and a storage medium.
Background
Artificial Intelligence (AI) is a comprehensive branch of computer science that studies the design principles and implementation methods of intelligent machines, so that machines can perceive, reason and make decisions. Artificial intelligence is a cross-disciplinary subject covering a wide range of fields, such as natural language processing and machine learning/deep learning. As the technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value.
Among artificial-intelligence-based visual processing technologies, image descrambling is an important research direction: it removes the interference that useless information causes when a user views images.
However, in the related art, when interfering content (e.g., clutter) appears in an image sequence, each image in the sequence is mainly processed manually to remove it. Relying on large amounts of manual work is time-consuming and labor-intensive, and the descrambling efficiency is low.
Disclosure of Invention
The embodiment of the invention provides an image interference elimination processing method, an image interference elimination processing device and a storage medium, which can automatically eliminate interference content in an image sequence and improve interference elimination efficiency.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an image interference elimination processing method, which comprises the following steps:
determining a target area where interference content is located in any image of an image sequence, and determining a reference area where material content is located in any image, wherein the material content is used for replacing the interference content;
matching each image in the image sequence based on the target area to obtain a mask corresponding to the target area in each image;
performing morphological processing on the mask corresponding to the target area in each image to obtain an adjusted mask of each image;
and performing fusion processing on the reference region and the adjusted masks of the images to obtain fused images which correspond to the images one by one and from which the interference content has been removed.
The embodiment of the invention provides an image interference elimination processing method, which comprises the following steps:
presenting the image sequence in the client, an
In any image of the image sequence, presenting a surrounding frame of a target area where interference content is located, and presenting a surrounding frame of a reference area where material content is located aiming at the material content used for replacing the interference content;
in response to the interference elimination operation aiming at the target area, carrying out identification processing on each image in the image sequence based on the target area and the reference area in any one image to obtain the target area and the reference area in each image;
performing fusion processing on the target area and the reference area in each image to obtain a fused image which corresponds to each image one by one and is free of the interference content;
and presenting the one-to-one corresponding fusion image at the client.
An embodiment of the present invention provides an image descrambling processing device, including:
the device comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining a target area where interference content is located in any image of an image sequence and determining a reference area where material content is located in any image, and the material content is used for replacing the interference content;
the first processing module is used for matching each image in the image sequence based on the target area to obtain a mask corresponding to the target area in each image;
the second processing module is used for performing morphological processing on the mask corresponding to the target area in each image to obtain the adjusted mask of each image;
and the fusion module is used for fusing the reference region and the adjusted masks of the images to obtain fused images which correspond to the images one by one and from which the interference content has been removed.
In the above technical solution, the first determining module is further configured to perform image segmentation processing on any image of the image sequence to obtain an image region to be matched;
matching the image area to be matched based on the interference content data set, and determining the image area to be matched as a target area where interference content is located in any one image when matching is successful;
determining edge features of the target area;
in any one of the images, a region having an edge feature that meets a similarity condition with an edge feature of the target region is determined as a reference region, and the content in the determined reference region is determined as material content.
In the above technical solution, the first determining module is further configured to determine, in response to a region selection operation for any one of the images, that a region set by the region selection operation is a target region where interference content is located;
and responding to the material selecting operation aiming at any image, and determining the area which is set by the material selecting operation and comprises the material content as a reference area.
In the above technical solution, the first processing module is further configured to execute the following processing on any image in the image sequence:
performing convolution processing on the target area and the image respectively to obtain a first feature map corresponding to the target area and a second feature map corresponding to the image;
performing convolution processing on the first characteristic diagram and the second characteristic diagram to obtain a third characteristic diagram;
and performing mask extraction processing on the third feature map to obtain a mask corresponding to the target area in the image.
In the above technical solution, the first processing module is further configured to perform convolution processing on the third feature map to obtain a plurality of masks and accuracies corresponding to the masks;
determining a mask corresponding to the maximum accuracy as a mask in the image corresponding to the target region.
In the above technical solution, the apparatus further includes:
the second determining module is used for performing convolution processing on the matched feature map to obtain position information and size information corresponding to the mask;
determining the position information of the mask corresponding to the maximum accuracy as the position information of the mask corresponding to the target area in the image, and
determining the size information of the mask corresponding to the maximum accuracy as the size information of the mask corresponding to the target area in the image;
the device further comprises:
a third determining module, configured to determine location information of a reference region in the image according to a distance between the target region and the reference region and location information of a mask corresponding to the target region in the image;
multiplying the ratio of the size of the reference area to the size of the target area by the size information of a mask corresponding to the target area in the image, and determining the obtained size information as the size information of the reference area in the image;
determining a reference region in the image based on the position information and the size information of the reference region.
In the above technical solution, the second processing module is further configured to perform expansion processing on the mask corresponding to the target area in each image to obtain an adjusted mask with holes removed; or,
to perform expansion processing on the mask corresponding to the target area in each image to obtain a mask with holes removed, and perform corrosion processing on the mask with holes removed to obtain an adjusted mask.
In the above technical solution, the fusion module is further configured to perform fusion processing on the edge information of the reference area, the part to be fused, and the adjusted mask in the image, subject to the following conditions: the edge information of the fused part in the fused image is consistent with the edge information of the reference area;
and the difference value between the gradient of the fusion part in the fusion image and the gradient of the part to be fused of the reference region is smaller than a gradient difference threshold value.
In the above technical solution, the apparatus further includes:
and the restoration module is used for carrying out pixel point restoration processing on adjacent fusion images in the fusion images corresponding to the images based on the optical flow information corresponding to the target area in the fusion images corresponding to the images to obtain continuous fusion images.
In the above technical solution, the apparatus further includes:
a fourth determining module, configured to determine first optical flow information corresponding to the target area between adjacent images in each of the images;
determining the first optical flow information as second optical flow information between the adjacent fused images corresponding to the target area based on a fused image corresponding to the image;
the restoration module is further configured to blend the second optical flow information into the pixel point information of the adjacent blended image;
and when the brightness difference value of the pixel point of the adjacent fusion image containing the optical flow information is larger than the brightness difference value threshold, performing brightness restoration processing on the pixel point of the adjacent fusion image containing the optical flow information to obtain a continuous fusion image.
In the foregoing technical solution, the restoration module is further configured to, when the adjacent fused image including the optical flow information is a first image and a second image, execute one of the following processes on the second image:
adding the brightness of the pixel point of the first image and the brightness difference value threshold to obtain the brightness of the pixel point, and determining the brightness of the pixel point of the second image after brightness restoration processing;
and determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel points of the second image after the brightness restoration processing.
The embodiment of the invention provides an image interference elimination processing device, which comprises:
a first rendering module for rendering the sequence of images in the client and
in any image of the image sequence, presenting a surrounding frame of a target area where interference content is located, and presenting a surrounding frame of a reference area where material content is located aiming at the material content used for replacing the interference content;
a third processing module, configured to perform recognition processing on each image in the image sequence based on a target region and a reference region in any one of the images in response to a descrambling operation for the target region, so as to obtain the target region and the reference region in each image;
performing fusion processing on the target area and the reference area in each image to obtain a fused image which corresponds to each image one by one and is free of the interference content;
and the second presentation module is used for presenting the one-to-one corresponding fusion image at the client.
In the above technical solution, the first presenting module is further configured to present, in response to a region selection operation for any one of the images of the image sequence, a bounding box of a target region where the selected interference content is located in the any one of the images;
and in response to the selection operation of the material in any image, presenting a surrounding frame of a reference area where the content of the selected material is located in any image.
In the above technical solution, the first presentation module is further configured to perform image segmentation processing on any image of the image sequence in response to a region identification request for the image sequence, so as to obtain an image region to be matched;
matching the image area to be matched based on the interference content data set, and determining the image area to be matched as a target area where interference content is located in any one image when matching is successful;
presenting an enclosure of a target area in which the interfering content is located;
determining an area with edge features meeting similar conditions with the edge features of the target area as a reference area, and determining the content in the determined reference area as material content;
and presenting a surrounding frame of the reference area where the material content is positioned.
An embodiment of the present invention provides an image descrambling processing device, where the device includes:
a memory for storing executable instructions;
and the processor is used for realizing the image descrambling processing method provided by the embodiment of the invention when the executable instructions stored in the memory are executed.
The embodiment of the invention provides a computer-readable storage medium, which stores executable instructions and is used for causing a processor to execute the method for processing the image interference elimination provided by the embodiment of the invention.
The embodiment of the invention has the following beneficial effects:
the method comprises the steps of determining a target area and a reference area in any image in an image sequence, matching each image in the image sequence according to the target area to obtain a mask corresponding to the target area in each image, namely automatically matching the target area in each image, and performing fusion processing on the reference area and the mask of each image to obtain a fused image which corresponds to each image one by one and is free of interference content, so that the interference content in the image sequence is automatically removed, the interference removal efficiency is improved, and a large amount of human resources are saved.
Drawings
Fig. 1 is a schematic view of an application scenario of an image descrambling processing system 10 according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an image descrambling processing device 500 according to an embodiment of the present invention;
FIGS. 3A-3B are schematic flowcharts of image descrambling processing methods provided by embodiments of the present invention;
fig. 4 is a schematic structural diagram of an image descrambling processing device 600 according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating an image descrambling processing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a selection interface of a target area and a reference area provided in an embodiment of the present invention;
FIG. 7 is a schematic diagram of an interface of a video with sundries according to an embodiment of the present invention;
FIG. 8 is a schematic interface diagram for removing sundries in a video according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating an image descrambling processing method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a SiamMask network model according to an embodiment of the present invention;
FIGS. 11A-11B are schematic diagrams illustrating comparison of the output effects of the SiamMask network model provided by the embodiment of the present invention;
FIGS. 12A-12B are schematic diagrams illustrating comparison of the adjustment effect of the reference area provided by the embodiment of the present invention;
FIG. 13 is a schematic diagram of user selection provided by an embodiment of the invention;
FIGS. 14A-14B are schematic diagrams illustrating the comparison of the morphological treatment effects provided by the embodiments of the present invention;
FIGS. 15A-15B are schematic diagrams illustrating the fusion effect comparison provided by embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
In the following description, the terms "first/second/third/fourth" are only used to distinguish similar objects and do not denote a particular order or importance; it should be understood that "first/second/third/fourth" may be interchanged in a specific order or sequence where permissible, so that the embodiments of the invention described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before the embodiments of the present invention are described in further detail, the terms and expressions used in the embodiments of the present invention are explained; the following explanations apply to these terms and expressions.
1) Image sequence: and sequentially and continuously acquiring a series of images of the target at different times and different directions. For example, images, videos, and the like continuously shot by a camera belong to the image sequence.
2) Video: consists of a series of image frames, i.e., of an image sequence. The fluency of a video can be expressed in Frames Per Second (FPS); the more frames per second, the smoother the displayed motion. FPS is a term from the imaging field and colloquially refers to the number of pictures per second in an animation or video. Each image frame is a still image; playing the frames in sequence creates a moving image. For example, 30 FPS means that 30 "still images" are played per second.
3) Mask (mask): the image area enclosed by the contour of an object in an image. Only the content within the mask portion is displayed in the picture (a minimal OpenCV sketch of applying a mask follows this list of terms).
4) Target area: the area where the interfering content is located. Interfering content is content that affects the perception of the image sequence. For example, if a trash can appears in an image sequence of beautiful scenery and spoils its aesthetics, the trash can is the interfering content (clutter); if a television program shoots a video and a certain person in the video later needs to be removed for some reason, that person is the interfering content.
5) Reference area: the area where the material content used to replace the interfering content is located; the material replaces the interfering content, thereby eliminating the influence of the interfering content on the image sequence. For example, if an image sequence of beautiful scenery contains a trash can, the trash can is the interfering content, and a plant may be used to replace it; the plant is the material. If a television program needs to remove a person from a captured video, the person is the interfering content, and the person can be replaced with the background behind them; the background is the material.
6) Morphology: extracting from an image the components that are meaningful for expressing and describing the shape of a region, so that subsequent recognition can grasp the most essential shape features of the target object, such as boundaries and connected regions. Morphology is the basic theory of mathematical-morphology image processing; its basic operations include binary erosion (corrosion) and dilation (expansion), binary opening and closing, skeleton extraction, ultimate erosion, hit-or-miss transformation, morphological gradient, top-hat transformation, particle analysis, watershed transformation, gray-value erosion and dilation, gray-value opening and closing, gray-value morphological gradient, and so on.
7) Poisson fusion (Seamless Cloning): according to the gradient information of the source image (for example, the reference region) and the boundary information of the target image (for example, the target region), the image pixels in the synthesis region are reconstructed by an interpolation method, and the color and the gradient in the source image are changed in the fusion process, so that the effect of seamless fusion of the images is achieved.
8) Optical flow (optical flow): the instantaneous speed of pixel motion of a spatially moving object on the viewing imaging plane (the speed of motion of the object in a time-varying image). When the object is moving, the brightness pattern of its corresponding point on the image is also moving, and the change of the image can be expressed by the optical flow. The optical flow method is a method for calculating motion information of an object between adjacent frames by finding a correspondence between a previous frame and a current frame using a change of a pixel in an image sequence in a time domain and a correlation between adjacent frames.
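To make the mask term above concrete, the following minimal OpenCV sketch (not taken from the patent; the file names are hypothetical) applies a binary mask to a frame so that only the content inside the mask is kept:

```python
import cv2

# Hypothetical inputs: a frame and a binary mask (255 inside the object contour, 0 outside).
frame = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# Term 3): only the content inside the mask is kept when the mask is applied to the picture.
masked = cv2.bitwise_and(frame, frame, mask=mask)
cv2.imwrite("masked_frame.png", masked)
```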
In order to at least solve the foregoing technical problems of the related art, embodiments of the present invention provide an image descrambling method and apparatus, an electronic device, and a storage medium, which automatically remove the interference content in an image sequence and improve descrambling efficiency. An exemplary application of the image descrambling processing device provided by the embodiment of the present invention is described below. The device may be a server, for example a server deployed in the cloud, which performs a series of processing on an image sequence provided by other devices or by a user to obtain fused images that correspond one to one to the images in the sequence and from which the interference content has been removed; for example, the server obtains the image sequence from another device and performs matching, morphological and fusion processing on it. The device may also be a handheld terminal, which obtains such fused images from an image sequence input by the user on the terminal and displays them on the terminal's display interface.
Referring to fig. 1 by way of example, fig. 1 is a schematic view of an application scenario of an image descrambling processing system 10 according to an embodiment of the present invention, a terminal 200 is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.
The terminal 200 may be used to acquire the image sequence, for example, when the user inputs the image sequence through the input interface, the terminal automatically acquires the image sequence and sends the image sequence to the server after the input is completed.
In some embodiments, the terminal 200 locally performs the image descrambling processing method provided by the embodiments of the present invention to obtain fused images that correspond to the images in the image sequence one-to-one and have interference contents removed, for example, an Application (APP) is installed on the terminal 200, such as a descrambling APP, a user inputs the image sequence in the descrambling APP, the terminal 200 determines a target region where the interference contents are located in any image of the image sequence, and determines a reference region where the material contents are located in any image; matching each image in the image sequence based on the target area to obtain a mask corresponding to the target area in each image; performing morphological processing on the mask corresponding to the target area in each image to obtain the adjusted mask of each image; and performing fusion processing on the reference area and the adjusted masks of the images to obtain fused images which correspond to the images one by one and have interference contents removed, and sequentially displaying the fused images on the display interface 210 of the terminal 200, so that the interference contents in the image sequence are automatically removed.
In some embodiments, the terminal 200 may also send an image sequence input by the user on the terminal 200 to the server 100 through the network 300 and invoke the image descrambling processing function provided by the server 100. The server 100 then obtains, through the image descrambling processing method provided by the embodiments of the present invention, fused images that correspond one to one to the images in the sequence and from which the interference content has been removed. For example, a descrambling APP is installed on the terminal 200 and the user inputs the image sequence in the APP; the terminal sends the sequence to the server 100 through the network 300; after receiving it, the server 100 performs a series of processing on the sequence, obtains the fused images, and returns them to the descrambling APP, which displays each fused image in turn on the display interface 210 of the terminal 200; alternatively, the server 100 directly outputs the fused images.
Continuing with the structure of the image descrambling processing device provided by the embodiment of the present invention, the image descrambling processing device may be various terminals, such as a mobile phone, a computer, etc., or may be the server 100 shown in fig. 1.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an image descrambling processing device 500 according to an embodiment of the present invention, and the image descrambling processing device 500 shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in the image descrambling processing device 500 are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in connection with embodiments of the invention is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating with other computing devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
In some embodiments, the image descrambling processing Device provided by the embodiments of the present invention may be implemented by a combination of hardware and software, and by way of example, the image descrambling processing Device provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the image descrambling processing method provided by the embodiments of the present invention, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In other embodiments, the image descrambling processing device provided by the embodiment of the present invention may be implemented in a software manner, and fig. 2 shows an image descrambling processing device 555 stored in a memory 550, which may be software in the form of programs, plug-ins, and the like, and includes a series of modules, including a first determining module 5551, a first processing module 5552, a second processing module 5553, a fusing module 5554, a second determining module 5555, a third determining module 5556, a repairing module 5557, and a fourth determining module 5558; the first determining module 5551, the first processing module 5552, the second processing module 5553, the fusion module 5554, the second determining module 5555, the third determining module 5556, the repair module 5557, and the fourth determining module 5558 are configured to implement the image descrambling processing method provided by the embodiment of the present invention.
It is understood that the image descrambling processing method provided by the embodiment of the invention can be executed by an image descrambling processing device, and the image descrambling processing device includes, but is not limited to, a server or a terminal.
The following describes an image descrambling processing method provided by the embodiment of the present invention, with reference to an exemplary application and implementation of the server provided by the embodiment of the present invention. Referring to fig. 3A, fig. 3A is a schematic flowchart of an image descrambling processing method according to an embodiment of the present invention, which is described with reference to the steps shown in fig. 3A.
In step 101, a target area where the interference content is located in any one of the images of the image sequence is determined, and a reference area where the material content is located in any one of the images is determined, the material content being used to replace the interference content.
The user can input an image sequence on an input interface of the terminal; the image sequence can be a video or a set of images continuously shot by a camera. The terminal can send the image sequence to the server, and after receiving it the server can determine, in any image of the sequence, the target area (the area where the interference content is located) and the reference area where the material content is located, the material content being used to replace the interference content. The interference content is, for example, garbage, household items or a person appearing in the image.
In some embodiments, determining a target region in any one of the images of the sequence of images where the interfering content is located comprises: carrying out image segmentation processing on any image of the image sequence to obtain an image area to be matched; and performing matching processing on the image area to be matched based on the interference content data set, and determining the image area to be matched as a target area where the interference content is located in any image when matching is successful.
Determining a reference area where the material content is located in any one of the images, including: determining the edge characteristics of the target area; in any one of the images, a region having an edge feature that meets a similarity condition with an edge feature of the target region is determined as a reference region, and the content in the determined reference region is determined as material content.
When the server receives the image sequence, it can automatically identify a target area and a reference area in any image of the sequence; this image may be the first image, a middle image or the last image of the sequence. First, the image is selected and segmented according to its edge information, yielding a plurality of image areas to be matched. Then an interference content data set containing various kinds of interference content is acquired for the subsequent matching, and each image area to be matched is matched against this data set. When an image area to be matched is similar to some interference content in the data set, that is, the similarity is greater than a similarity threshold, matching succeeds and that image area is determined as the target area where the interference content is located in the image. When an image area to be matched is not similar to any interference content in the data set, that is, the similarity is less than or equal to the similarity threshold, matching fails, and it is determined that no clutter exists in the image sequence.
After the server determines the target area, further determining an edge feature of the target area, so as to automatically determine the reference area according to the edge feature subsequently, wherein the determined edge feature of the reference area and the edge feature of the target area conform to a similar condition, the similar condition may be that a similarity between the edge feature of the reference area and the edge feature of the target area is greater than a set threshold, and the content in the determined reference area is determined as the material content.
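A minimal sketch of this automatic selection is given below. It is not the patent's actual implementation; the Canny-based segmentation, the template matching against the interference content data set, the edge-map comparison and the 0.8 thresholds are all assumptions used for illustration:

```python
import cv2
import numpy as np

def find_target_region(image, interference_dataset, sim_threshold=0.8):
    """Segment the image into candidate regions and match each one against a set of
    known interference-content templates; return the bounding box of the best match."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Assumed segmentation: contours of a Canny edge map stand in for "image areas to be matched".
    edges = cv2.Canny(gray, 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        candidate = image[y:y + h, x:x + w]
        for template in interference_dataset:          # each template is an example of interfering content
            resized = cv2.resize(template, (w, h))
            score = cv2.matchTemplate(candidate, resized, cv2.TM_CCOEFF_NORMED).max()
            if score > sim_threshold:                  # similarity above the threshold: matching succeeds
                return (x, y, w, h)
    return None  # no interfering content found in the sequence

def find_reference_region(image, target_box, sim_threshold=0.8):
    """Slide a window of the target's size over the image and pick a region whose edge map
    is most similar to the target's edge map (a stand-in for the edge-feature comparison)."""
    x, y, w, h = target_box
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    target_edges = cv2.Canny(gray[y:y + h, x:x + w], 100, 200).astype(np.float32)
    best, best_score = None, -1.0
    for yy in range(0, image.shape[0] - h, max(h // 2, 1)):
        for xx in range(0, image.shape[1] - w, max(w // 2, 1)):
            if abs(xx - x) < w and abs(yy - y) < h:    # skip the target area itself
                continue
            cand_edges = cv2.Canny(gray[yy:yy + h, xx:xx + w], 100, 200).astype(np.float32)
            score = 1.0 - np.mean(np.abs(cand_edges - target_edges)) / 255.0
            if score > best_score and score > sim_threshold:
                best, best_score = (xx, yy, w, h), score
    return best
```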
In some embodiments, determining a target region in any one of the images of the sequence of images where the interfering content is located comprises: determining any one image in the image sequence; responding to the region selection operation aiming at any image, and determining a region set by the region selection operation as a target region where the interference content is located;
determining a reference area where the material content is located in any one of the images, including: in response to a material selection operation for any one of the images, a region including material content set by the material selection operation is determined as a reference region.
The method comprises the steps that a user can input an image sequence on a terminal, simultaneously, a target area and a reference area in any image in the image sequence are manually selected, the image sequence, the target area and the reference area are sent to a server, the server responds to the area selection operation of the user for any image, the area set by the area selection operation is determined to be the target area where interference content is located, the area set by the material selection operation and including the material content is determined to be the reference area in response to the material selection operation for any image, and therefore the server obtains the target area and the reference area manually selected by the user.
In step 102, each image in the image sequence is subjected to matching processing based on the target area, so as to obtain a mask corresponding to the target area in each image.
After the server obtains the target region and the reference region of any image, matching processing can be performed on each image in the image sequence according to the target region, so that a mask corresponding to the target region in each image is obtained, and then fusion processing can be performed according to the mask.
Referring to fig. 3B, fig. 3B is an optional flowchart provided in an embodiment of the present invention, and in some embodiments, fig. 3B illustrates that step 102 in fig. 3A may be implemented by step 1021 to step 1023 illustrated in fig. 3B.
For any image in the image sequence, the following processing is performed:
at step 1021, performing convolution processing on the target area and the image through the first convolution layer in the neural network model to obtain a first feature map corresponding to the target area and a second feature map corresponding to the image;
in step 1022, performing convolution processing on the first feature map and the second feature map to obtain a third feature map;
in step 1023, a mask extraction process is performed on the third feature map to obtain a mask corresponding to the target area in the image.
For any image in the image sequence, convolution processing can be performed on the target area and on the image separately, through a first convolution layer in the neural network model, to obtain a first feature map corresponding to the target area and a second feature map corresponding to the image. Convolution processing is then performed on the second feature map using the first feature map (or, equivalently, on the first feature map using the second feature map), yielding a third feature map (the matched feature map), which contains the correlation coefficients between the target area and the image. Mask extraction processing can then be performed on the third feature map to obtain the mask corresponding to the target area in the image, so that the mask can be used in the subsequent fusion processing. The neural network model is trained on a labeled training sample set, so that the trained model can process an image sequence, given an input target area and the sequence, to obtain the masks in the images of the sequence; a labeled training sample comprises a target area sample, an image sample to be recognized, an accuracy, a mask and the position information of the mask.
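The matching step described above follows the pattern of a Siamese tracking network (the drawings reference a SiamMask model). The following sketch illustrates only the depthwise cross-correlation idea; the backbone, channel count and input sizes are assumptions, not the patent's trained network:

```python
import torch
import torch.nn.functional as F

def match_features(target_feat, image_feat):
    """Depthwise cross-correlation between the target-region feature map (first feature map)
    and the full-image feature map (second feature map), producing the matched feature map
    (the 'third feature map'). Assumed shapes: target_feat (1, C, h, w), image_feat (1, C, H, W)."""
    c = target_feat.shape[1]
    # Use the target features as a convolution kernel over the image features (one group per channel).
    return F.conv2d(image_feat, target_feat.permute(1, 0, 2, 3), groups=c)

# A shared backbone stands in for the 'first convolution layer'; in practice it would be a deep CNN.
backbone = torch.nn.Conv2d(3, 256, kernel_size=3, padding=1)

target_patch = torch.randn(1, 3, 127, 127)   # cropped target region (size assumed)
frame = torch.randn(1, 3, 255, 255)          # current image of the sequence (size assumed)

target_feat = backbone(target_patch)                     # first feature map
image_feat = backbone(frame)                             # second feature map
correlation = match_features(target_feat, image_feat)    # third (matched) feature map
```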
In some embodiments, the performing a mask extraction process on the third feature map to obtain a mask corresponding to the target region in the image includes: performing convolution processing on the third feature map through a second convolution layer in the neural network model to obtain a plurality of masks; performing convolution processing on the third characteristic diagram through a third convolution layer in the neural network model to obtain the accuracy of the corresponding mask; and determining the mask corresponding to the maximum accuracy as the mask of the corresponding target area in the image.
Wherein the parameters of the second convolutional layer are different from the parameters of the third convolutional layer. After the third feature map is obtained, performing convolution processing on the third feature map through a second convolution layer in the neural network model to obtain a plurality of feature maps corresponding to the masks, and performing nonlinear mapping on the feature maps corresponding to the masks to obtain a plurality of masks; and carrying out convolution processing on the third characteristic graph through a third convolution layer in the neural network model to obtain the accuracy of the corresponding mask, and determining the mask corresponding to the maximum accuracy as the mask of the corresponding target area in the image.
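A minimal sketch of the mask head and the accuracy (score) head, and of selecting the mask with the maximum accuracy, might look as follows; the 63x63 mask size, channel counts and head structure are assumptions:

```python
import torch
import torch.nn as nn

C = 256                                             # channels of the matched feature map (assumed)
mask_head = nn.Conv2d(C, 63 * 63, kernel_size=1)    # 'second convolution layer': one flattened mask per location
score_head = nn.Conv2d(C, 1, kernel_size=1)         # 'third convolution layer': accuracy per candidate mask

def extract_best_mask(matched_feat):
    """Produce candidate masks and their accuracies, then keep the mask with the highest accuracy."""
    masks = torch.sigmoid(mask_head(matched_feat))   # non-linear mapping to mask values in [0, 1]
    scores = score_head(matched_feat)                # (b, 1, H, W)
    b, _, H, W = scores.shape
    best_pos = scores.view(b, -1).argmax(dim=1)      # spatial position of the maximum accuracy
    flat_masks = masks.view(b, 63 * 63, H * W)
    best_mask = flat_masks[torch.arange(b), :, best_pos].view(b, 63, 63)
    return best_mask
```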
In some embodiments, before determining the mask corresponding to the maximum accuracy as the mask of the corresponding target region in the image, the method further comprises: performing convolution processing on the matched feature map through a fourth convolution layer in the neural network model to obtain position information and size information of the corresponding mask; determining the position information of the mask corresponding to the maximum accuracy as the position information of the mask corresponding to the target area in the image, and determining the size information of the mask corresponding to the maximum accuracy as the size information of the mask corresponding to the target area in the image;
the method further comprises the following steps: determining the position information of the reference area in the image according to the distance between the target area and the reference area and the position information of the mask corresponding to the target area in the image; multiplying the ratio of the size of the reference area to the size of the target area and the size information of the mask of the corresponding target area in the image, and determining the obtained size information as the size information of the reference area in the image; the reference area in the image is determined based on the position information and the size information of the reference area.
In order to obtain the determined mask position, the matching feature map may be convolved by a fourth convolution layer in the neural network model to obtain position information and size information corresponding to a plurality of masks, the position information of the mask corresponding to the maximum accuracy is determined as the position information of the mask corresponding to the target area in the image, and the size information of the mask corresponding to the maximum accuracy is determined as the size information of the mask corresponding to the target area in the image.
Since the image sequence changes dynamically and the target area changes with it, the position and size of the reference area in each image can be adjusted once the size information and position information of the mask corresponding to the target area in that image have been obtained. The distance between the target area and the reference area is taken from the previous image (or from the initially selected image), and the position information of the reference area in the current image is determined from that distance together with the position information of the mask corresponding to the target area in the current image. Similarly, the ratio of the size of the reference area to the size of the target area is taken from the previous image (or from the initially selected image), and this ratio is multiplied by the size information of the mask corresponding to the target area in the image; the result is determined as the size information of the reference area in the image. The reference area in the image is then determined by combining its position information and size information, so that the reference area follows the dynamically changing target area and the target area and reference area can be fused accurately.
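A small arithmetic sketch of this adjustment is shown below; the function name, the offset convention and the example numbers are hypothetical, not the patent's exact formulas:

```python
def adjust_reference_region(mask_pos, mask_size, offset, size_ratio):
    """Place the reference region relative to the tracked mask of the target region.

    mask_pos:   (x, y) position of the mask corresponding to the target area in the current image
    mask_size:  (w, h) size of that mask
    offset:     (dx, dy) distance between target area and reference area in the initial image
    size_ratio: reference-area size / target-area size in the initial image
    """
    ref_x = mask_pos[0] + offset[0]            # position follows the target as it moves
    ref_y = mask_pos[1] + offset[1]
    ref_w = int(mask_size[0] * size_ratio)     # size scales with the tracked target size
    ref_h = int(mask_size[1] * size_ratio)
    return (ref_x, ref_y, ref_w, ref_h)

# Usage example (all numbers hypothetical): the target mask is tracked at (120, 80) with size
# (40, 60); in the first image the reference area was 50 px to the right and 1.2x the target size.
ref_region = adjust_reference_region((120, 80), (40, 60), (50, 0), 1.2)
```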
In step 103, the mask corresponding to the target area in each image is morphologically processed to obtain an adjusted mask for each image.
Because the determined mask may have a hole, morphological processing may be performed on the mask corresponding to the target region in each image, so that the hole is removed from the adjusted mask of each image, and it is avoided that some parts in the target region cannot be fused.
In some embodiments, performing morphological processing on the mask corresponding to the target region in each image to obtain an adjusted mask for each image includes: performing expansion processing on the mask corresponding to the target area in each image to obtain an adjusted mask with the cavity removed; or, performing expansion processing on the mask corresponding to the target area in each image to obtain a mask with the removed cavity, and performing corrosion processing on the mask with the removed cavity to obtain an adjusted mask.
Expansion (dilation) processing can be performed on the mask part of each image to remove the holes in the mask and enlarge the edge range used for mask fusion. Alternatively, the mask part of each image can first be expanded to remove the holes and enlarge the fusion edge range, and the expanded mask can then be eroded to restore the fusion range.
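A minimal OpenCV sketch of this morphological step (the kernel shape and size are assumptions) is:

```python
import cv2

def adjust_mask(mask, kernel_size=5, restore_size=True):
    """Morphological processing of the target-area mask: dilation removes holes and enlarges
    the fusion edge range; an optional erosion then restores the original extent
    (dilation followed by erosion is a morphological closing)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    dilated = cv2.dilate(mask, kernel, iterations=1)   # expansion processing: fills holes in the mask
    if not restore_size:
        return dilated
    return cv2.erode(dilated, kernel, iterations=1)    # corrosion processing: restores the fusion range
```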
In step 104, the reference region and the adjusted mask of each image are fused to obtain fused images corresponding to the images one by one and from which the interference content is removed.
After the server obtains the adjusted mask, the reference region and the adjusted mask of each image can be fused to obtain fused images which correspond to the images one by one and are free of interference content, and therefore the automatic interference elimination function is achieved.
In some embodiments, the blending process of the reference region and the adjusted mask of each image includes: and performing fusion processing according with the following conditions on the edge information of the reference area, the part to be fused and the adjusted mask in the image: the edge information of the fusion part in the fusion image is consistent with the edge information of the reference area; and the difference value between the gradient of the fusion part in the fusion image and the gradient of the part to be fused in the reference region is smaller than a gradient difference threshold value.
After the reference areas in the images are obtained, extracting the edge information of the reference areas and the parts to be fused so as to carry out fusion processing with the adjusted masks in the images according to the edge information of the reference areas and the parts to be fused to obtain fused images, and enabling the edge information of the fused parts in the fused images to be consistent with the edge information of the reference areas; the difference between the gradient of the fused portion in the fused image and the gradient of the portion to be fused of the reference region is smaller than a gradient difference threshold (the difference between the gradient of the fused portion in the fused image and the gradient of the portion to be fused of the reference region is smallest).
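OpenCV's seamlessClone implements Poisson fusion and can serve as a sketch of this step; the function below is an illustration under assumed inputs, not the patent's exact fusion procedure:

```python
import cv2

def fuse_reference_into_target(image, reference_patch, patch_mask, target_center):
    """Poisson (seamless) fusion of the reference-region content into the target area.
    cv2.seamlessClone reconstructs the pixels inside the mask from the reference patch's
    gradients while keeping the boundary consistent with the surrounding image, which matches
    the two conditions above (consistent edge information, minimal gradient difference).

    patch_mask:    8-bit single-channel mask the same size as reference_patch (255 where cloned)
    target_center: (x, y) centre of the target area in the destination image
    """
    return cv2.seamlessClone(reference_patch, image, patch_mask, target_center, cv2.NORMAL_CLONE)

# Usage (values hypothetical): paste the reference patch over the target centred at (160, 110).
# fused = fuse_reference_into_target(frame, ref_patch, adj_mask, (160, 110))
```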
In some embodiments, the method further comprises: and performing pixel point restoration processing on adjacent fusion images in the fusion image corresponding to each image based on the optical flow information of the corresponding target area in the fusion image corresponding to each image to obtain continuous fusion images.
After the fused images without the interference content are obtained, flicker may exist in them, that is, the brightness difference of some pixel points between frames is too large. To solve the flicker problem, pixel point restoration processing can be performed on adjacent fused images according to the optical flow information of the corresponding target area in the fused images corresponding to the images, so that continuous, flicker-free fused images are obtained.
In some embodiments, before performing pixel point repairing processing on an adjacent fused image in the fused image corresponding to each image, the method further includes: determining first optical flow information of a corresponding target area between adjacent images in each image; determining the first optical flow information as second optical flow information of a corresponding target area between adjacent fused images based on the fused images corresponding to the images;
carrying out pixel point repairing treatment on adjacent fusion images in the fusion images corresponding to the images to obtain continuous fusion images, wherein the method comprises the following steps: second optical flow information is blended into the pixel point information of the adjacent fusion images; and when the brightness difference value of the pixel point of the adjacent fusion image containing the optical flow information is larger than the brightness difference value threshold, performing brightness restoration processing on the pixel point of the adjacent fusion image containing the optical flow information to obtain a continuous fusion image.
Before the restoration processing is performed, the first optical flow information of the corresponding target area between adjacent images is required, and this first optical flow information is determined as the second optical flow information of the corresponding target area between the adjacent fused images, based on the fused images corresponding to the images. After the second optical flow information is obtained, an optical flow constraint is introduced into the adjacent fused images, that is, the second optical flow information is blended into the pixel point information of the adjacent fused images. When the brightness difference of a pixel point between the adjacent fused images containing the optical flow information is larger than a set brightness difference threshold, brightness restoration processing is performed on that pixel point, so that continuous, flicker-free fused images are obtained. Here, continuity means continuity in time: the user perceives no interruption, no obvious difference between adjacent fused images, and no flicker.
In some embodiments, performing brightness restoration processing on a pixel point of the adjacent fused images containing the optical flow information to obtain continuous fused images includes: when the adjacent fused images containing the optical flow information are a first image and a second image, executing one of the following processes on the second image: determining the brightness obtained by adding the brightness of the pixel point of the first image and the brightness difference threshold as the brightness of the pixel point of the second image after the brightness restoration processing; or determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel point of the second image after the brightness restoration processing.
When the flicker is determined to exist, the brightness restoration processing of the pixel points needs to be performed on the adjacent fusion image containing the optical flow information. When the adjacent fusion images containing the optical flow information are a first image and a second image, the brightness of a pixel point obtained after adding the brightness of the pixel point of the first image and a brightness difference value threshold is determined as the brightness of the pixel point of the second image after brightness restoration processing; or determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel points of the second image after the brightness restoration processing. The embodiment of the present invention is not limited to the above-described repair method, and other repair methods capable of removing flicker are also applicable.
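By way of illustration only, the two restoration strategies described above can be sketched in Python with NumPy as follows; the function name, the array representation of pixel luminance, and the mode switch are assumptions made for this sketch and are not part of the disclosed embodiment.

import numpy as np

def repair_luminance(first, second, diff_threshold, mode="add_threshold"):
    # first, second: luminance channels (H x W float arrays) of two adjacent
    # fused images; diff_threshold: the luminance difference threshold.
    repaired = second.copy()
    # Pixel points whose luminance difference exceeds the threshold are
    # treated as flicker and repaired.
    flicker = np.abs(second - first) > diff_threshold
    if mode == "add_threshold":
        # Strategy 1: luminance of the first image plus the threshold.
        repaired[flicker] = first[flicker] + diff_threshold
    else:
        # Strategy 2: average of the two luminances before restoration.
        repaired[flicker] = (first[flicker] + second[flicker]) / 2.0
    return repaired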
The image descrambling processing method provided by the embodiment of the present invention has been described with reference to the exemplary application and implementation of the server provided by the embodiment of the present invention. The following continues to describe the scheme in which the modules in the image descrambling processing device 555 provided by the embodiment of the present invention cooperate to implement image descrambling processing.
A first determining module 5551, configured to determine a target region where an interference content is located in any image of an image sequence, and determine a reference region where a material content is located in the any image, where the material content is used to replace the interference content;
a first processing module 5552, configured to perform matching processing on each image in the image sequence based on the target region, so as to obtain a mask corresponding to the target region in each image;
a second processing module 5553, configured to perform morphological processing on the mask corresponding to the target area in each image to obtain an adjusted mask of each image;
a fusion module 5554, configured to perform fusion processing on the reference region and the adjusted mask of each image to obtain a fused image that corresponds to each image one to one and is obtained by removing the interference content.
In the above technical solution, the first determining module 5551 is further configured to perform image segmentation processing on any image of the image sequence to obtain an image region to be matched; matching the image area to be matched based on the interference content data set, and determining the image area to be matched as a target area where interference content is located in any one image when matching is successful; determining edge features of the target area; in any one of the images, a region having an edge feature that meets a similarity condition with an edge feature of the target region is determined as a reference region, and the content in the determined reference region is determined as material content.
In the above technical solution, the first determining module 5551 is further configured to determine any image in the image sequence; in response to a region selection operation for the any image, determine the region set by the region selection operation as the target region where the interference content is located; and in response to a material selection operation for the any image, determine the region set by the material selection operation and including the material content as the reference region.
In the above technical solution, the first processing module 5552 is further configured to execute the following processing on any image in the image sequence: performing convolution processing on the target area and the image respectively through a first convolution layer in a neural network model to obtain a first feature map corresponding to the target area and a second feature map corresponding to the image; performing convolution processing on the first characteristic diagram and the second characteristic diagram to obtain a third characteristic diagram; and performing mask extraction processing on the third feature map to obtain a mask corresponding to the target area in the image.
In the above technical solution, the first processing module 5552 is further configured to perform convolution processing on the third feature map through a second convolution layer in the neural network model to obtain a plurality of masks; performing convolution processing on the third feature map through a third convolution layer in the neural network model to obtain the accuracy corresponding to the mask; determining a mask corresponding to the maximum accuracy as a mask corresponding to the target area in the image; wherein parameters of the second convolutional layer are different from parameters of the third convolutional layer.
In the above technical solution, the image descrambling processing device 555 further includes:
a second determining module 5555, configured to perform convolution processing on the matching feature map through a fourth convolution layer in the neural network model to obtain position information and size information corresponding to the mask; determining the position information of the mask corresponding to the maximum accuracy as the position information of the mask corresponding to the target area in the image, and determining the size information of the mask corresponding to the maximum accuracy as the size information of the mask corresponding to the target area in the image;
the image descrambling processing device 555 further comprises:
a third determining module 5556, configured to determine the position information of the reference region in the image according to the distance between the target region and the reference region and the position information of the mask corresponding to the target region in the image; multiplying the ratio of the size of the reference area to the size of the target area by the size information of a mask corresponding to the target area in the image, and determining the obtained size information as the size information of the reference area in the image; determining a reference region in the image based on the position information and the size information of the reference region.
In the above technical solution, the second processing module 5553 is further configured to perform expansion processing on the mask corresponding to the target area in each image to obtain an adjusted mask with the cavity removed; or perform expansion processing on the mask corresponding to the target area in each image to obtain a mask with the cavity removed, and perform erosion processing on the mask with the cavity removed to obtain the adjusted mask.
In the above technical solution, the fusion module 5554 is further configured to perform fusion processing on the edge information and the part to be fused of the reference region and the adjusted mask in the image, where the fusion processing meets the following conditions: the edge information of the fusion part in the fusion image is consistent with the edge information of the reference area; and the difference value between the gradient of the fusion part in the fusion image and the gradient of the part to be fused of the reference region is smaller than a gradient difference threshold value.
In the above technical solution, the image descrambling processing device 555 further includes:
a repairing module 5557, configured to perform pixel repairing processing on an adjacent fused image in the fused image corresponding to each image based on the optical flow information corresponding to the target area in the fused image corresponding to each image, so as to obtain a continuous fused image.
In the above technical solution, the image descrambling processing device 555 further includes:
a fourth determining module 5558, configured to determine first optical flow information corresponding to the target area between adjacent images in the images; determining the first optical flow information as second optical flow information between the adjacent fused images corresponding to the target area based on a fused image corresponding to the image;
the restoration module 5557 is further configured to blend the second optical flow information into the pixel point information of the adjacent blended image; and when the brightness difference value of the pixel point of the adjacent fusion image containing the optical flow information is larger than the brightness difference value threshold, performing brightness restoration processing on the pixel point of the adjacent fusion image containing the optical flow information to obtain a continuous fusion image.
In the above technical solution, the repairing module 5557 is further configured to, when the adjacent fused image including the optical flow information is a first image and a second image, perform one of the following processes on the second image: adding the brightness of the pixel point of the first image and the brightness difference value threshold to obtain the brightness of the pixel point, and determining the brightness of the pixel point of the second image after brightness restoration processing; and determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel points of the second image after the brightness restoration processing.
The following describes an image descrambling processing method provided by the embodiment of the present invention with reference to an exemplary application and implementation of the terminal provided by the embodiment of the present invention. Referring to fig. 4, fig. 4 is a schematic structural diagram of an image descrambling processing device 600 according to an embodiment of the present invention, and the image descrambling processing device 600 shown in fig. 4 includes: at least one processor 610, memory 650, at least one network interface 620, and a user interface 630. The functions of the processor 610, the memory 650, the at least one network interface 620, and the user interface 630 are similar to the functions of the processor 510, the memory 550, the at least one network interface 520, and the user interface 530, respectively, that is, the functions of the output device 631 and the input device 632 are similar to the functions of the output device 531 and the input device 532, and the functions of the operating system 651, the network communication module 652, the display module 653, and the input processing module 654 are similar to the functions of the operating system 551, the network communication module 552, the display module 553, and the input processing module 554, respectively, which are not described in detail.
In other embodiments, the image descrambling processing device provided by the embodiment of the present invention may be implemented by software, and fig. 4 shows the image descrambling processing device 655 stored in the memory 650, which may be software in the form of programs, plug-ins, etc., and includes a series of modules including a first rendering module 6551, a third processing module 6552, and a second rendering module 6553; the first rendering module 6551, the third processing module 6552 and the second rendering module 6553 are used to implement the image descrambling processing method provided by the embodiment of the present invention.
The following describes an image descrambling processing method provided by the embodiment of the present invention with reference to an exemplary application and implementation of the terminal provided by the embodiment of the present invention. Referring to fig. 5, fig. 5 is a schematic flowchart of an image descrambling processing method according to an embodiment of the present invention, which is described with reference to the steps shown in fig. 5.
In step 201, the image sequence is presented in the client, and in any image of the image sequence, a bounding box of a target area where the interference content is located is presented, and a bounding box of a reference area where the material content is located is presented for the material content used to replace the interference content.
After the user inputs the image sequence on the client, the client can present the image sequence, present a bounding box of a target area where the interference content is located in any image of the image sequence, and present a bounding box of a reference area where the material content for replacing the interference content is located. The image may be the first image in the image sequence, the middle image in the image sequence, or the last image in the image sequence. The bounding box is used to prompt the user for the range of the target area and the reference area.
In some embodiments, presenting, in any one of the images of the sequence of images, a bounding box of the target region in which the interfering content is located, comprises: in response to a region selection operation for any one of the images of the image sequence, presenting a bounding box of a target region in which the selected interference content is located in any one of the images;
an enclosure of a reference region in which material content is presented, comprising: in response to a selection operation for the material in any one of the images, a bounding box of a reference area in which the content of the selected material is located is presented in any one of the images.
The user can input the image sequence on the terminal, and simultaneously manually select a target area and a reference area in any image in the image sequence, and the client responds to the user operation and presents the target area and the range of the reference area on the client.
In some embodiments, presenting an enclosure of a target area in which the interfering content is located includes: responding to a region identification request aiming at the image sequence, and carrying out image segmentation processing on any image of the image sequence to obtain an image region to be matched; matching the image area to be matched based on the interference content data set, and determining the image area to be matched as a target area where the interference content is located in any image when the matching is successful; presenting an enclosure of a target area in which the interfering content is located;
an enclosure of a reference region in which material content is presented, comprising: determining an area with edge characteristics meeting similar conditions with the edge characteristics of the target area as a reference area, and determining the content in the determined reference area as material content; a bounding box of the reference region in which the material content is located is presented.
After the user inputs the image sequence, the client responds to the region identification request aiming at the image sequence, the target region and the reference region in any image in the image sequence can be automatically identified, and manual operation of the user is reduced.
In step 202, in response to the descrambling operation for the target region, each image in the image sequence is subjected to recognition processing based on the target region and the reference region in any one of the images, and the target region and the reference region in each image are obtained.
The user can click a descrambling button presented on the client, and the client responds to descrambling operation aiming at the target area, and performs recognition processing on each image (except the image presenting the surrounding frame) in the image sequence based on the target area and the reference area in any image to obtain the target area and the reference area in each image.
In some embodiments, identifying each image in the image sequence to obtain the target region and the reference region in each image includes: matching each image in the image sequence based on the target area to obtain a mask corresponding to the target area in each image, position information and size information of the mask; determining the position information of the reference area in the image according to the distance between the target area and the reference area and the position information of the mask corresponding to the target area in the image; multiplying the ratio of the size of the reference area to the size of the target area and the size information of the mask of the corresponding target area in the image, and determining the obtained size information as the size information of the reference area in the image; the reference area in the image is determined based on the position information and the size information of the reference area.
In step 203, the target region and the reference region in each image are fused to obtain a fused image that corresponds to each image one by one and from which the interference content is removed.
In some embodiments, before performing fusion processing on the target region and the reference region in each image to obtain a fused image that corresponds to each image one-to-one and has interference content removed, the method further includes: performing morphological processing on the mask corresponding to the target area in each image to obtain the adjusted mask of each image;
the method further comprises the following steps: and performing pixel point restoration processing on adjacent fusion images in the fusion image corresponding to each image based on the optical flow information of the corresponding target area in the fusion image corresponding to each image to obtain continuous fusion images.
In step 204, the fused images in one-to-one correspondence are presented at the client.
Now that the image descrambling processing method provided by the embodiment of the present invention has been described, the following continues to describe the scheme in which the modules in the image descrambling processing device 655 provided by the embodiment of the present invention cooperate to implement image descrambling processing.
A first presenting module 6551, configured to present, in the client, an image sequence, and present, in any image of the image sequence, a bounding box of a target region where the interfering content is located, and present, for a material content used to replace the interfering content, a bounding box of a reference region where the material content is located;
a third processing module 6552, configured to perform recognition processing on each image in the image sequence based on a target region and a reference region in any one of the images in response to a descrambling operation for the target region, so as to obtain the target region and the reference region in each image; performing fusion processing on the target area and the reference area in each image to obtain a fused image which corresponds to each image one by one and is free of the interference content;
a second presenting module 6553, configured to present the fused image corresponding to one at the client.
In the above technical solution, the first presenting module 6551 is further configured to respond to a region selecting operation for any image of the image sequence, and present a bounding box of a target region where the selected interference content is located in the any image; and in response to the selection operation of the material in any image, presenting a surrounding frame of a reference area where the content of the selected material is located in any image.
In the above technical solution, the first presenting module 6551 is further configured to perform image segmentation processing on any image of the image sequence in response to a region identification request for the image sequence, so as to obtain an image region to be matched; matching the image area to be matched based on the interference content data set, and determining the image area to be matched as a target area where interference content is located in any one image when matching is successful; presenting an enclosure of a target area in which the interfering content is located; determining an area with edge features meeting similar conditions with the edge features of the target area as a reference area, and determining the content in the determined reference area as material content; and presenting a surrounding frame of the reference area where the material content is positioned.
Here, it should be noted that: the above description related to the apparatus is similar to the above description of the method, and for the technical details not disclosed in the apparatus according to the embodiment of the present invention, please refer to the description of the method embodiment of the present invention.
Embodiments of the present invention also provide a computer-readable storage medium storing executable instructions, which, when executed by a processor, will cause the processor to execute an image descrambling processing method provided by embodiments of the present invention, for example, the image descrambling processing method shown in fig. 3A-3B or the image descrambling processing method shown in fig. 5. It will be understood that executable instructions may also be stored in a blockchain.
In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device, or on multiple computing devices located at one site, or distributed across multiple sites and interconnected by a communication network, which may comprise a blockchain system.
In the following, an exemplary application of the embodiments of the present invention in a practical application scenario will be described.
A television station shoots a program and obtains a video, and then a certain person in the video needs to be removed for some reason. To remove a person from a video, the video is currently processed mainly by manual labor frame by frame, and a great deal of time and effort is spent repairing the video frame by frame. However, since a video generally has 25-30 frames per second, at least 250 frames of images need to be processed for a 10-second clip; manual frame-by-frame processing therefore involves an extremely heavy workload and extremely low efficiency, and cannot meet the needs of actual application scenarios.
The embodiment of the invention provides an image descrambling processing method. A mask can be extracted frame by frame from each image through a mask-based neural network model (a target tracking network); for each mask, a similar region is selected as the replacement image part, that is, the reference region; Poisson fusion processing is performed on the mask and the reference region; and the continuity of the video is ensured through an optical flow constraint, so that the fused pictures look more realistic, thereby achieving the purpose of removing sundries (interference content) in the video. This saves time and labor while ensuring a real and reliable output effect.
As shown in fig. 6, fig. 6 is a schematic view of a selection interface of a target area and a reference area provided in an embodiment of the present invention. The first frame image of a video may be presented on the display interface of the terminal, or another frame of the video may be presented. After clicking a "select target area" button 601 on the image, the user may select a target area 602 (enclosed by a rectangular frame) from which sundries (e.g., garbage) need to be removed; after clicking a "select reference area" button 603 on the image, the user may select a reference area 604 (enclosed by a rectangular frame) for replacing the sundries in the picture; finally, the user clicks a "start repairing" button 605, and in response to the user's click operation the terminal performs a series of processing (e.g., matching, morphology, fusion, etc.) on the video based on the selected target area and reference area, so as to remove the sundries in the video. As shown in fig. 7, fig. 7 is a schematic view of an interface of a video with sundries provided by an embodiment of the present invention; fig. 7 shows that interference content (sundries) 701 exists in the video, and the area where the interference content is located is the target area. After the image descrambling processing method is adopted, the sundries present in fig. 7 can be removed. As shown in fig. 8, fig. 8 is a schematic interface diagram of removing sundries in a video according to an embodiment of the present invention; each frame in the video no longer contains the interference content 701 in fig. 7, that is, the function of automatically removing sundries is implemented.
As shown in fig. 9, fig. 9 is a flowchart illustrating an image descrambling processing method according to an embodiment of the present invention, and in step S1, a user imports a video; in step S2, the user selects a target area, and any image in the video (which may be the first image in the video, an image in the middle of the video, or the last image in the video) may be presented for the user to select the target area; in step S3, the user selects a reference area, and the user can select the reference area in any one of the presented images; in step S4, the mask in each image (excluding the images already presented in steps S2-S3) is extracted by the target tracking model; in step S5, the target area is compared, and reference area adjustment is performed; in step S6, morphological processing is performed on the mask portion; in step S7, poisson fusion is performed on the mask and the reference region; in step S8, the optical flow constraint solves the inter-frame discontinuity problem; in step S9, a descrambled video is generated. The steps S1-S3 are processes in which the user manually selects the target region and the reference region, and the steps S1-S3 may also be replaced by processes in which the target region and the reference region are automatically selected, and image segmentation processing is performed on any image of the video to obtain an image region to be matched; matching the image area to be matched based on the interference content (sundry) data set, and determining the image area to be matched as a target area when the matching is successful; determining the edge characteristics of the target area; in any image, the area with the edge characteristics meeting the similar conditions with the edge characteristics of the target area is determined as the reference area, so that the target area and the reference area are automatically selected without manual selection of a user.
Steps S4-S8 are explained below, respectively:
step S4: extracting masks in images through target tracking model
The embodiment of the invention can adopt a SiamMask network model, which is a Siamese-based target tracking network model. Unlike traditional models, this model can extract the mask portion of the target, whereas traditional models can only extract the rectangular frame of the target and cannot extract the outline information of non-rectangular objects. The SiamMask network model is a model framework that unifies visual object tracking (VOT) and video object segmentation (VOS); the SiamMask network model can obtain the mask of the corresponding target area in each image only by initializing the target of the video tracking, namely the target area. The SiamMask network model is a neural network model.
As shown in fig. 10, fig. 10 is a schematic diagram of a SiamMask network model provided in an embodiment of the present invention. After the target region is determined, the target region and the video may be input into the SiamMask network model, and the SiamMask network model performs recognition processing on each image in the video according to the target region. The specific recognition process for any image in the video is as follows: 1) through a first convolution layer of the SiamMask network model (fθ denotes the parameters of the first convolution layer), convolution processing is performed on the target region (for example, 127 × 127 × 3, where 3 may represent the RGB channels) and the image (for example, 255 × 255 × 3) respectively, to obtain a first feature map (for example, 15 × 15 × 256) corresponding to the target region and a second feature map (for example, 31 × 31 × 256) corresponding to the image; 2) convolution processing is performed on the first feature map and the second feature map (d denotes the convolution operation) to obtain a third feature map (for example, 17 × 17 × 256, where 256 represents the number of channels and 1 × 1 × 256 represents one unit in the feature map); 3) mask extraction processing is performed on the third feature map through the SiamMask network model to obtain the mask corresponding to the target area in the image. The specific mask extraction processing is as follows: convolution processing is performed on the third feature map through a second convolution layer to obtain a plurality of feature maps (for example, 17 × 17 × (63 × 63), where 1 × 1 × (63 × 63) is one unit in the feature map), and nonlinear mapping is performed on these feature maps to obtain a plurality of corresponding masks; convolution processing is performed on the third feature map through a third convolution layer to obtain the accuracy corresponding to each mask (for example, 17 × 17 × 2k); and convolution processing is performed through a fourth convolution layer (hσ denotes the parameters of the fourth convolution layer) to obtain the position information and size information corresponding to each mask (for example, 17 × 17 × 4k), wherein the position of a mask in the image can be represented by a large frame that has no inclination and is used to surround the mask; 4) the mask corresponding to the maximum accuracy is determined as the mask corresponding to the target area in the image, and the position information and size information of the mask corresponding to the maximum accuracy are determined as the position information and size information of the mask corresponding to the target area in the image. The parameters of the first convolution layer, the second convolution layer, the third convolution layer, and the fourth convolution layer are different from one another. In the process of training the SiamMask network model, the SiamMask network model is trained through a labeled training sample set, so that the trained SiamMask network model can perform recognition processing on each image in the video according to the input target region and the video to obtain the mask in each image of the video, wherein one labeled training sample includes a target region sample, an image sample to be recognized, the accuracy, the mask, the position information of the mask, and the size information of the mask.
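As an illustrative aid, the selection in 4) above can be sketched as follows, assuming the per-frame candidate outputs of the network have already been collected into NumPy arrays (the variable names and shapes are assumptions for this sketch, not the SiamMask interface):

import numpy as np

def select_best_mask(masks, scores, boxes):
    # masks:  (N, H, W) candidate masks for one frame
    # scores: (N,) accuracy of each candidate
    # boxes:  (N, 4) position and size information of each candidate
    best = int(np.argmax(scores))
    # The mask with the maximum accuracy, together with its position and
    # size information, is taken as the mask of the target area in the frame.
    return masks[best], boxes[best]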
For a video imported by a user, the user can select an identified target area in a first frame of the video and send the identified target area into a trained SiamMask network model, the SiamMask network model reads the content of each frame of the video and outputs a small frame (box) and a mask (mask) of each frame corresponding to the target area, and the small frame is a minimum frame (which can have inclination) wrapping the mask. As shown in fig. 11A-11B, fig. 11A-11B are schematic diagrams illustrating comparison of the output effect of the SiamMask network model according to the embodiment of the present invention, where the image shown in fig. 11A is any original image in a video that has not been processed by the SiamMask network model, fig. 11A outputs fig. 11B after being processed by the SiamMask network model, and fig. 11B shows a mask and a small border 1101 of a corresponding target area identified by the SiamMask network model.
Step S5: comparing the target area and adjusting the reference area
Step S4 extracts the mask in each frame of the video, so that the mask picture of each frame of the video is obtained. Since the picture is in motion, the mask also changes in size, shape, and so on; therefore, the reference area of each frame of the video can be adjusted correspondingly according to the change of the mask. As shown in fig. 12A-12B, fig. 12A-12B are schematic diagrams illustrating a comparison of the adjustment effect of the reference region according to an embodiment of the present invention; the image shown in fig. 12A is the first frame image in a video, and the image shown in fig. 12B is the fifteenth frame image in the video. It can be seen that the size and position of the reference region 1201 change correspondingly with the size and position of the target region 1202.
The reference region adjustment function may be provided to the user as an option. As shown in fig. 13, fig. 13 is a schematic diagram of user selection provided in the embodiment of the present invention; the user may choose to always use the reference region of the first frame as the reference region of each subsequent frame, or may choose to have the reference region adjusted correspondingly according to the mask portion.
The following describes the case where the reference region is adjusted according to the mask portion:
taking the center of the rectangle of the reference area as an example, the position calculation formula of the reference area is shown in formula (1):

c_ref(i+1) = c_mask(i+1) + (c_ref(i) − c_mask(i))    (1)

wherein a minimum bounding rectangle operation is performed on the outline of the mask region, c_mask(i) denotes the position of the center of the rectangle of the mask of the i-th frame, c_ref(i) denotes the position of the center of the rectangle of the reference area of the i-th frame, c_mask(i+1) denotes the position of the center of the rectangle of the mask of the (i+1)-th frame, and c_ref(i+1) denotes the position of the center of the rectangle of the reference area of the (i+1)-th frame.

The size calculation formulas of the reference region are shown in formulas (2) and (3):

w_ref(i+1) = w_mask(i+1) × (w_ref(i) / w_mask(i))    (2)

h_ref(i+1) = h_mask(i+1) × (h_ref(i) / h_mask(i))    (3)

wherein, in formula (2), w_mask(i) denotes the width of the rectangle of the mask of the i-th frame, w_mask(i+1) denotes the width of the rectangle of the mask of the (i+1)-th frame, w_ref(i) denotes the width of the rectangle of the reference area of the i-th frame, and w_ref(i+1) denotes the width of the rectangle of the reference area of the (i+1)-th frame. In formula (3), h_mask(i) denotes the height of the rectangle of the mask of the i-th frame, h_mask(i+1) denotes the height of the rectangle of the mask of the (i+1)-th frame, h_ref(i) denotes the height of the rectangle of the reference area of the i-th frame, and h_ref(i+1) denotes the height of the rectangle of the reference area of the (i+1)-th frame.
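A minimal sketch of how formulas (1)-(3) propagate the reference area from frame i to frame i+1 is given below; the rectangle representation (center x, center y, width, height) and the function name are assumptions made for this sketch.

def adjust_reference_region(mask_rect_i, mask_rect_next, ref_rect_i):
    # Each rectangle is (cx, cy, w, h): the center position, width and height
    # of the minimum bounding rectangle of the mask or of the reference area.
    mcx, mcy, mw, mh = mask_rect_i
    ncx, ncy, nw, nh = mask_rect_next
    rcx, rcy, rw, rh = ref_rect_i

    # Formula (1): the offset between the reference area center and the mask
    # center is kept when the mask moves from frame i to frame i+1.
    new_cx = ncx + (rcx - mcx)
    new_cy = ncy + (rcy - mcy)

    # Formulas (2) and (3): the width/height ratio between the reference area
    # and the mask is kept when the mask changes size.
    new_w = nw * (rw / mw)
    new_h = nh * (rh / mh)
    return (new_cx, new_cy, new_w, new_h)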
Step S6: morphological processing of the mask portion
In order to make the effect of the subsequent poisson fusion more realistic, the mask portion may be morphologically processed. Specifically, the expansion processing can be performed on the mask portion of each image, the holes of the mask portion can be removed, and the edge range of mask fusion can be expanded. Fig. 14A-14B are schematic diagrams comparing the morphological processing effects provided by the embodiment of the invention, in which fig. 14A shows the mask portion without the morphological processing, the mask portion has a hole 1401, fig. 14B shows the mask portion with the morphological processing, the mask portion has no hole, and the range of the mask is enlarged.
In addition, the mask portion of each image can first be subjected to expansion processing to remove the holes of the mask portion and enlarge the edge range of mask fusion, and the expanded mask can then be subjected to erosion processing to restore the range of mask fusion.
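The expansion and erosion described in this step correspond to standard OpenCV morphology operations; the following sketch uses an elliptical structuring element whose size is an assumption chosen only for illustration.

import cv2

def adjust_mask(mask, expand_only=False):
    # mask: binary uint8 mask (0/255) of the target area in one frame.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
    # Expansion removes holes inside the mask and enlarges the edge range
    # used for fusion.
    dilated = cv2.dilate(mask, kernel, iterations=1)
    if expand_only:
        return dilated
    # Optional erosion shrinks the expanded mask back and restores the
    # range of mask fusion (the net effect is a closing operation).
    return cv2.erode(dilated, kernel, iterations=1)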
Step S7: poisson fusion of mask and reference region
The calculation formula of Poisson fusion is shown as formula (4):

min_f ∬_Ω |∇f − v|² , with f = f* on ∂Ω    (4)

wherein, when the reference region is fused onto the target region, f denotes the fused image after fusion, f* denotes the target region (the known image outside the part to be fused), v denotes the gradient of the reference region, ∇f denotes the first-order gradient of f (i.e., the gradient of the fused image), Ω denotes the part to be fused, and ∂Ω denotes the edge portion of Ω. Therefore, under the condition that the edge of the target region remains unchanged, the fused image of the fused portion is determined such that the gradient of the fused image at the fused portion is closest to the gradient of the reference region at the fused portion.
The image of the reference area in each frame of the video is used to replace the content of the target area in the picture of the corresponding frame; the replacement position is the position of the mask expanded in step S6, the center of the mask is taken as the fusion center, and the range of the placement corresponds to the mask portion. The fusion process can change the color and gradient within the reference region image, thereby achieving a seamless fusion effect. Taking the fusion effect of one frame as an example, as shown in fig. 15A-15B, fig. 15A-15B are schematic diagrams illustrating a comparison of fusion effects provided by an embodiment of the present invention: the image shown in fig. 15A is an image that has not undergone fusion processing and contains sundries in the target area, while the image shown in fig. 15B is an image that has undergone fusion processing, in which the reference area and the target area are fused so that the sundries in fig. 15A are removed and fig. 15B contains no sundries.
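Poisson fusion is available in OpenCV as seamless cloning, which can stand in for formula (4); the way the fusion center is computed from the expanded mask below is a sketch of the step described above and not the exact implementation of the embodiment.

import cv2
import numpy as np

def poisson_fuse(frame, reference_patch, mask):
    # frame:           BGR image of the current video frame
    # reference_patch: BGR image of the reference area, placed on a canvas of
    #                  the same size as the frame
    # mask:            expanded uint8 mask (255 inside the area to replace)
    ys, xs = np.nonzero(mask)
    # The center of the expanded mask is used as the fusion center.
    center = (int(xs.mean()), int(ys.mean()))
    # cv2.seamlessClone solves the Poisson equation of formula (4): the result
    # keeps the frame's values on the mask boundary while matching the
    # gradients of the reference patch inside the mask.
    return cv2.seamlessClone(reference_patch, frame, mask, center, cv2.NORMAL_CLONE)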
Step S8: optical flow constraint to solve inter-frame discontinuity problem
Optical flow is the instantaneous velocity of pixel motion of a spatially moving object in the viewing plane (the retina through which continuously changing information constantly "flows"). In the video shot by the camera, the frames are time-sequential, and the optical flow, namely the motion information of the object, can be calculated from the adjacent two frames. Therefore, the change of the pixel points in the adjacent frames can be restrained through the optical flow information, and the flicker possibly existing between the adjacent frames (shown as the brightness inconsistency of the individual pixel points) is processed.
Calculating the optical flow between adjacent frames of the video, extracting only the optical flow information of the target area (i.e. the motion information of each pixel point of the sundries), adding optical flow constraint (optical flow information) to the mask area of the fused image generated in step S7, preventing the situation that the brightness of the pixel points between the generated fused video frames is changed too much, and performing restoration processing on the pixel points with large changes, so as to achieve the overall coordination and consistency between the video frames. The brightness of the next frame image (second image) is restored according to the previous frame image (first image) and the optical flow information, and the calculation formula is shown as formula (5):
next(y+flow(y,x)[1],x+flow(y,x)[0])~prev(y,x) (5)
wherein x and y respectively represent the horizontal and vertical coordinates of an image, flow represents the extracted optical flow information, flow(y, x)[1] represents the optical flow information in the y direction, flow(y, x)[0] represents the optical flow information in the x direction, next represents the luminance of the next frame among adjacent frames of the video, prev represents the luminance of the previous frame among adjacent frames of the video, and ~ denotes approximation. When the brightness difference value of a pixel point of adjacent fused images is larger than the brightness difference threshold, brightness restoration processing is performed on the pixel point of the adjacent fused images to obtain continuous fused images: the brightness obtained by adding the brightness of the pixel point of the previous frame image and the brightness difference threshold is determined as the brightness of the pixel point of the next frame image after the brightness restoration processing; or, the average value of the brightness of the pixel points of the previous frame image and the next frame image before the brightness restoration processing is determined as the brightness of the pixel point of the next frame image after the brightness restoration processing.
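A sketch of the optical flow constraint of formula (5) using OpenCV's dense Farneback optical flow is given below; the threshold value, the helper name, and the use of cv2.remap to sample the flow-displaced coordinates are assumptions made for this sketch.

import cv2
import numpy as np

def constrain_with_flow(prev_gray, next_gray, diff_threshold=20.0):
    # prev_gray, next_gray: single-channel luminance images of two adjacent
    # fused frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Formula (5): next(y + flow[1], x + flow[0]) should stay close to
    # prev(y, x), so the next frame is sampled at the displaced coordinates.
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped_next = cv2.remap(next_gray, map_x, map_y, cv2.INTER_LINEAR)

    prev_f = prev_gray.astype(np.float32)
    repaired = next_gray.astype(np.float32)
    flicker = np.abs(warped_next.astype(np.float32) - prev_f) > diff_threshold
    # Repair strategy: average of the two luminances (one of the two options
    # described above).
    repaired[flicker] = (prev_f[flicker] + repaired[flicker]) / 2.0
    return repaired.astype(prev_gray.dtype)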
Therefore, the problem of flicker possibly existing between frames is solved through optical flow constraint, and continuous video with sundries removed is obtained.
In summary, in the embodiment of the present invention, the mask portion of each frame of the video is tracked and identified, the reference area is selected as the replacement image, the poisson fusion is used to achieve the real fusion effect, and the optical flow motion is used to constrain the video frames to achieve the overall continuous and consistent effect, so that the function of removing impurities or replacing targets in the video is achieved, the time of manual frame-by-frame processing is greatly reduced, and the generated video effect is real.
The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (13)

1. An image descrambling processing method, the method comprising:
determining a target area where interference content is located in any image of an image sequence, and determining a reference area where material content is located in any image, wherein the material content is used for replacing the interference content;
matching each image in the image sequence based on the target area to obtain a mask corresponding to the target area in each image;
performing morphological processing on the mask corresponding to the target area in each image to obtain an adjusted mask of each image;
performing fusion processing on the reference region and the adjusted masks of the images to obtain fused images which correspond to the images one by one and are removed of the interference content;
determining first optical flow information corresponding to the target area between adjacent images in each image;
determining the first optical flow information as second optical flow information corresponding to the target area between adjacent fused images based on a fused image corresponding to the image;
the second optical flow information is blended into the pixel point information of the adjacent fusion image;
when the adjacent fused image containing the second optical flow information is a first image and a second image, and the brightness difference value of the first image and the second image is greater than the brightness difference value threshold, executing one of the following processes on the second image:
adding the brightness of the pixel point of the first image and the brightness difference value threshold to obtain the brightness of the pixel point, and determining the brightness of the pixel point of the second image after brightness restoration processing;
and determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel points of the second image after the brightness restoration processing.
2. The method of claim 1,
the determining a target area where the interference content is located in any one of the images of the image sequence includes:
carrying out image segmentation processing on any image of the image sequence to obtain an image area to be matched;
matching the image area to be matched based on the interference content data set, and determining the image area to be matched as a target area where interference content is located in any one image when matching is successful;
the determining the reference area where the material content is located in any one of the images includes:
determining edge features of the target area;
in any one of the images, a region having an edge feature that meets a similarity condition with an edge feature of the target region is determined as a reference region, and the content in the determined reference region is determined as material content.
3. The method of claim 1,
the determining a target area where the interference content is located in any one of the images of the image sequence includes:
responding to the region selection operation aiming at any image, and determining the region set by the region selection operation as a target region where the interference content is located;
the determining the reference area where the material content is located in any one of the images includes:
and responding to the material selecting operation aiming at any image, and determining the area which is set by the material selecting operation and comprises the material content as a reference area.
4. The method according to claim 1, wherein the matching each image in the image sequence based on the target region to obtain a mask corresponding to the target region in each image comprises:
for any image in the image sequence, performing the following:
performing convolution processing on the target area and the image respectively to obtain a first feature map corresponding to the target area and a second feature map corresponding to the image;
performing convolution processing on the first characteristic diagram and the second characteristic diagram to obtain a third characteristic diagram;
and performing mask extraction processing on the third feature map to obtain a mask corresponding to the target area in the image.
5. The method according to claim 4, wherein the performing a mask extraction process on the third feature map to obtain a mask corresponding to the target region in the image comprises:
performing convolution processing on the third feature map to obtain a plurality of masks and the accuracy corresponding to the masks;
determining a mask corresponding to the maximum accuracy as a mask in the image corresponding to the target region.
6. The method of claim 5,
before the determining the maximum accuracy corresponding mask as the mask in the image corresponding to the target region, the method further comprises:
performing convolution processing on the third feature map to obtain position information and size information corresponding to the mask;
determining the position information of the mask corresponding to the maximum accuracy as the position information of the mask corresponding to the target area in the image, and
determining the size information of the mask corresponding to the maximum accuracy as the size information of the mask corresponding to the target area in the image;
before performing morphological processing on the mask corresponding to the target region in each image to obtain the adjusted mask of each image, the method further includes:
determining the position information of a reference area in the image according to the distance between the target area and the reference area and the position information of a mask corresponding to the target area in the image;
multiplying the ratio of the size of the reference area to the size of the target area by the size information of a mask corresponding to the target area in the image, and determining the obtained size information as the size information of the reference area in the image;
determining a reference region in the image based on the position information and the size information of the reference region.
7. The method of claim 1, wherein the performing morphological processing on the mask corresponding to the target region in each image to obtain an adjusted mask for each image comprises:
performing expansion processing on the mask corresponding to the target area in each image to obtain an adjusted mask with the cavity removed; alternatively,
performing expansion processing on the mask corresponding to the target area in each image to obtain a mask with the cavity removed, and performing erosion processing on the mask with the cavity removed to obtain the adjusted mask.
8. The method according to claim 1, wherein the fusing the reference region and the adjusted mask of each image comprises:
and performing fusion processing according with the following conditions on the edge information of the reference area, the part to be fused and the adjusted mask in the image: the edge information of the fusion part in the fusion image is consistent with the edge information of the reference area;
and the difference value between the gradient of the fusion part in the fusion image and the gradient of the part to be fused of the reference region is smaller than a gradient difference threshold value.
9. An image descrambling processing method, the method comprising:
presenting the image sequence in the client, an
In any image of the image sequence, presenting a surrounding frame of a target area where interference content is located, and presenting a surrounding frame of a reference area where material content is located aiming at the material content used for replacing the interference content;
in response to the interference elimination operation aiming at the target area, carrying out identification processing on each image in the image sequence based on the target area and the reference area in any one image to obtain the target area and the reference area in each image;
performing fusion processing on the target area and the reference area in each image to obtain a fused image which corresponds to each image one by one and is free of the interference content;
determining first optical flow information corresponding to the target area between adjacent images in each image;
determining the first optical flow information as second optical flow information corresponding to the target area between adjacent fused images based on a fused image corresponding to the image;
the second optical flow information is blended into the pixel point information of the adjacent fusion image;
when the adjacent fused image containing the second optical flow information is a first image and a second image, and the brightness difference value of the first image and the second image is greater than the brightness difference value threshold, executing one of the following processes on the second image:
adding the brightness of the pixel point of the first image and the brightness difference value threshold to obtain the brightness of the pixel point, and determining the brightness of the pixel point of the second image after brightness restoration processing;
determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel points of the second image after the brightness restoration processing;
and presenting the one-to-one corresponding second image after the brightness restoration processing at the client.
10. An image descrambling processing device, the device comprising:
the device comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining a target area where interference content is located in any image of an image sequence and determining a reference area where material content is located in any image, and the material content is used for replacing the interference content;
the first processing module is used for performing matching processing on each image in the image sequence based on the target area to obtain a mask corresponding to the target area in each image;
the second processing module is used for performing morphological processing on the mask corresponding to the target area in each image to obtain the adjusted mask of each image;
the fusion module is used for carrying out fusion processing on the reference region and the adjusted masks of the images to obtain fusion images which correspond to the images one by one and remove the interference content;
the restoration module is used for determining first optical flow information corresponding to the target area between adjacent images in each image; determining the first optical flow information as second optical flow information corresponding to the target area between adjacent fused images based on a fused image corresponding to the image; the second optical flow information is blended into the pixel point information of the adjacent fusion image; when the adjacent fused image containing the second optical flow information is a first image and a second image, and the brightness difference value of the first image and the second image is greater than the brightness difference value threshold, executing one of the following processes on the second image: adding the brightness of the pixel point of the first image and the brightness difference value threshold to obtain the brightness of the pixel point, and determining the brightness of the pixel point of the second image after brightness restoration processing; and determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel points of the second image after the brightness restoration processing.
11. An image descrambling processing device, the device comprising:
a first rendering module for rendering the sequence of images in the client and
in any image of the image sequence, presenting a surrounding frame of a target area where interference content is located, and presenting a surrounding frame of a reference area where material content is located aiming at the material content used for replacing the interference content;
a third processing module, configured to perform recognition processing on each image in the image sequence based on a target region and a reference region in any one of the images in response to a descrambling operation for the target region, so as to obtain the target region and the reference region in each image; performing fusion processing on the target area and the reference area in each image to obtain a fused image which corresponds to each image one by one and is free of the interference content; determining first optical flow information corresponding to the target area between adjacent images in each image; determining the first optical flow information as second optical flow information corresponding to the target area between adjacent fused images based on a fused image corresponding to the image; the second optical flow information is blended into the pixel point information of the adjacent fusion image; when the adjacent fused image containing the second optical flow information is a first image and a second image, and the brightness difference value of the first image and the second image is greater than the brightness difference value threshold, executing one of the following processes on the second image: adding the brightness of the pixel point of the first image and the brightness difference value threshold to obtain the brightness of the pixel point, and determining the brightness of the pixel point of the second image after brightness restoration processing; determining the average value of the brightness of the pixel points of the first image and the second image before the brightness restoration processing as the brightness of the pixel points of the second image after the brightness restoration processing;
and a second presentation module configured to present, at the client, the one-to-one corresponding second images after the brightness restoration processing.
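A hedged sketch, in the same Python/OpenCV style, of the fusion step recited in claim 11 that replaces the target area with material taken from the reference area; the reference_box parameter, the resize step, and the use of seamless cloning are illustrative assumptions rather than the patented method:

```python
import cv2
import numpy as np

def fuse_reference_into_target(image, target_mask, reference_box):
    """Copy material from the reference area over the target area (where the
    interference content sits) and blend it in along the mask boundary."""
    x, y, w, h = reference_box                 # reference area as (x, y, w, h)
    material = image[y:y + h, x:x + w]

    # Bounding box of the target area derived from the binary mask.
    ys, xs = np.nonzero(target_mask)
    ty, tx = ys.min(), xs.min()
    th, tw = ys.max() - ty + 1, xs.max() - tx + 1

    # Scale the material to cover the target area, then blend it into the image.
    patch = cv2.resize(material, (tw, th))
    patch_mask = (target_mask[ty:ty + th, tx:tx + tw] > 0).astype(np.uint8) * 255
    center = (tx + tw // 2, ty + th // 2)
    return cv2.seamlessClone(patch, image, patch_mask, center, cv2.NORMAL_CLONE)
```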
12. An image descrambling processing device, characterized in that the device comprises:
a memory for storing executable instructions;
a processor for implementing the image descrambling processing method according to any one of claims 1 to 9 when executing the executable instructions stored in the memory.
13. A computer-readable storage medium having stored thereon executable instructions for causing a processor to perform the image descrambling processing method according to any one of claims 1 to 9 when executed.
CN201911401988.1A 2019-12-30 2019-12-30 Image descrambling processing method, device, equipment and storage medium Active CN111145135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911401988.1A CN111145135B (en) 2019-12-30 2019-12-30 Image descrambling processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911401988.1A CN111145135B (en) 2019-12-30 2019-12-30 Image descrambling processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111145135A CN111145135A (en) 2020-05-12
CN111145135B true CN111145135B (en) 2021-08-10

Family

ID=70522237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911401988.1A Active CN111145135B (en) 2019-12-30 2019-12-30 Image descrambling processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111145135B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111741214A (en) * 2020-05-13 2020-10-02 北京迈格威科技有限公司 Image processing method and device and electronic equipment
CN112233055B (en) 2020-10-15 2021-09-10 北京达佳互联信息技术有限公司 Video mark removing method and video mark removing device
CN112634426B (en) * 2020-12-17 2023-09-29 深圳万兴软件有限公司 Method for displaying multimedia data, electronic equipment and computer storage medium
CN113095163B (en) * 2021-03-24 2024-04-09 北京达佳互联信息技术有限公司 Video processing method, device, electronic equipment and storage medium
CN114283091B (en) * 2021-12-27 2022-08-09 国网黑龙江省电力有限公司伊春供电公司 Power equipment image recovery system based on video fusion
CN115880168A (en) * 2022-09-30 2023-03-31 北京字跳网络技术有限公司 Image restoration method, device, equipment, computer readable storage medium and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933601A (en) * 2008-06-11 2016-09-07 诺基亚技术有限公司 Method, apparatus, and computer program product for presenting burst images
JP2018116431A (en) * 2017-01-17 2018-07-26 株式会社デンソーテン Division line detection device, division line detection system, and division line detection method
CN108494996A (en) * 2018-05-14 2018-09-04 Oppo广东移动通信有限公司 Image processing method, device, storage medium and mobile terminal
CN110288549A (en) * 2019-06-28 2019-09-27 北京字节跳动网络技术有限公司 Video repairing method, device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3404027B1 (en) * 2012-05-09 2020-03-04 Biogen MA Inc. Nuclear transport modulators and uses thereof
CN108280823B (en) * 2017-12-29 2022-04-01 南京邮电大学 Method and system for detecting weak edge flaws on optical cable surface in industrial production
CN110458781B (en) * 2019-08-14 2022-07-19 北京百度网讯科技有限公司 Method and apparatus for processing image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast Online Object Tracking and Segmentation: A Unifying Approach; Qiang Wang et al.; arXiv; 2019-05-05; pp. 1-13 *
Application of Deep Learning Based on Optical Flow in Industrial Motion Detection; Zhou Man et al.; 《创新与实践》 (Innovation and Practice); 2019-07-31; pp. 92-95 *

Also Published As

Publication number Publication date
CN111145135A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN111145135B (en) Image descrambling processing method, device, equipment and storage medium
US11595737B2 (en) Method for embedding advertisement in video and computer device
CN110650368A (en) Video processing method and device and electronic equipment
CN111539273A (en) Traffic video background modeling method and system
CN110832541A (en) Image processing apparatus and method
CN111882627A (en) Image processing method, video processing method, device, equipment and storage medium
CN105654471A (en) Augmented reality AR system applied to internet video live broadcast and method thereof
Sanches et al. Mutual occlusion between real and virtual elements in augmented reality based on fiducial markers
CN109803172B (en) Live video processing method and device and electronic equipment
CN106774862B (en) VR display method based on sight and VR equipment
US20210406305A1 (en) Image deformation control method and device and hardware device
CN111507997B (en) Image segmentation method, device, equipment and computer storage medium
CN115578499B (en) Fitting reconstruction method and device for asymmetric color misregistration consistency
CN112308944A (en) Augmented reality display method of simulated lip makeup
RU2697627C1 (en) Method of correcting illumination of an object on an image in a sequence of images and a user's computing device which implements said method
WO2023197780A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN114120163A (en) Video frame processing method and device, and related equipment and storage medium thereof
CN112712487A (en) Scene video fusion method and system, electronic equipment and storage medium
Silva et al. Automatic camera control in virtual environments augmented using multiple sparse videos
CN109885172A (en) A kind of object interaction display method and system based on augmented reality AR
CN113486941B (en) Live image training sample generation method, model training method and electronic equipment
CN112822393B (en) Image processing method and device and electronic equipment
CN112866507B (en) Intelligent panoramic video synthesis method and system, electronic device and medium
CN114782460A (en) Image segmentation model generation method, image segmentation method and computer equipment
CN114245193A (en) Display control method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant