CN116437123A - Image processing method, related device and computer readable storage medium - Google Patents

Image processing method, related device and computer readable storage medium Download PDF

Info

Publication number
CN116437123A
Authority
CN
China
Prior art keywords
image
matting
images
video
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310311877.1A
Other languages
Chinese (zh)
Inventor
陈信宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Wondershare Software Co Ltd
Original Assignee
Shenzhen Wondershare Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Wondershare Software Co Ltd filed Critical Shenzhen Wondershare Software Co Ltd
Priority to CN202310311877.1A priority Critical patent/CN116437123A/en
Publication of CN116437123A publication Critical patent/CN116437123A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265 Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses an image processing method, a related device and a computer readable storage medium. The method may include the following steps: acquiring a first video code stream, wherein the video code stream comprises a group of N temporally adjacent frames of matting images, the N frames of matting images comprise a current-frame matting image, and N is a positive integer greater than 0; fusing the group of N temporally adjacent matting images to obtain a fused matting image; replacing the matting image of the frame next to the current-frame matting image with the fused matting image to form a next group of N temporally adjacent matting images, wherein the current-frame matting image in the next group of N temporally adjacent matting images is updated to the fused matting image; and obtaining a second video code stream based on the fused matting image of each group of matting images. By implementing the embodiments of the application, the stability of the matting video can be improved.

Description

Image processing method, related device and computer readable storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, a related device, and a computer readable storage medium.
Background
Video matting refers to the operation of separating the foreground from the background in a video image, and is the inverse process of image synthesis (compositing). The image synthesis formula is as follows:
$I_i = a_i F_i + (1 - a_i) B_i$

where $F_i$ is the color of the foreground pixel; $a_i$ is the transparency of the foreground pixel, i.e. the foreground proportion it represents; $B_i$ is the color of the background pixel; and $I_i$ is the color of the composited image pixel. Here $i$ is the pixel index, and $a_i$ is greater than 0 and less than 1.
For a video to be matted (such as a green-screen video), the composited pixel color $I_i$ and the background pixel color $B_i$ are the known quantities, while the foreground pixel transparency $a_i$ and the foreground pixel color $F_i$ are the unknowns. With two unknowns, the image synthesis formula cannot be solved directly; the unknowns can only be estimated approximately from the known information.
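As a concrete illustration of the compositing equation, the following minimal NumPy sketch composites one foreground pixel over one background pixel; the pixel values and the alpha of 0.6 are arbitrary examples, not values taken from the application.

```python
import numpy as np

# Compositing equation: I_i = a_i * F_i + (1 - a_i) * B_i
F = np.array([255.0, 0.0, 0.0])   # foreground pixel color (illustrative: red)
B = np.array([0.0, 255.0, 0.0])   # background pixel color (illustrative: green)
a = 0.6                           # foreground transparency, 0 < a < 1

I = a * F + (1.0 - a) * B         # composited pixel color -> [153., 102., 0.]
print(I)
```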
The applicant found in research that if video matting is performed using only single-image matting techniques, the matting result is prone to flickering: as soon as one video frame is matted poorly, a visible flicker appears, giving the user a rather poor visual experience. How to improve the stability of the matting video is therefore a technical problem to be solved.
Disclosure of Invention
The embodiment of the application provides an image processing method, related equipment and a computer readable storage medium, which can improve the stability of a matting video.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a first video code stream, wherein the video code stream comprises a group of N temporally adjacent frames of matting images; the N frames of matting images comprise a current-frame matting image; wherein N is a positive integer greater than 0;
fusing the group of N temporally adjacent matting images to obtain a fused matting image;
replacing the matting image of the frame next to the current-frame matting image with the fused matting image to form a next group of N temporally adjacent matting images, wherein the current-frame matting image in the next group of N temporally adjacent matting images is updated to the fused matting image;
and obtaining a second video code stream based on the fused matting image of each group of matting images.
According to the embodiment of the application, within a group of N temporally adjacent frames, the N matting images are fused to obtain a fused matting image, and the fused matting image replaces the frame next to the current-frame matting image among the N matting images to form a new group of N temporally adjacent matting images; fusion is then performed on this new group to obtain its fused matting image, and so on, so that the processed video code stream can be obtained based on the fused matting image of each group of matting images. Because the fused matting images fully account for the continuous change across multiple frames and the degree of fusion between them, this approach improves the stability of the matting video and avoids the flicker phenomenon to the greatest extent, providing a better visual effect for the user.
In one possible implementation manner, the acquiring the first video code stream includes:
acquiring a matting video;
performing video decoding on the matting video to obtain video picture frames of the matting video;
and acquiring the first video code stream from the video picture frames at a set time interval or according to the amount of change in the picture content of the video picture frames.
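The application does not fix how the frames are sampled; the sketch below shows both strategies named above (a fixed interval or a content-change criterion) using OpenCV. The function name, the default interval of 5 frames and the change metric (mean absolute difference) are illustrative assumptions.

```python
import cv2
import numpy as np

def sample_matting_frames(path, interval=5, change_thresh=None):
    """Collect video picture frames either every `interval` frames
    or whenever the picture content changes by more than a threshold."""
    cap = cv2.VideoCapture(path)   # video decoding of the matting video
    frames, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if change_thresh is None:
            keep = (idx % interval == 0)                  # set time interval
        else:
            keep = (prev is None or
                    np.mean(cv2.absdiff(frame, prev)) > change_thresh)
            prev = frame                                  # track last content
        if keep:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```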
In one possible implementation manner, fusing the group of N temporally adjacent matting images to obtain a fused matting image includes:
sequentially acquiring the image features corresponding to each frame of matting image in the N frames of matting images;
and fusing the image features corresponding to each frame of matting image to obtain the fusion feature of the N frames of matting images, so as to obtain the fused matting image based on the fusion feature.
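The application leaves the feature extractor and the fusion operator unspecified; the sketch below therefore uses raw pixel intensities as the per-frame "features" and a per-pixel mean as the fusion, purely as stand-ins.

```python
import numpy as np

def fuse_group(mattes):
    """Fuse N temporally adjacent matting frames into one fused matting image.
    Raw float pixels stand in for per-frame image features, and averaging
    stands in for the unspecified fusion operator."""
    feats = [m.astype(np.float32) for m in mattes]  # features of each frame
    fused_feat = np.mean(feats, axis=0)             # fusion feature of N frames
    return fused_feat.astype(np.uint8)              # fused matting image
```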
In one possible implementation, the method further includes:
and processing the second video code stream by using a Gaussian smoothing algorithm to obtain a third video code stream.
Because the Gaussian smoothing algorithm takes the features of the preceding and following adjacent matting images into account, this further improves the stability of the matting video.
In one possible implementation, the N is 3, and the number of groups of the matted images is 2.
In this way, a more stable matting video can be obtained from multiple matting images with a small amount of calculation.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the first acquisition unit is used for acquiring a first video code stream, wherein the video code stream comprises a group of N temporally adjacent frames of matting images; the N frames of matting images comprise a current-frame matting image; wherein N is a positive integer greater than 0;
the image fusion unit is used for fusing the group of N temporally adjacent matting images to obtain a fused matting image;
the image processing unit is used for replacing the matting image of the frame next to the current-frame matting image with the fused matting image to form a next group of N temporally adjacent matting images, wherein the current-frame matting image in the next group of N temporally adjacent matting images is updated to the fused matting image;
the second acquisition unit is used for obtaining a second video code stream based on the fused matting image of each group of matting images.
In one possible implementation manner, the first obtaining unit is specifically configured to:
acquiring a matting video;
performing video decoding on the matting video to obtain video picture frames of the matting video;
and acquiring the first video code stream from the video picture frames at a set time interval or according to the amount of change in the picture content of the video picture frames.
In a possible implementation manner, the image fusion unit is specifically configured to:
sequentially acquire the image features corresponding to each frame of matting image in the N frames of matting images;
and fuse the image features corresponding to each frame of matting image to obtain the fusion feature of the N frames of matting images, so as to obtain the fused matting image based on the fusion feature.
In one possible implementation, the apparatus further includes:
and the third acquisition unit is used for processing the second video code stream by using a Gaussian smoothing algorithm to obtain a third video code stream.
In one possible implementation, the N is 3, and the number of groups of the matted images is 2.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is configured to store a computer program supporting the electronic device to perform the method described above, the computer program including program instructions, and the processor is configured to invoke the program instructions to perform the method of the first aspect described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect described above.
In a fifth aspect, embodiments of the present application also provide a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below.
FIG. 1 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of a video encoding and decoding method according to an embodiment of the present application;
FIG. 3a is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 3b is a schematic diagram of an acquired matting image according to an embodiment of the present application;
FIG. 3c is a schematic diagram of a process for performing image processing on a matting image according to an embodiment of the present application;
FIG. 3d is a schematic diagram of a process for performing image processing on a matting image according to an embodiment of the present application;
FIG. 3e is a schematic diagram of a process for performing image processing on a matting image according to an embodiment of the present application;
FIG. 3f is a schematic diagram of a second video code stream according to an embodiment of the present application;
FIG. 3g is a schematic flowchart of another image processing method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The terms "first," "second," "third," and "fourth" and the like in the description and in the claims of this application and in the drawings, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between 2 or more computers. Furthermore, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with one another in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
Summary of the application
Video matting is the operation of separating the foreground from the background in video images, and is the inverse process of image synthesis. As matting technology has developed, matting speed and matting precision have grown ever higher and the requirements ever finer. The applicant found in research that existing matting methods process a single video frame at a time without referring to the information of the preceding frames, so this kind of implementation easily runs into the problem of a flickering matting result: as long as even one video frame is matted poorly, flicker is visible, giving the user a rather poor visual experience.
On this basis, the application provides an image processing method, related device and computer readable storage medium. In the method, within a group of N temporally adjacent frames, the N matting images are fused to obtain a fused matting image; the fused matting image replaces the frame next to the current-frame matting image among the N matting images to form a new group of N temporally adjacent matting images; fusion is performed again on this new group to obtain its fused matting image; and a processed video code stream can thus be obtained based on the fused matting image of each group of matting images. Because the fused matting images fully account for the continuous change across multiple frames and the degree of fusion between them, this approach improves the stability of the matting video and avoids the flicker phenomenon to the greatest extent, providing a better visual effect for the user.
Before describing the technical solutions of the embodiments of the present application, first, technical scenarios and related technical terms of the present application will be described with reference to the accompanying drawings.
The technical scheme of the embodiment is applied to the technical field of image processing, and is mainly directed at the matting processing of a series of continuous frames in a video. A video may be understood as a number of frames of images (also referred to in the art as pictures) played in a certain order and at a certain frame rate. Processing a video stream involves video encoding and video decoding.
Further, video coding is a process of performing a coding operation on each frame of image in video to obtain coding information of each frame of image. Video encoding is performed on the source side. Video decoding is a process of reconstructing each frame of image from the encoded information of each frame of image. Video decoding is performed on the destination side. The combination of video encoding operations and video decoding operations may be referred to as video encoding and decoding (encoding and decoding).
Existing video codecs operate in accordance with a video codec standard (e.g., the high efficiency video coding H.265 standard) and conform to the High Efficiency Video Coding (HEVC) test model. Alternatively, a video codec performs operations according to other proprietary or industry standards, including, for example, ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (or ISO/IEC MPEG-4 AVC), or standards that include scalable video coding and multiview video coding extensions. It should be understood that the techniques of this application are not limited to any particular codec standard or technique.
In general, a codec operates in units of coding units (CUs). Specifically, in the encoding process, an image is divided into a plurality of CUs, and the pixel data in the CUs is then encoded to obtain the coding information of each CU. In the decoding process, an image is divided into a plurality of CUs, and each CU is then reconstructed according to its corresponding coding information, yielding a reconstruction block for each CU. Alternatively, the image may be divided into a grid of coding tree blocks. In some examples, a coding tree block is also referred to as a "tree block" or "largest coding unit" (LCU). Optionally, a coding tree block may be further divided into a plurality of CUs.
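To make the block partitioning concrete, here is a small sketch (not from the application) that tiles an image into a regular grid of fixed-size blocks; real codecs additionally split such tree blocks into CUs adaptively, which is omitted here.

```python
import numpy as np

def split_into_blocks(image, block_size=64):
    """Tile an image into a regular grid of coding-tree-block-like blocks.
    Edge blocks may be smaller when the image size is not a multiple of
    block_size; adaptive CU subdivision is intentionally omitted."""
    h, w = image.shape[:2]
    return [image[y:y + block_size, x:x + block_size]
            for y in range(0, h, block_size)
            for x in range(0, w, block_size)]

# e.g. a 1920x1080 frame yields a 30x17 grid of (at most) 64x64 blocks
```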
Referring to fig. 1, fig. 1 schematically shows a block diagram of a video codec system 10 to which the present application applies. As shown in fig. 1, the system 10 includes a source device 12 and a destination device 14, wherein the source device 12 generates encoded video data, and thus the source device 12 is also referred to as a video encoding apparatus. The destination device 14 decodes the encoded video data generated by the source device 12, and thus the destination device 14 is also referred to as a video decoding apparatus.
Wherein the source device 12 and the destination device 14 include one or more processors therein and a memory coupled to the one or more processors. The memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer.
The source device 12 and the destination device 14 include various devices such as desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, vehicle-mounted computers, wireless communication devices, artificial intelligence devices, virtual reality/mixed reality/augmented reality devices, autopilot systems, or other devices, and embodiments of the application are not limited in terms of the structure and specific morphology of the above devices.
As shown in fig. 1, the source device 12 and the destination device 14 are connected by a link 13, and the destination device 14 receives encoded video data from the source device 12 via the link 13. Wherein the link 13 comprises one or more media or devices. In one possible implementation, link 13 includes one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. In an example, source device 12 modulates video data according to a communication standard (e.g., a wireless communication protocol) and transmits the modulated video data to destination device 14, the one or more communication media including a wireless or wired communication medium such as a Radio Frequency (RF) spectrum or at least one physical transmission line. The one or more communication media may form part of a packet-based network, which may be a local area network, a wide area network, or a global network (e.g., the internet), etc. The one or more communication media include routers, switches, base stations, or other devices facilitating communication from source device 12 to destination device 14.
Source device 12 includes an image source 16, an image preprocessor 18, an encoder 20, and a communication interface 22. In a specific implementation, the encoder 20, the image source 16, the image preprocessor 18, and the communication interface 22 may be hardware components in the source device 12 or may be software programs in the source device 12.
More specifically described below:
image source 16 may include any type of image capture device for capturing real world images or comments, which refers to some text on a screen for screen content encoding. Wherein the image capturing device is used to acquire and/or provide real world images, computer animated images, such as screen content, virtual Reality (VR) images, live (AR) images, etc. Image source 16 may be a camera for capturing images or a memory for storing images, and image source 16 may also include any type of (internal or external) interface for storing previously captured or generated images and/or for capturing or receiving images.
When image source 16 is a camera, image source 16 may be an integrated camera, either local or integrated in the source device; when image source 16 is a memory, image source 16 may be an integrated memory, either local or integrated in the source device. When the image source 16 includes an interface, the interface may be an external interface that receives images from an external video source, such as an external image capture device, e.g., a camera, external memory, or external image generation device, which is an external computer graphics processor, computer, or server. The interface may be any kind of interface according to any proprietary or standardized interface protocol, e.g. a wired or wireless interface, an optical interface.
The image stored in image source 16 may be regarded as a two-dimensional array or matrix of pixels, which may also be called sampling points; the number of sampling points of the array or image in the horizontal and vertical directions (or axes) defines the size and/or resolution of the image. To represent color, three color components are typically employed, i.e., an image may be represented as containing three sample arrays. For example, in RGB format or color space, the image includes corresponding red (R), green (G), and blue (B) sample arrays. However, in video coding, each pixel is typically represented in a luminance/chrominance format or color space; for example, an image in YUV format comprises a luminance component indicated by Y (sometimes indicated by L instead) and two chrominance components indicated by U and V. The luminance (luma) component Y represents the brightness or gray-level intensity (for example, in a grayscale image the two are identical), while the two chrominance (chroma) components U and V represent the chrominance or color information components. An image in RGB format may be converted or transformed into YUV format and vice versa, a process also known as color transformation or conversion. If an image is black and white, it may include only an array of luminance samples. In this embodiment, the image transmitted by image source 16 to image preprocessor 18 may also be referred to as raw image data 17.
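For reference, this sketch converts an image between the RGB and YUV color spaces with OpenCV, as described above; the 4x4 test image is an arbitrary placeholder.

```python
import cv2
import numpy as np

rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 200                              # flat reddish test image

yuv = cv2.cvtColor(rgb, cv2.COLOR_RGB2YUV)     # Y = luma, U/V = chroma
back = cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB)    # inverse color transform
```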
An image preprocessor 18, for receiving the raw image data 17 and performing preprocessing on the raw image data 17 to obtain a preprocessed image 19 or preprocessed image data 19. For example, the preprocessing performed by the image preprocessor 18 may include trimming (cropping), color format conversion (e.g., from RGB format to YUV format), color correction, or denoising.
The encoder 20, or video encoder 20, is configured to receive the preprocessed image data 19, and process the preprocessed image data 19 using a prediction mode to provide encoded image data 21 (or video bitstream). In some embodiments, encoder 20 may be configured to perform embodiments of the various video encoding methods described below to implement the image generation methods described herein (i.e., to obtain a matte image).
Communication interface 22 may be used to receive encoded image data 21 and transmit encoded image data 21 to destination device 14 over link 13, and communication interface 22 may be used to encapsulate encoded image data 21 into a suitable format, such as a data packet, for transmission over link 13.
Destination device 14 includes a communication interface 28, a decoder 30, an image post-processor 32, and a display device 34. The respective components or devices included in the destination device 14 are described one by one as follows:
A communication interface 28 for receiving encoded image data 21 from source device 12. In addition, the communication interface 28 is also configured to receive the encoded image data 21 via a link 13 between the source device 12 and the destination device 14, the link 13 being a direct wired or wireless connection, any type of network, such as a wired or wireless network or any combination thereof, or any type of private and public networks, or any combination thereof. Communication interface 28 may also be used to de-encapsulate data packets transmitted by communication interface 22 to obtain encoded image data 21.
It should be noted that both communication interface 28 and communication interface 22 may be unidirectional communication interfaces or bi-directional communication interfaces, and may be used to send and receive messages, and/or to establish a communication link over which image data, such as encoded image data transmissions, may be transmitted.
Decoder 30 (or video decoder 30) is provided for receiving encoded image data 21 and providing decoded image data 31 or decoded image 31. In some embodiments, decoder 30 may be configured to perform embodiments of the various video decoding methods described below to implement the image generation methods described herein.
An image post-processor 32, for performing post-processing on the decoded image data 31 to obtain post-processed image data 33. The post-processing performed by the image post-processor 32 may include color format conversion (e.g., from YUV format to RGB format), color correction, trimming (cropping), resampling, or any other processing, and the image post-processor 32 may also be used to transmit the post-processed image data 33 to the display device 34.
A display device 34, for receiving the post-processed image data 33 in order to display the image to a user or viewer. Display device 34 includes any type of display for presenting a reconstructed image, such as an integrated or external display or monitor. Further, the display may include a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a digital light processor (DLP), or any other type of display.
It should be understood that the source device 12 and the destination device 14 shown in fig. 1 may be separate devices or may be integrated in the same device, i.e. the integrated device comprises the functionality of both the source device 12 and the destination device 14. In one possible implementation, the source device 12 or corresponding functionality and the destination device 14 or corresponding functionality may be implemented using the same hardware and/or software, or using separate hardware and/or software, or any combination thereof.
Furthermore, based on the above description, it is known that the existence and (exact) division of the functionality of the different units or the functionality of the source device 12 and/or the destination device 14 shown in fig. 1 may vary depending on the actual device and application. The source device 12 and the destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, mobile phone, smart phone, tablet or tablet computer, video camera, desktop computer, set-top box, television, camera, in-vehicle device, display device, digital media player, video game console, video streaming device (e.g., content server or content distribution server), broadcast receiver device, broadcast transmitter device, etc., and embodiments of the present application are not limited in the specific structure and implementation of the source device 12 and the destination device 14.
Encoder 20 and decoder 30 may each be any of a variety of suitable circuits, such as, for example, one or more microprocessors, digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the techniques are implemented in part in software, the apparatus may store instructions of the software in a suitable computer-readable storage medium and may execute the computer program instructions using one or more processors to perform the image generation methods described herein.
In an example, taking the video encoding and decoding system 10 shown in fig. 1 as an example only, the technical solution of the embodiments of the present application may be applied to a video encoding arrangement that does not necessarily involve any data communication between encoding and decoding devices, such as video encoding or video decoding. In other examples, the data may be retrieved from local memory, streamed over a network, and so forth. The video encoding device may encode the data and store the data in the memory, and/or the video decoding device may retrieve the data from the memory and decode the data.
Referring to fig. 2, a flow chart of a video encoding and decoding method according to an embodiment of the present application may be applied to the system shown in fig. 1. Specifically, the method can be summarized as the following five steps, respectively: input video 110, video encoding 120, video streaming 130, video decoding 140, and output video 150.
Wherein, step "input video 110" inputs lossless video or image captured by a capture device, such as a camera, to an encoder; in the step of video coding 120, the obtained video or image is compressed and coded by an H.264 or H.265 codec to generate a coded video code stream; the video stream is then uploaded to the cloud server in step "video stream transmission 130", and the user downloads the video stream from the cloud server. Step "video decoding 140" includes a process in which the terminal device decodes the video code stream downloaded from the cloud end by the decoder, and finally, step "output video 150" outputs and displays the decoded video image.
The following describes the technical scheme of the embodiment of the present application in detail.
Fig. 3a shows a schematic flow chart of an image processing method according to one embodiment of the present application, which may include, but is not limited to, the following steps:
Step 301, a first video code stream is acquired, wherein the video code stream comprises a group of N temporally adjacent frames of matting images; the N frames of matting images comprise a current-frame matting image; wherein N is a positive integer greater than 0;
the video code stream is a code stream or a bit stream which is output after the input video is encoded and compressed by an encoder, and the video code stream comprises two frames or more than two frames of images. The time-sequential adjacency refers to frames that are continuously photographed (or generated) in time. Specifically, a matting video can be acquired first; then, performing video decoding on the matted video to obtain a video picture frame of the matted video; and finally, acquiring the first video code stream in the video picture frame according to a set time interval or according to the picture content variation of the video picture frame.
In an example, the N frame matting images in the first image group include a first frame matting image, a second frame matting image, and a third frame matting image. For example, the first frame matting image is the current frame matting image.
Step S302, the group of N temporally adjacent matting images is fused to obtain a fused matting image;
the process of fusing a group of N temporally adjacent matting images to obtain a fused matting image may include the following: sequentially acquiring the image features corresponding to each frame of matting image in the N frames of matting images; and fusing the image features corresponding to each frame of matting image to obtain the fusion feature of the N frames of matting images, so as to obtain the fused matting image based on the fusion feature. It can be understood that the fused matting image fully takes into account the continuous change across the multiple frames and the degree of fusion between them.
Step S303, the matting image of the frame next to the current-frame matting image is replaced with the fused matting image to form a next group of N temporally adjacent matting images, wherein the current-frame matting image in the next group of N temporally adjacent matting images is updated to the fused matting image;
in an example, the number of matting image groups is 2 and each group contains 3 frames of matting images. As shown in fig. 3b, 5 frames of matting images, denoted m1, m2, m3, m4 and m5, are output after encoding by the video encoder. Frames m1, m2 and m3 form the 1st matting image group, with m1 as the current-frame matting image. The image features corresponding to m1, m2 and m3 are acquired in turn and fused to obtain a fusion feature, and the fused matting image v1 of the 1st group is obtained based on this fusion feature (as shown in fig. 3c). The fused matting image v1 then replaces the frame next to the current frame in the 1st group (namely m2); this process is illustrated in fig. 3d. A next matting image group v1, m3, m4 is thus formed (shown in fig. 3e), in which the fused matting image v1 is the first frame; that is, the current-frame matting image of the 2nd group is updated to the fused matting image. The image features corresponding to v1, m3 and m4 are then acquired in turn and fused to obtain a fusion feature, and the fused matting image v2 of the 2nd group is obtained based on it (shown in fig. 3e). Proceeding by analogy, the fused matting image of each group of matting images can be obtained.
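The walk-through above can be condensed into the following sliding-window sketch. The per-pixel mean again stands in for the unspecified fusion operator, and the loop bounds reproduce the two groups of the m1..m5 example.

```python
import numpy as np

def fuse_stream(mattes, n=3):
    """Sliding-window fusion following the m1..m5 walk-through.
    With 5 input mattes and n=3 this produces [v1, v2]."""
    window = list(mattes[:n])                  # 1st group: (m1, m2, m3)
    fused = []
    for nxt in range(n, len(mattes)):
        v = np.mean(np.stack(window), axis=0).astype(np.uint8)
        fused.append(v)                        # v1, then v2, ...
        # v replaces the frame after the current one, so the next group
        # becomes (v1, m3, m4), then (v2, m4, m5) would follow, etc.
        window = [v] + window[2:] + [mattes[nxt]]
    return fused
```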
Step S304, a second video code stream is obtained based on the fused matting image of each group of matting images.
In an example, the first matting image m1 of the obtained first video code stream, the fused matting image of each group of matting images, and the last matting image m4 may be spliced to obtain the second video code stream (as shown in fig. 3f).
In practical applications, after obtaining the first matting image, the processor may store each obtained matting image in the memory and judge whether the number of obtained matting images reaches a preset number (for example, the preset number is 3). Once 3 matting images have been obtained, fusion and replacement are performed on them: the fused matting image of the current matting image group is obtained, and it replaces the matting image of the frame next to the current frame in the current group, so that the current-frame matting image of the next group is updated to the fused matting image. The processor then makes the same judgment on the subsequently obtained matting images, and each new group (the previous fused matting image together with the newly obtained frames) is fused in turn, so that the fused matting image of every group is obtained.
It can be understood that in the image processing method provided by the application, within a group of N temporally adjacent frames, the N matting images are fused to obtain a fused matting image; the fused matting image replaces the frame next to the current-frame matting image among the N matting images to form a new group of N temporally adjacent matting images; the new group is fused again to obtain its fused matting image; and the processed video code stream can thus be obtained based on the fused matting image of each group of matting images. Because the fused matting images fully account for the continuous change across multiple frames and the degree of fusion between them, this approach improves the stability of the matting video and avoids the flicker phenomenon to the greatest extent, providing a better visual effect for the user.
In another embodiment of the present application, as shown in fig. 3g, after the second video bitstream is obtained based on the image processing method shown in fig. 3a, the second video bitstream may be further processed, which may include, but is not limited to, the following steps:
And step S305, processing the second video code stream by using a Gaussian smoothing algorithm to obtain a third video code stream.
For example, the obtained second video code stream is smoothed. Because the Gaussian smoothing algorithm takes into account the features of the preceding and following adjacent matting images, the stability of the matting video can be further improved.
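The application names the Gaussian smoothing step but not its parameters; the sketch below applies a temporal Gaussian kernel across neighbouring frames, with the radius, sigma and edge clamping all being illustrative assumptions.

```python
import numpy as np

def gaussian_smooth_stream(frames, radius=1, sigma=1.0):
    """Temporally smooth a list of matting frames with Gaussian weights."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets.astype(np.float32) ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()                        # normalized Gaussian weights
    out = []
    for i in range(len(frames)):
        acc = np.zeros_like(frames[0], dtype=np.float32)
        for off, w in zip(offsets, kernel):
            j = min(max(i + off, 0), len(frames) - 1)  # clamp at stream ends
            acc += w * frames[j].astype(np.float32)
        out.append(acc.astype(np.uint8))
    return out
```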
It should be noted that, for simplicity of description, the foregoing method embodiments are all depicted as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
It should be further noted that, although the steps in the flowcharts of fig. 3a and 3g are sequentially shown as indicated by arrows, these steps are not necessarily sequentially executed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 3a, 3g may comprise sub-steps or phases, which are not necessarily performed at the same time, but may be performed at different times, nor does the order of execution of the sub-steps or phases necessarily follow one another, but may be performed alternately or alternately with other steps or at least part of the sub-steps or phases of other steps.
The image processing method according to the embodiment of the present application is described in detail above with reference to fig. 1 to 3g, and in order to facilitate better implementation of the foregoing solution according to the embodiment of the present application, correspondingly, related devices and apparatuses for implementing the foregoing solution in a matching manner are further provided below.
Referring to fig. 4, a schematic structural diagram of an image processing apparatus 40 according to an embodiment of the present application may include:
a first obtaining unit 400, configured to obtain a first video code stream, wherein the video code stream comprises a group of N temporally adjacent frames of matting images; the N frames of matting images comprise a current-frame matting image; wherein N is a positive integer greater than 0;
an image fusion unit 402, configured to fuse the group of N temporally adjacent matting images to obtain a fused matting image;
an image processing unit 404, configured to replace the matting image of the frame next to the current-frame matting image with the fused matting image to form a next group of N temporally adjacent matting images, wherein the current-frame matting image in the next group of N temporally adjacent matting images is updated to the fused matting image;
a second obtaining unit 406, configured to obtain a second video code stream based on the fused matting image of each group of matting images.
In one possible implementation manner, the first obtaining unit 400 is specifically configured to:
acquire a matting video;
perform video decoding on the matting video to obtain video picture frames of the matting video;
and acquire the first video code stream from the video picture frames at a set time interval or according to the amount of change in the picture content of the video picture frames.
In one possible implementation, the image fusion unit 402 is specifically configured to:
sequentially acquire the image features corresponding to each frame of matting image in the N frames of matting images;
and fuse the image features corresponding to each frame of matting image to obtain the fusion feature of the N frames of matting images, so as to obtain the fused matting image based on the fusion feature.
In one possible implementation, the apparatus further includes:
and a third obtaining unit 408, configured to process the second video code stream by using a gaussian smoothing algorithm to obtain a third video code stream.
In one possible implementation, the N is 3, and the number of groups of the matted images is 2.
It should be noted that, each apparatus in the above system may further include other units, and specific implementations of each device and unit may refer to related descriptions in the above method embodiments, which are not repeated herein.
In order to better implement the foregoing aspects of the embodiments of the present application, the present application further correspondingly provides an electronic device 50, which is described in detail below with reference to the accompanying drawings:
as shown in fig. 5, in the schematic structural diagram of the electronic device provided in the embodiment of the present application, the electronic device 500 may include a processor 501, a memory 504, and a communication module 505, where the processor 501, the memory 504, and the communication module 505 may be connected to each other through a bus 506. The memory 504 may be a high-speed random access memory (RAM) or a non-volatile memory, such as at least one disk memory. The memory 504 may also optionally be at least one storage device located remotely from the aforementioned processor 501. The memory 504 is used for storing application program code, which may include an operating system, a network communication module, a user interface module, and a data processing program; the communication module 505 is used for information interaction with external devices. The processor 501 is configured to invoke the program code to perform the following steps:
acquiring a first video code stream, wherein the video code stream comprises a group of N temporally adjacent frames of matting images; the N frames of matting images comprise a current-frame matting image; wherein N is a positive integer greater than 0;
fusing the group of N temporally adjacent matting images to obtain a fused matting image;
replacing the matting image of the frame next to the current-frame matting image with the fused matting image to form a next group of N temporally adjacent matting images, wherein the current-frame matting image in the next group of N temporally adjacent matting images is updated to the fused matting image;
and obtaining a second video code stream based on the fused matting image of each group of matting images.
Wherein the processor 501 obtains a first video code stream, including:
acquiring a matting video;
performing video decoding on the matting video to obtain video picture frames of the matting video;
and acquiring the first video code stream from the video picture frames at a set time interval or according to the amount of change in the picture content of the video picture frames.
Wherein the fusing, by the processor 501, of the group of N temporally adjacent matting images to obtain a fused matting image includes:
sequentially acquiring the image features corresponding to each frame of matting image in the N frames of matting images;
and fusing the image features corresponding to each frame of matting image to obtain the fusion feature of the N frames of matting images, so as to obtain the fused matting image based on the fusion feature.
Wherein the processor 501 may be further configured to:
and processing the second video code stream by using a Gaussian smoothing algorithm to obtain a third video code stream.
Wherein N is 3, and the number of groups of matting images is 2.
Embodiments also provide a computer storage medium having instructions stored therein, which when run on a computer or processor, cause the computer or processor to perform one or more steps of the method of any of the embodiments described above. The respective constituent modules of the above apparatus, if implemented in the form of software functional units and sold or used as separate products, may be stored in the computer-readable storage medium, and based on such understanding, the technical solution of the present application may be embodied essentially or partly or wholly or partly in the form of a software product, which is stored in the computer-readable storage medium.
The computer readable storage medium may be an internal storage unit of the apparatus according to the foregoing embodiment, such as a hard disk or a memory. The computer readable storage medium may also be an external storage device of the apparatus, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, or the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the above apparatus. The computer-readable storage medium is used to store the computer program and other programs and data required by the apparatus. The above-described computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art will appreciate that implementing all or part of the above-described embodiment methods may be accomplished by way of a computer program, which may be stored in a computer-readable storage medium and which, when executed, may comprise the steps of the embodiments of the methods described above. And the aforementioned storage medium includes: various media capable of storing program code, such as ROM, RAM, magnetic or optical disks.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
It will be appreciated by those of ordinary skill in the art that the various exemplary elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on a computer readable medium or transmitted as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media corresponding to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto; any changes or substitutions readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, comprising:
acquiring a first video code stream, wherein the first video code stream comprises a group of N temporally adjacent matting image frames; the N matting image frames comprise a current-frame matting image; wherein N is a positive integer greater than 0;
fusing the group of N temporally adjacent matting image frames to obtain a fused matting image;
replacing the matting image frame adjacent to and following the current-frame matting image with the fused matting image to form a next group of N temporally adjacent matting image frames, wherein the current-frame matting image in the next group of N temporally adjacent matting image frames is updated to the fused matting image;
and obtaining a second video code stream based on the fused matting image of each group of matting images.
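For illustration only, a minimal Python sketch of one reading of the sliding-window fusion recited in claim 1 follows. The pixel-wise averaging in fuse() and the window bookkeeping are assumptions made for the example; the claim does not fix a particular fusion operator.

```python
import numpy as np

def fuse(group):
    # Assumed fusion rule for the sketch: pixel-wise average of the
    # N temporally adjacent matting frames.
    return np.mean(np.stack(group), axis=0).astype(group[0].dtype)

def build_second_stream(frames, n=3):
    """One reading of claim 1: fuse each group of N temporally adjacent
    matting frames, then substitute the fused result for the frame
    adjacent to the current frame, so the fused image serves as the
    current frame of the next group."""
    frames = list(frames)
    fused_out = []
    for k in range(len(frames) - n + 1):
        fused = fuse(frames[k:k + n])   # current frame is frames[k + n - 1]
        fused_out.append(fused)         # fused images form the second code stream
        if k + n < len(frames):
            frames[k + n] = fused       # next adjacent frame is replaced
    return fused_out
```

With N = 3 and five input frames this loop emits three fused frames, and each later group re-uses the preceding fusion result, which is what carries temporal smoothness through the second video code stream.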
2. The method of claim 1, wherein the acquiring the first video code stream comprises:
acquiring a matting video;
performing video decoding on the matting video to obtain video picture frames of the matting video;
and acquiring the first video code stream from the video picture frames according to a set time interval or according to the amount of picture content variation between the video picture frames.
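As a hedged sketch of claim 2, the two sampling strategies (fixed time interval versus picture-content variation) might be realized with OpenCV as below; interval, diff_threshold, and the mean-absolute-difference measure are illustrative choices, not part of the claim.

```python
import cv2

def extract_first_stream(video_path, interval=5, diff_threshold=None):
    """Decode a matting video and collect picture frames either every
    `interval` frames or whenever content changes beyond `diff_threshold`."""
    cap = cv2.VideoCapture(video_path)
    frames, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if diff_threshold is None:
            if idx % interval == 0:      # set time-interval sampling
                frames.append(frame)
        elif prev is None or cv2.absdiff(frame, prev).mean() > diff_threshold:
            frames.append(frame)         # picture-content-variation sampling
            prev = frame
        idx += 1
    cap.release()
    return frames
```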
3. The method of claim 1, wherein the fusing the group of N temporally adjacent matting image frames to obtain the fused matting image comprises:
sequentially acquiring an image feature corresponding to each of the N matting image frames;
and fusing the image features corresponding to the respective frames to obtain a fusion feature of the N matting image frames, so as to obtain the fused matting image based on the fusion feature.
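A minimal sketch of the feature-level fusion in claim 3 follows; the extract_feature stand-in (a normalized matte channel) and the element-wise mean are assumptions for the example, since the claim names neither a specific feature extractor nor a fusion operator.

```python
import numpy as np

def extract_feature(frame):
    # Hypothetical stand-in extractor: the matting frame itself,
    # normalized to [0, 1].
    return frame.astype(np.float32) / 255.0

def fuse_by_features(matting_frames):
    """Sequentially extract a feature per frame, fuse the features into a
    fusion feature, and derive a fused matting image from it."""
    feats = [extract_feature(f) for f in matting_frames]
    fusion_feature = np.mean(np.stack(feats), axis=0)   # element-wise mean as fusion
    return (fusion_feature * 255.0).astype(np.uint8)    # back to a matting image
```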
4. The method of claim 1, further comprising:
and processing the second video code stream by using a Gaussian smoothing algorithm to obtain a third video code stream.
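Claim 4's Gaussian smoothing of the second video code stream could be sketched per frame as below; the 5x5 kernel and sigma are illustrative parameters only.

```python
import cv2

def third_stream(second_stream_frames, ksize=5, sigma=1.0):
    """Apply Gaussian smoothing to every frame of the second video code
    stream to obtain the third video code stream (claim 4)."""
    return [cv2.GaussianBlur(f, (ksize, ksize), sigma) for f in second_stream_frames]
```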
5. The method of any one of claims 1-4, wherein N is 3 and the number of groups of matting images is 2.
6. An image processing apparatus, comprising:
a first acquisition unit, configured to acquire a first video code stream, wherein the first video code stream comprises a group of N temporally adjacent matting image frames; the N matting image frames comprise a current-frame matting image; wherein N is a positive integer greater than 0;
an image fusion unit, configured to fuse the group of N temporally adjacent matting image frames to obtain a fused matting image;
an image processing unit, configured to replace the matting image frame adjacent to the current-frame matting image with the fused matting image to form a next group of N temporally adjacent matting image frames, wherein the current-frame matting image in the next group is updated to the fused matting image;
and a second acquisition unit, configured to obtain a second video code stream based on the fused matting image of each group of matting images.
7. The apparatus of claim 6, wherein the first acquisition unit is specifically configured to:
acquire a matting video;
perform video decoding on the matting video to obtain video picture frames of the matting video;
and acquire the first video code stream from the video picture frames according to a set time interval or according to the amount of picture content variation between the video picture frames.
8. The apparatus of claim 6, wherein the image fusion unit is specifically configured to:
sequentially acquire an image feature corresponding to each of the N matting image frames;
and fuse the image features corresponding to the respective frames to obtain a fusion feature of the N matting image frames, so as to obtain the fused matting image based on the fusion feature.
9. An electronic device, comprising: a memory for storing a program that supports a processor in performing the method of any one of claims 1 to 5, and a processor configured to execute the program stored in the memory.
10. A computer-readable medium storing non-volatile program code executable by a processor, the program code causing the processor to perform the method of any one of claims 1 to 5.
CN202310311877.1A 2023-03-21 2023-03-21 Image processing method, related device and computer readable storage medium Pending CN116437123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310311877.1A CN116437123A (en) 2023-03-21 2023-03-21 Image processing method, related device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310311877.1A CN116437123A (en) 2023-03-21 2023-03-21 Image processing method, related device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116437123A true CN116437123A (en) 2023-07-14

Family

ID=87090041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310311877.1A Pending CN116437123A (en) 2023-03-21 2023-03-21 Image processing method, related device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116437123A (en)

Similar Documents

Publication Publication Date Title
EP4020370A1 (en) Image processing method and device
JP2020526994A (en) Chroma prediction method and device
JP7205038B2 (en) Encoders, Decoders and Corresponding Methods Using IBC Search Scope Optimization for Arbitrary CTU Sizes
WO2020048463A1 (en) Method and apparatus for intra prediction
CN110881126B (en) Chroma block prediction method and device
CN112995663B (en) Video coding method, video decoding method and corresponding devices
AU2019386917B2 (en) Encoder, decoder and corresponding methods of most probable mode list construction for blocks with multi-hypothesis prediction
US20220368751A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
JP2022539683A (en) Image processing method and apparatus
CN114339238A (en) Video coding method, video decoding method and device thereof
CN114026864A (en) Chroma sample weight derivation for geometric partitioning modes
JP2023126795A (en) Method and apparatus for chroma intra prediction in video coding
US9883192B2 (en) Color space compression
CN111246208B (en) Video processing method and device and electronic equipment
CN111406404B (en) Compression method, decompression method, system and storage medium for obtaining video file
WO2020042853A1 (en) Method and apparatus for intra prediction
CN107197295B (en) A kind of coded system and method
CN116437123A (en) Image processing method, related device and computer readable storage medium
US11985303B2 (en) Context modeling method and apparatus for flag
ES2946058T3 (en) An encoder, decoder, and corresponding methods that use intramode coding for intraprediction
US20220239946A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
CN112929703A (en) Method and device for processing code stream data
CN110012307A (en) Video transmission method, device and main terminal equipment
CN113615191B (en) Method and device for determining image display sequence and video encoding and decoding equipment
CN116962696A (en) Image coding method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination