CN116980589A - Image encoding/decoding method, device, computer, storage medium, and program product - Google Patents

Image encoding/decoding method, device, computer, storage medium, and program product

Info

Publication number
CN116980589A
Authority
CN
China
Prior art keywords
image
encoded
frame type
frame
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310582517.5A
Other languages
Chinese (zh)
Inventor
王茹
雷海波
江林燕
宋秉一
罗青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310582517.5A
Publication of CN116980589A
Legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application disclose an image encoding and decoding method, an apparatus, a computer, a storage medium, and a program product, relating to the field of artificial intelligence, the field of cloud technology, and the like. The method includes the following steps: detecting the image frame type of an image to be encoded; if the image frame type is the key frame type, identifying image key points of the image to be encoded, encoding the image to be encoded to generate encoded data corresponding to the image to be encoded, and packaging the image key points, the image frame type, and the encoded data into an encoded code stream of the image to be encoded, where the key frame type indicates that the image to be encoded is the first image frame of the video stream to be encoded, or an image frame that cannot be generated based on the image key points; and if the image frame type is the conventional frame type, identifying the image key points of the image to be encoded and packaging the image key points and the image frame type into the encoded code stream of the image to be encoded. By adopting the present application, the accuracy of image encoding and decoding can be improved.

Description

Image encoding/decoding method, device, computer, storage medium, and program product
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image encoding and decoding method, an image encoding and decoding device, a computer, a storage medium, and a program product.
Background
In data processing, the encoding and decoding of data has become an important topic. At present, three-dimensional key point extraction is generally used to generate a face at different angles, decoupling the feature information and motion information of the face so that the face can rotate freely; encoding and decoding of the face can then be implemented based on the extracted three-dimensional key points during video encoding and decoding. However, this approach has high computational complexity and consumes considerable resources, so image encoding and decoding efficiency is low. Alternatively, the residual between the current frame and an adjacent frame is obtained and encoded, which requires the residual to be computed and encoded continuously, so image encoding efficiency is likewise low.
Disclosure of Invention
The embodiment of the application provides an image coding and decoding method, an image coding and decoding device, a computer, a storage medium and a program product, which can improve the accuracy and efficiency of image coding and decoding.
In one aspect, an embodiment of the present application provides an image encoding method, including:
detecting an image frame type of an image to be encoded;
if the image frame type is the key frame type, identifying image key points of the image to be encoded, encoding the image to be encoded to generate encoded data corresponding to the image to be encoded, and packaging the image key points, the image frame type of the image to be encoded, and the encoded data into an encoded code stream of the image to be encoded; the key frame type indicates that the image to be encoded is the first image frame of the video stream to be encoded, or an image frame that cannot be generated based on the image key points;
if the image frame type is the conventional frame type, identifying the image key points of the image to be encoded, and packaging the image key points and the image frame type into the encoded code stream of the image to be encoded.
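For illustration only, a minimal sketch of this encoding branch is given below. Python is used; CodedStream, extract_keypoints, and encode_frame are hypothetical placeholders for the modules described in this disclosure, not part of the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class CodedStream:
    frame_type: str                        # "I" / "P" (key frame types) or "conventional"
    keypoints: List[Tuple[float, float]]   # image key point coordinates
    coded_data: Optional[bytes]            # present only for key frame types

def encode_image(image, frame_type: str,
                 extract_keypoints: Callable, encode_frame: Callable) -> CodedStream:
    """Package one frame according to its detected image frame type."""
    keypoints = extract_keypoints(image)          # identify image key points
    if frame_type in ("I", "P"):                  # key frame: also carry encoded image data
        return CodedStream(frame_type, keypoints, encode_frame(image))
    return CodedStream("conventional", keypoints, None)   # key points and frame type only
```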
In one aspect, an embodiment of the present application provides an image decoding method, including:
acquiring the image frame type from the encoded code stream corresponding to an image to be encoded;
if the image frame type is the key frame type, acquiring the encoded data in the encoded code stream, and decoding the encoded data to obtain a decoded image corresponding to the image to be encoded; the encoded code stream is obtained by packaging the image key points, the image frame type, and the encoded data of the image to be encoded; the key frame type indicates that the image to be encoded is the first image frame of the video stream to be encoded, or an image frame that cannot be generated based on the image key points;
if the image frame type is the conventional frame type, acquiring a reference frame in the video stream to be encoded, acquiring the image key points from the encoded code stream, and generating the decoded image based on the reference frame and the image key points; the encoded code stream is obtained by packaging the image key points and the image frame type of the image to be encoded.
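A matching sketch of the decoding branch under the same assumptions (decode_frame and generate_from_keypoints are hypothetical stand-ins for the decoding and key-point-based generation modules; the returned pair carries the decoded image and the possibly updated reference frame):

```python
def decode_image(stream, reference_frame, decode_frame, generate_from_keypoints):
    """Inverse of encode_image above: key frames are decoded directly and become
    the new reference frame; conventional frames are generated from the current
    reference frame and the transmitted image key points."""
    if stream.frame_type in ("I", "P"):
        decoded = decode_frame(stream.coded_data)     # conventional image decoding
        return decoded, decoded                       # (decoded image, updated reference)
    decoded = generate_from_keypoints(reference_frame, stream.keypoints)
    return decoded, reference_frame                   # reference frame is kept
```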
In one aspect, an embodiment of the present application provides an image encoding apparatus, including:
the type detection module is used for detecting the type of the image frame of the image to be encoded;
the key point identification module is used for identifying image key points of the image to be encoded if the image frame type is the key frame type;
the image coding module is used for coding the image to be coded and generating coded data corresponding to the image to be coded;
the code stream packaging module is used for packaging the image key points, the image frame type of the image to be encoded, and the encoded data into the encoded code stream of the image to be encoded; the key frame type indicates that the image to be encoded is the first image frame of the video stream to be encoded, or an image frame that cannot be generated based on the image key points;
the code stream packaging module is also used for identifying image key points of the image to be encoded if the image frame type is a conventional frame type, and packaging the image key points and the image frame type into the code stream of the image to be encoded.
The video stream to be encoded comprises N image frames, and the key frame type comprises a first frame type and a predicted frame type; N is a positive integer;
the type detection module comprises:
the first frame determining unit is used for determining that the image frame type of the image to be encoded is the first frame type if the image to be encoded is the first image frame in the video stream to be encoded;
the reference acquisition unit is used for acquiring reference frames in N image frames if the image to be encoded is not the first image frame in the video stream to be encoded;
the key point detection unit is used for acquiring a reference key point in a reference frame and an image key point in an image to be encoded;
an image reconstruction unit for generating a reconstructed image based on the image key points and the reference key points; the reference frame is the first image frame, or an image frame of the predicted frame type among the N image frames that precedes the image to be encoded;
the prediction determining unit is used for determining that the image frame type of the image to be encoded is the predicted frame type if the image difference degree between the reconstructed image and the image to be encoded is greater than or equal to the image anomaly threshold;
and the conventional determining unit is used for determining that the image frame type of the image to be encoded is the conventional frame type if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image anomaly threshold.
Wherein the image reconstruction unit comprises:
an optical flow determining subunit, configured to determine optical flow information of the image to be encoded relative to the reference frame based on the image key point and the reference key point;
and the image reconstruction subunit is used for updating the reference frame based on the optical flow information to obtain a reconstructed image corresponding to the image to be encoded.
Wherein, the image reconstruction subunit is specifically configured to:
carrying out convolution processing on the reference frame to obtain image convolution characteristics, and carrying out characteristic conversion on the image convolution characteristics by adopting optical flow information to obtain first image characteristics;
residual processing is carried out on the first image feature to obtain a second image feature;
and carrying out convolution processing on the second image characteristics to generate a reconstructed image corresponding to the image to be encoded.
The key point detection unit is specifically configured to:
respectively inputting the reference frame and the image to be encoded into a key point analysis network to analyze key points, and determining reference key points in the reference frame and image key points in the image to be encoded;
the image reconstruction unit includes:
the optical flow analysis subunit is used for inputting the reference key points and the image key points into an optical flow analysis model to perform optical flow analysis so as to obtain optical flow information of the image to be coded relative to the reference frame;
And the network generation subunit is used for inputting the reference frame and the optical flow information into the image generation network to generate a reconstructed image.
Wherein the apparatus further comprises:
the sample analysis module is used for respectively inputting the first image sample and the second image sample into the initial key point analysis network to perform key point analysis, and determining a first sample key point in the first image sample and a second sample key point in the second image sample;
the sample analysis module is further used for inputting the first sample key points and the second sample key points into an initial optical flow analysis model for optical flow analysis to obtain sample optical flow information of the second image sample relative to the first image sample;
the sample reconstruction module is used for inputting the first image sample and sample optical flow information into the initial image generation network to generate a first sample reconstruction image;
the loss construction module is used for constructing a model loss according to the first sample reconstructed image and the second image sample;
the model training module is used for carrying out parameter adjustment on the initial key point analysis network, the initial optical flow analysis model and the initial image generation network by adopting model loss until the parameters are converged, so as to obtain the key point analysis network corresponding to the initial key point analysis network, the optical flow analysis model corresponding to the initial optical flow analysis model and the image generation network corresponding to the initial image generation network.
Wherein model loss includes generation loss; the loss building module includes:
the quality determining unit is used for inputting the first sample reconstructed image into the image discriminator for image detection and determining the output image quality corresponding to the first sample reconstructed image;
a loss generation unit configured to generate the generation loss based on the output image quality;
the apparatus further comprises:
the image discrimination module is used for acquiring a second sample reconstructed image corresponding to a third image sample, respectively inputting the second sample reconstructed image and the third image sample into the initial image discriminator for image detection, and determining the first sample quality corresponding to the third image sample and the second sample quality corresponding to the second sample reconstructed image;
and the discriminator training module is used for generating an image discrimination loss according to the first sample quality and the second sample quality, and adjusting the parameters of the initial image discriminator using the image discrimination loss until the image discriminator is obtained.
Wherein the apparatus further comprises:
the structure analysis module is used for acquiring a first key region in the image to be encoded and a second key region in the reconstructed image, and carrying out structural similarity analysis on the first key region and the second key region to obtain the image difference degree between the reconstructed image and the image to be encoded;
The conventional determination unit is specifically configured to:
if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image anomaly threshold, acquiring M consecutive image frames including the image to be encoded from the N image frames; M is a positive integer;
and if the image expression degrees of the M image frames are all smaller than the expression threshold, determining that the image frame type of the image to be encoded is the conventional frame type.
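As an illustration of this two-part test, the following sketch assumes SSIM (via scikit-image) as the structural similarity measure over the key regions; the threshold values and the list of expression scores are assumptions for illustration, not values from this disclosure:

```python
import numpy as np
from skimage.metrics import structural_similarity  # assumed SSIM implementation

def image_difference(roi_encoded: np.ndarray, roi_reconstructed: np.ndarray) -> float:
    """Structural-similarity-based difference between the first key region
    (from the image to be encoded) and the second key region (from the
    reconstructed image); higher means more different."""
    return 1.0 - structural_similarity(roi_encoded, roi_reconstructed, channel_axis=-1)

def is_conventional_frame(difference: float, expression_scores: list,
                          anomaly_threshold: float = 0.2,
                          expression_threshold: float = 0.5) -> bool:
    """Both conditions must hold: the key-region difference is below the image
    anomaly threshold, and the expression scores of the M consecutive image
    frames all stay below the expression threshold."""
    return (difference < anomaly_threshold
            and all(s < expression_threshold for s in expression_scores))
```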
Wherein the apparatus further comprises:
the object detection module is used for carrying out object detection on the image to be encoded, and if a target object is detected in the image to be encoded, the process of acquiring the reference key point in the reference frame and the image key point in the image to be encoded is executed;
and the prediction determining module is used for determining that the image frame type of the image to be encoded is a predicted frame type if the target object is not detected in the image to be encoded.
Wherein, this image coding module includes:
a residual encoding unit for acquiring an adjacent image of the image to be encoded in the video stream to be encoded, acquiring a first image residual between the image to be encoded and the adjacent image, and encoding the first image residual to generate the encoded data corresponding to the image to be encoded; or,
the residual encoding unit is further used for acquiring a reference frame in the video stream to be encoded, acquiring a second image residual between the image to be encoded and the reference frame, and encoding the second image residual to generate the encoded data corresponding to the image to be encoded; or,
The image coding unit is used for coding the image to be coded and generating coding data corresponding to the image to be coded.
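A minimal sketch of the three alternatives just listed; the mode labels and the encode callable are hypothetical placeholders:

```python
import numpy as np

def encode_frame_data(image: np.ndarray, base: np.ndarray, mode: str, encode):
    """The three alternatives listed above: residual against the adjacent image,
    residual against the reference frame, or direct coding. `base` is the
    adjacent image or the reference frame, depending on `mode`."""
    if mode in ("residual_adjacent", "residual_reference"):
        residual = image.astype(np.int16) - base.astype(np.int16)   # image residual
        return encode(residual)
    return encode(image)                                            # direct coding
```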
An aspect of an embodiment of the present application provides an image decoding apparatus, including:
the type acquisition module is used for acquiring an image frame type from an encoding code stream corresponding to an image to be encoded;
the data acquisition module is used for acquiring the coded data in the coded code stream if the image frame type is a key frame type;
the data decoding module is used for decoding the encoded data to obtain a decoded image corresponding to the image to be encoded; the encoded code stream is obtained by packaging the image key points, the image frame type, and the encoded data of the image to be encoded; the key frame type indicates that the image to be encoded is the first image frame of the video stream to be encoded, or an image frame that cannot be generated based on the image key points;
the reference acquisition module is used for acquiring a reference frame in the video stream to be encoded if the image frame type is a conventional frame type;
the key point acquisition module is used for acquiring image key points in the coded code stream;
the reference decoding module is used for generating a decoded image based on the reference frame and the image key points; the coding code stream is obtained by packaging based on the image key points and the image frame types of the images to be coded.
Wherein, this data decoding module includes:
a residual decoding unit for decoding the encoded data to obtain a first decoding residual, acquiring a decoded image adjacent to the image to be encoded, and fusing the first decoding residual with the adjacent decoded image to generate the decoded image corresponding to the image to be encoded; or,
the residual decoding unit is further used for decoding the encoded data to obtain a second decoding residual, acquiring the reference frame, and fusing the second decoding residual with the reference frame to generate the decoded image corresponding to the image to be encoded; or,
and the data decoding unit is used for decoding the encoded data to obtain a decoded image corresponding to the image to be encoded.
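The corresponding decoding-side sketch, with the same hypothetical mode labels; it fuses the decoded residual onto the adjacent decoded image or the reference frame, or decodes the data directly:

```python
import numpy as np

def decode_frame_data(coded, base: np.ndarray, mode: str, decode) -> np.ndarray:
    """Inverse of the three alternatives: fuse the decoded residual onto the
    adjacent decoded image or the reference frame, or decode directly."""
    if mode in ("residual_adjacent", "residual_reference"):
        residual = decode(coded)                                    # decoding residual
        return np.clip(base.astype(np.int16) + residual, 0, 255).astype(np.uint8)
    return decode(coded)                                            # direct decoding
```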
Wherein the reference decoding module comprises:
the optical flow determining unit is used for acquiring reference key points of the reference frame and determining optical flow information of the image to be coded relative to the reference frame according to the reference key points and the image key points;
and the optical flow decoding unit is used for updating the reference frame based on the optical flow information to obtain a decoded image.
In one aspect, the embodiment of the application provides a computer device, which comprises a processor, a memory and an input/output interface;
The processor is respectively connected with the memory and the input/output interface, wherein the input/output interface is used for receiving data and outputting data, the memory is used for storing a computer program, and the processor is used for calling the computer program so as to enable the computer equipment containing the processor to execute the image coding and decoding method in one aspect of the embodiment of the application.
An aspect of an embodiment of the present application provides a computer-readable storage medium storing a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the image encoding and decoding method in the aspect of the embodiment of the present application.
In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternatives in an aspect of the embodiments of the application. In other words, the computer instructions, when executed by a processor, implement the methods provided in the various alternatives in one aspect of the embodiments of the present application.
The implementation of the embodiment of the application has the following beneficial effects:
In the embodiment of the present application, the image frame type of the image to be encoded is detected. If the image frame type is the key frame type, image key points of the image to be encoded are identified, the image to be encoded is encoded to generate encoded data, and the image key points, the image frame type, and the encoded data are packaged into the encoded code stream of the image to be encoded; the key frame type indicates that the image to be encoded is the first image frame of the video stream to be encoded, or an image frame that cannot be generated based on the image key points. If the image frame type is the conventional frame type, the image key points of the image to be encoded are identified, and the image key points and the image frame type are packaged into the encoded code stream. Through the above process, the encoding and decoding of the image to be encoded can be implemented based on its image frame type. In the usual case (i.e., the conventional frame type), only the image key points need to be encoded, which saves orders of magnitude of code rate and occupies less bandwidth at the same video quality. On this basis, when the image frame type is the key frame type, the image cannot be reconstructed directly from the image key points, so the image itself is encoded to make up for the encoding anomalies that key-point-based encoding might otherwise cause. Since the number of key-frame-type images is generally small, the amount of data to be processed does not increase much; that is, image encoding efficiency is still improved, and image encoding accuracy is further improved.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a network interaction architecture diagram of image encoding and decoding provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an image encoding scene according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for encoding an image according to an embodiment of the present application;
FIG. 4 is a schematic view of an image reconstruction scene according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a model reconstruction scene provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an alternative image frame type determination scenario provided by an embodiment of the present application;
fig. 7 is a schematic diagram of a video stream encoding method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an image encoding scene flow provided in an embodiment of the present application;
fig. 9 is a schematic diagram of a video stream encoding process according to an embodiment of the present application;
FIG. 10 is an example of an image decoding scene flow diagram provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of an image decoding scenario according to an embodiment of the present application;
fig. 12 is a schematic diagram of a video decoding process according to an embodiment of the present application;
FIG. 13 is a schematic diagram of an image encoding apparatus according to an embodiment of the present application;
fig. 14 is a schematic view of an image decoding apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
If data of an object (such as a user) needs to be collected in the present application, a prompt interface or pop-up window is displayed before and during collection to inform the user which data is currently being collected; the data acquisition step is started only after the user's confirmation operation on the prompt interface or pop-up window is obtained, and otherwise the process ends. Moreover, the acquired user data, such as the image to be encoded, is used only in reasonable and legal scenarios or applications. Optionally, in scenarios where user data is required but not yet authorized, authorization may be requested from the user, and the user data is used only after authorization is granted. The use of user data complies with the relevant laws and regulations.
Optionally, the application can adopt cloud technology and artificial intelligence technology to realize the scheme in the application and improve the effect of data processing.
For example, data management can be improved by storing and processing data with cloud storage technology. Cloud storage is a concept that extends and develops from cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that aggregates a large number of storage devices of different types in a network (storage devices are also referred to as storage nodes) through application software or application interfaces, by means of functions such as cluster applications, grid technology, and distributed storage file systems, to jointly provide data storage and service access functions externally.
At present, the storage method of the storage system is as follows: when logical volumes are created, each logical volume is allocated physical storage space, which may consist of the disks of one or several storage devices. A client stores data on a certain logical volume, that is, the data is stored on a file system; the file system divides the data into many parts, each part being an object that contains not only the data itself but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access it according to the storage location information of each object.
The process by which the storage system allocates physical storage space for a logical volume is as follows: the physical storage space is divided in advance into stripes according to the estimated capacity of the objects to be stored on the logical volume (the estimate often leaves a large margin over the capacity actually to be stored) and the redundant array of independent disks (RAID) configuration, and a logical volume can be understood as a stripe; physical storage space is thereby allocated for the logical volume.
For example, when a large amount of data needs to be processed, such as a large number of images to be encoded, big data technology may be used for the encoding and decoding of the images. Big data refers to data sets that cannot be captured, managed, and processed by conventional software tools within a certain time range; it is a massive, high-growth-rate, and diversified information asset that requires new processing modes to provide stronger decision-making, insight-discovery, and process-optimization capabilities.
Alternatively, artificial intelligence technology may be used to implement the encoding and decoding of the image to be encoded. Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making; for example, in the present application, machines are given the capability of encoding and decoding images through artificial intelligence technology.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation, and other directions.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and demonstration learning; examples in the present application include the key point analysis network, the optical flow analysis model, and the image generation network.
In the embodiment of the present application, please refer to fig. 1, which is a network interaction architecture diagram of image encoding and decoding provided by an embodiment of the present application. As shown in fig. 1, an encoding device 101 may acquire a video stream to be encoded, take the image frames included in the video stream as images to be encoded in sequence, and encode each image to be encoded based on its image frame type to obtain the encoded code stream of that image. The encoding device 101 may respond to a video encoding request sent by any decoding device (such as decoding device 102a, 102b, or 102c) and acquire the video stream to be encoded carried by the request; when the encoded code stream of an image to be encoded is obtained, it may be sent to the corresponding decoding device, or, when the encoded code streams corresponding to all image frames in the video stream have been obtained, they may be packaged into a video code stream and sent to the corresponding decoding device. Alternatively, the encoding device 101 may respond to a video acquisition request for the video stream to be encoded, acquire the video stream based on the request, and send the encoded result to the corresponding decoding device. Alternatively, the encoding device 101 may encode an acquired video stream and send the encoded result to an associated decoding device, where an associated decoding device is one that needs to acquire the video stream, or one to which the encoding device 101 needs to push the video stream; the specific scenario is not limited here. The decoding device has an image display function and may be a mobile phone (such as decoding device 102c), a notebook computer (such as decoding device 102b), or an in-vehicle device (such as decoding device 102a, which is located in the vehicle 103).
Specifically, referring to fig. 2, fig. 2 is a schematic diagram of an image encoding scene according to an embodiment of the present application. As shown in fig. 2, the encoding device may detect the image frame type of an image 201 to be encoded. If the image frame type is the key frame type, the encoding device identifies the image key points of the image 201, encodes the image 201 to generate encoded data 202, forms code stream information from the image key points and the image frame type, and packages the encoded data 202 and the code stream information into the encoded code stream of the image 201. The key frame type indicates that the image to be encoded is the first image frame in the video stream to be encoded, or an image frame that cannot be generated based on the image key points; that is, if an image of the key frame type were reconstructed and decoded from its image key points, the difference from the original image (i.e., the image to be encoded) would be large and accurate reconstruction would not be possible. If the image frame type is the conventional frame type, the encoding device identifies the image key points of the image to be encoded, forms code stream information from the image key points and the image frame type, and packages the code stream information into the encoded code stream. When processing a video stream to be encoded, the encoding mode of each image frame can be determined based on its image frame type. In general, the image frames in a video stream change continuously, so the number of key-frame-type image frames is usually small; these frames are encoded directly as images, so the encoded code stream loses little information and decoding accuracy is guaranteed. The remaining image frames are of the conventional frame type; for them, only the image key points and the frame type need to be packaged, without encoding the image itself, and since this code stream information is orders of magnitude smaller, image encoding occupies fewer resources. Therefore, the efficiency and accuracy of image encoding and decoding can be improved, and the resources consumed by encoding and decoding can be saved.
It is understood that the encoding device or the decoding device mentioned in the embodiments of the present application may be a computer device, and the computer device in the embodiments of the present application includes, but is not limited to, a terminal device or a server. In other words, the computer device may be a server or a terminal device, or may be a system formed by the server and the terminal device. The above-mentioned terminal device may be an electronic device, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm computer, a vehicle-mounted device, an augmented Reality/Virtual Reality (AR/VR) device, a head-mounted display, a smart television, a wearable device, a smart speaker, a digital camera, a camera, and other mobile internet devices (mobile internet device, MID) with network access capability, or a terminal device in a scene such as a train, a ship, or a flight. The servers mentioned above may be independent physical servers, or may be server clusters or distributed systems formed by a plurality of physical servers, or may be cloud servers that provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, vehicle-road collaboration, content distribution networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
Optionally, the data related to the embodiment of the present application may be stored in a computer device, or may be stored based on a cloud storage technology or a blockchain network, and the like, which is not limited herein.
Further, referring to fig. 3, fig. 3 is a flowchart of a method for encoding an image according to an embodiment of the present application. As shown in fig. 3, a single image to be encoded is taken as an example; in other words, in the method embodiment described in fig. 3, the image encoding process includes the following steps:
step S301 detects an image frame type of an image to be encoded.
In the embodiment of the present application, the encoding device may acquire a video stream to be encoded and the N image frames that compose it; that is, the video stream to be encoded includes N image frames. The encoding device may take the N image frames as images to be encoded in sequence, detect the image frame type of each image to be encoded, and encode the image based on that type to obtain its encoded code stream. The image frame type includes the key frame type and the conventional frame type. The key frame type indicates that the image to be encoded changes greatly within the video stream, that is, it is an image frame at which an abrupt change occurs, or it is the first image frame in the video stream. Put simply, the key frame type indicates that the image to be encoded cannot be generated directly from its image key points: the image reconstructed from the key points would differ greatly from the image to be encoded, the reconstruction might be anomalous, or there is no reference image at all. Image frames of the conventional frame type are the image frames in the video stream other than those of the key frame type.
Specifically, the key frame type includes the first frame type and the predicted frame type, where N is a positive integer. The first frame type indicates that the image to be encoded is the first image frame in the video stream to be encoded; the predicted frame type indicates that the image to be encoded is an image frame that differs greatly from its adjacent image frames, for example when an abrupt change or partial occlusion occurs in the image. Optionally, an image frame of the first frame type may be called an I frame, and an image frame of the predicted frame type may be called a P frame. Specifically, when detecting the image frame type of the image to be encoded, if the image to be encoded is the first image frame in the video stream, its image frame type is determined to be the first frame type; if it is not the first image frame, a reference frame among the N image frames is acquired, and the image frame type is determined based on the reference frame.
Specifically, a reference frame among the N image frames may be acquired, the reference key points in the reference frame and the image key points in the image to be encoded may be acquired, and a reconstructed image may be generated based on the image key points and the reference key points. The reference frame is the first image frame, or an image frame of the predicted frame type among the N image frames that precedes the image to be encoded; that is, the first image frame of the video stream may always serve as the reference frame for encoding and decoding, or the reference frame may be updated based on image frames of the predicted frame type. Further, if the image difference degree between the reconstructed image and the image to be encoded is greater than or equal to the image anomaly threshold, the image to be encoded cannot be reconstructed directly from the image key points, and its image frame type may be determined to be the predicted frame type; if the image difference degree is smaller than the image anomaly threshold, the image to be encoded can be reconstructed well from the image key points, and its image frame type may be determined to be the conventional frame type. When generating the reconstructed image based on the image key points and the reference key points, optical flow information of the image to be encoded relative to the reference frame may first be determined from the two sets of key points. The optical flow information represents the image motion information from the reference frame to the image to be encoded, that is, how the pixel points in the reference frame move to their corresponding points in the image to be encoded, and thus represents the change of the image to be encoded relative to the reference frame. The reference frame is then updated based on the optical flow information to obtain the reconstructed image corresponding to the image to be encoded.
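As a compact illustration of this decision rule, the following sketch assumes a reconstruct callable (reference frame plus key points, as described above) and a difference callable for the image difference degree; the threshold value 0.2 is an assumed example, not a disclosed value:

```python
def detect_frame_type(image, index: int, reference_frame,
                      reconstruct, difference,
                      anomaly_threshold: float = 0.2) -> str:
    """Frame type decision described above. `reconstruct` generates an image
    from the reference frame and the two sets of key points; `difference`
    measures the image difference degree."""
    if index == 0:
        return "I"                                    # first image frame: first frame type
    reconstructed = reconstruct(reference_frame, image)
    if difference(reconstructed, image) >= anomaly_threshold:
        return "P"                                    # cannot be generated from key points
    return "conventional"                             # key points reconstruct it well
```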
For example, referring to fig. 4, fig. 4 is a schematic view of an image reconstruction scene according to an embodiment of the present application. As shown in fig. 4, the encoding apparatus may acquire a reference frame 4011 among the N image frames, the reference key points 4021 in the reference frame 4011, and the image key points 4022 in the image 4012 to be encoded. Key points are points in the key region (Region of Interest, ROI) of the target object that mainly changes across the N image frames of the video stream. If the target object is a human face and the key regions are the facial features composing the face, then the reference key points 4021 are the pixel points corresponding to the facial features in the reference frame 4011, and the image key points 4022 are the pixel points corresponding to the facial features in the image 4012 to be encoded. If the target object is animal A and the key region is the region composing animal A, then the reference key points 4021 are the pixel points corresponding to the outer contour of animal A in the reference frame 4011, and the image key points 4022 are the pixel points corresponding to the outer contour of animal A in the image 4012 to be encoded. Further, optical flow analysis may be performed on the image key points 4022 and the reference key points 4021 to obtain the optical flow information 403 of the image to be encoded relative to the reference frame; the reference frame 4011 is then updated based on the optical flow information 403 to obtain the reconstructed image 404 corresponding to the image to be encoded.
Specifically, when updating the reference frame based on the optical flow information to obtain the reconstructed image, the reference frame may first be convolved to obtain an image convolution feature, and the optical flow information may be used to perform feature conversion on the image convolution feature to obtain a first image feature. Residual processing is then performed on the first image feature to obtain a second image feature, and the second image feature is convolved to generate the reconstructed image corresponding to the image to be encoded. Optionally, the reference frame may be convolved to obtain the image convolution feature, and the image convolution feature may be downsampled to obtain a first sampling feature; the optical flow information is used to perform feature conversion on the first sampling feature to obtain the first image feature, so that the reference frame is converted into the image to be encoded in the feature domain. Residual processing may then be performed on the first image feature to obtain a residual feature, for example by passing the first image feature through 8 residual blocks (ResBlock) whose input and output dimensions are identical; this is not limited here. The residual feature is convolved to obtain a residual convolution feature, and the residual convolution feature is upsampled to generate the reconstructed image corresponding to the image to be encoded. The upsampling repairs the image size change caused by the earlier downsampling, which further improves image reconstruction accuracy.
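The following PyTorch-style sketch mirrors this pipeline under stated assumptions: the optical flow information is supplied as a sampling grid consumed by grid_sample, and the channel counts and kernel sizes are illustrative, not disclosed values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual block with identical input and output dimensions."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class ReconstructionGenerator(nn.Module):
    """Convolution -> downsampling -> optical-flow warp -> 8 residual blocks ->
    convolution -> upsampling, mirroring the pipeline in the text."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 7, padding=3)             # image convolution feature
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # downsampling
        self.res = nn.Sequential(*[ResBlock(ch) for _ in range(8)])
        self.up = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)  # upsampling
        self.tail = nn.Conv2d(ch, 3, 7, padding=3)

    def forward(self, reference: torch.Tensor, flow_grid: torch.Tensor) -> torch.Tensor:
        feat = self.down(self.head(reference))                 # first sampling feature
        # Feature conversion: warp the feature with the optical flow information,
        # here assumed to be a sampling grid of shape (N, H/2, W/2, 2).
        feat = F.grid_sample(feat, flow_grid, align_corners=False)
        feat = self.res(feat)                                  # residual processing
        return self.tail(self.up(feat))                        # reconstructed image
```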
Optionally, when generating the reconstructed image, reference may be made to fig. 5, which is a schematic view of a model reconstruction scene provided by an embodiment of the present application. As shown in fig. 5, the reference frame and the image to be encoded may be respectively input into a key point analysis network 501 for key point analysis, determining the reference key points in the reference frame and the image key points in the image to be encoded; the reference key points and the image key points are input into an optical flow analysis model 502 for optical flow analysis to obtain the optical flow information of the image to be encoded relative to the reference frame; and the reference frame and the optical flow information are input into an image generation network 503 to generate the reconstructed image. The key point analysis network 501 may be a network similar to the U-Net structure, or a BiseNet network, etc., which is not limited here; its input is a red-green-blue (RGB) three-channel image (such as the image to be encoded or the reference frame), and its output is the coordinate information of the key points, such as the coordinates of the image key points in the image to be encoded or of the reference key points in the reference frame. The optical flow analysis model 502 may be a network with a U-Net structure; its inputs are the reference key points of the reference frame and the image key points of the image to be encoded, and its output is the optical flow information of the image to be encoded relative to the reference frame. The optical flow information represents the motion vectors from the reference frame to the image to be encoded and may include, but is not limited to, a mapping relationship from the reference key points to the image key points, represented as a mapping coefficient or a mapping matrix. The image generation network 503 may adopt a network with the pix2pixHD structure, or a diffusion model, etc., which is not limited here. That is, the model types and topologies of the above networks are not limited; they may be replaced by other effective new model structures, and the model structures may be changed, expanded, or simplified according to the required model expression capability and the available computing resources.
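Wiring the three networks of fig. 5 together then amounts to the following sketch, where the three arguments after image are assumed to be trained, callable models with the interfaces described above:

```python
def reconstruct_image(reference, image, keypoint_net, flow_net, generator):
    """End-to-end reconstruction with the three models of fig. 5."""
    ref_kp = keypoint_net(reference)     # reference key point coordinates
    img_kp = keypoint_net(image)         # image key point coordinates
    flow = flow_net(ref_kp, img_kp)      # optical flow of the image w.r.t. the reference
    return generator(reference, flow)    # update the reference frame into the reconstruction
```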
Each of the above networks is obtained through training. Optionally, the training process may be implemented in the encoding device, or in another computer device from which the encoding device directly acquires the trained networks, such as the key point analysis network, the optical flow analysis model, and the image generation network.
Specifically, when training the networks, the first image sample and the second image sample may be respectively input into the initial key point analysis network for key point analysis, determining the first sample key points in the first image sample and the second sample key points in the second image sample; the generation of the first and second sample key points may refer to the generation process of the image key points or the reference key points. The first sample key points and the second sample key points are input into the initial optical flow analysis model for optical flow analysis to obtain the sample optical flow information of the second image sample relative to the first image sample, and the first image sample and the sample optical flow information are input into the initial image generation network to generate a first sample reconstructed image. Further, a model loss can be constructed from the first sample reconstructed image and the second image sample, and the parameters of the initial key point analysis network, the initial optical flow analysis model, and the initial image generation network are adjusted using the model loss until the parameters converge, yielding the key point analysis network, the optical flow analysis model, and the image generation network corresponding to the respective initial networks. Specifically, d loss functions can be constructed from the first sample reconstructed image and the second image sample, where d is a positive integer, and any one or more of the d loss functions may be used to generate the model loss. For example, denote the first sample reconstructed image as I_out and the second image sample as I_gt. The d loss functions may include, but are not limited to, the L_1 loss, a multi-scale image similarity loss (MS-SSIM), a perceptual loss (VGG), an object-based perceptual loss (Object), and a generation loss (L_gan), where the loss functions other than the generation loss may be combined into a reconstruction loss. Optionally, the model loss includes the reconstruction loss: the encoding device may combine any one or more of the d loss functions other than the generation loss into the reconstruction loss and determine the reconstruction loss as the model loss. Optionally, when the reconstruction loss consists of several loss functions, denoting their number as h (a positive integer), the sum of the h loss functions may be directly determined as the reconstruction loss; or the loss weights corresponding to the h loss functions may be acquired, and the h loss functions weighted and summed based on their respective loss weights to obtain the reconstruction loss.
For example, in one possible implementation, the reconstruction loss can be expressed as formula (1):
L_rec = α·L_1 + β·L_MS-SSIM + γ·L_VGG + δ·L_Obj (1)
In formula (1), L_rec denotes the reconstruction loss, and α, β, γ, and δ denote the loss weights of the corresponding loss functions; for example, α denotes the loss weight of the L_1 loss. The loss weight of each loss function may be a preset empirical value, for example, α, β, γ, and δ may be 100, 10, and 10, respectively, or may be manually assigned values.
Alternatively, the generation loss may be determined as the model loss, where the generation loss can be expressed as formula (2):
L_gan = (1 - D(I_out))^2 (2)
In formula (2), L_gan denotes the generation loss, and D(·) denotes the image discriminator.
Alternatively, any one or more of the d loss functions other than the generation loss may be combined into the reconstruction loss, and the reconstruction loss and the generation loss may be combined into the model loss, as shown in formula (3):
L_G = L_rec + L_gan (3)
In formula (3), L_G denotes the model loss; that is, the model loss may be as shown in formula (3), or L_G = L_rec, or L_G = L_gan, etc.
Optionally, when the model loss includes the generation loss, constructing the model loss according to the first sample reconstructed image and the second image sample may comprise inputting the first sample reconstructed image into an image discriminator for image detection and determining the output image quality corresponding to the first sample reconstructed image; the generation loss is then generated based on the output image quality, and may specifically be expressed as formula (2).
The image discriminator may be trained by the encoding device, or a trained image discriminator may be obtained from another computer device. Specifically, during the training of the image discriminator, a second sample reconstructed image corresponding to a third image sample is obtained; the second sample reconstructed image and the third image sample are respectively input into an initial image discriminator for image detection, determining a first sample quality corresponding to the third image sample and a second sample quality corresponding to the second sample reconstructed image; an image discrimination loss is generated according to the first sample quality and the second sample quality, and the initial image discriminator is parameter-adjusted with the image discrimination loss until the image discriminator is obtained. One possible image discrimination loss is given by formula (4):

L_D = D(I_1)² + (1 - D(I_2))²

As shown in formula (4), L_D denotes the image discrimination loss, I_1 denotes the second sample reconstructed image, and I_2 denotes the third image sample.
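The generation loss of formula (2) and the image discrimination loss of formula (4) amount to a least-squares GAN objective and can be sketched as follows (batch averaging is an assumption; the source formulas are written per image):

```python
import torch

def generation_loss(d_fake: torch.Tensor) -> torch.Tensor:
    """Formula (2): L_gan = (1 - D(I_out))^2, with I_out the first sample
    reconstructed image and d_fake = D(I_out)."""
    return ((1.0 - d_fake) ** 2).mean()

def model_loss(l_rec: torch.Tensor, l_gan: torch.Tensor) -> torch.Tensor:
    """Formula (3): L_G = L_rec + L_gan."""
    return l_rec + l_gan

def discrimination_loss(d_fake: torch.Tensor, d_real: torch.Tensor) -> torch.Tensor:
    """Formula (4): L_D = D(I_1)^2 + (1 - D(I_2))^2, with I_1 the second
    sample reconstructed image and I_2 the third image sample."""
    return (d_fake ** 2).mean() + ((1.0 - d_real) ** 2).mean()
```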
Optionally, the above networks and the image discriminator may be jointly trained based on the first sample image and the second sample image; that is, the initial key point analysis network, the initial optical flow analysis model, the initial image generation network, and the initial image discriminator may all be parameter-adjusted based on the model loss and the image discrimination loss until the parameters converge, so as to obtain the key point analysis network corresponding to the initial key point analysis network, the optical flow analysis model corresponding to the initial optical flow analysis model, the image generation network corresponding to the initial image generation network, and the image discriminator corresponding to the initial image discriminator. As shown in fig. 5, the first sample reconstructed image corresponding to the reconstructed image and the second sample image corresponding to the image to be encoded may be input into the initial image discriminator 504, and the initial image discriminator 504 parameter-adjusted based on the resulting output. Optionally, the training process may use a training batch size and a training learning rate; for example, the batch size is 64 and the initial training learning rate is 2×10⁻⁴, with the learning rate updated once every iteration period (e.g., every 50 rounds). For example, every 50 rounds the training learning rate is updated to 1/10 of its previous value; in this example, 64 groups of sample images (each group including a first sample image corresponding to the reference frame and a second sample image corresponding to the image to be encoded) constitute one round, the above networks are parameter-adjusted each round, and the learning rate is adjusted once every 50 rounds until the trained networks are obtained. Optionally, when the networks are parameter-adjusted, the adjustment may be performed directly based on the training learning rate, or an AdamW optimizer may be used; for example, the AdamW parameters may be set to β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. The parameter convergence condition includes, but is not limited to, the model loss and the image discrimination loss hardly changing any more, or the results obtained on a verification sample set essentially ceasing to change; optionally, the model that performs best on the verification set may be selected as the final model. For example, suppose the initial key point analysis network undergoes 100 iterations and the results on the verification sample set essentially stop changing, and that the network performs best under the parameters updated at the 98th iteration; the initial key point analysis network obtained at the 98th iteration may then be determined as the key point analysis network.
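Putting the pieces together, a joint training loop consistent with the hyperparameters above might look as follows; this sketch reuses the helper functions from the previous examples, and the use of a single L_1 term for L_rec and the data-loader interface are simplifying assumptions:

```python
import itertools
import torch
import torch.nn.functional as F

def train(keypoint_net, flow_net, generator, discriminator, data_loader,
          epochs: int = 100):
    # AdamW with beta1 = 0.9, beta2 = 0.999, eps = 1e-8, initial lr 2e-4,
    # lr divided by 10 every 50 rounds (values from the text).
    gen_params = itertools.chain(keypoint_net.parameters(),
                                 flow_net.parameters(),
                                 generator.parameters())
    opt_g = torch.optim.AdamW(gen_params, lr=2e-4, betas=(0.9, 0.999), eps=1e-8)
    opt_d = torch.optim.AdamW(discriminator.parameters(), lr=2e-4,
                              betas=(0.9, 0.999), eps=1e-8)
    sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=50, gamma=0.1)
    sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=50, gamma=0.1)
    for _ in range(epochs):
        for first_sample, second_sample in data_loader:  # batches of 64 groups
            recon = forward_training_pass(keypoint_net, flow_net, generator,
                                          first_sample, second_sample)
            # Model loss (formula (3)) updates the three networks.
            l_g = model_loss(F.l1_loss(recon, second_sample),
                             generation_loss(discriminator(recon)))
            opt_g.zero_grad()
            l_g.backward()
            opt_g.step()
            # Image discrimination loss (formula (4)) updates the discriminator.
            l_d = discrimination_loss(discriminator(recon.detach()),
                                      discriminator(second_sample))
            opt_d.zero_grad()
            l_d.backward()
            opt_d.step()
        sched_g.step()
        sched_d.step()
```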
Further, a first key region in the image to be encoded and a second key region in the reconstructed image can be obtained, and a structural similarity analysis performed on the first key region and the second key region to obtain the image difference degree between the reconstructed image and the image to be encoded; the image frame type of the image to be encoded is then determined based on the image difference degree. Optionally, if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image anomaly threshold, the image frame type of the image to be encoded is determined to be the conventional frame type. Alternatively, if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image anomaly threshold, M consecutive image frames including the image to be encoded are acquired from the N image frames, where M is a positive integer; if any of the M image frames has an image performance greater than or equal to the performance threshold, the image frame type of the image to be encoded is determined to be the conventional frame type; if the image performances of all the M image frames are smaller than the performance threshold, the image frame type of the image to be encoded is determined to be the predicted frame type.
Alternatively, object detection may first be performed on the image to be encoded, and image difference detection then performed on the image to be encoded based on the object detection result. Specifically, object detection is performed on the image to be encoded; if a target object is detected in the image to be encoded, the process of acquiring the reference key points in the reference frame and the image key points in the image to be encoded is executed, and the image difference degree between the image to be encoded and the reconstructed image is detected; if no target object is detected in the image to be encoded, the image frame type of the image to be encoded is determined to be the predicted frame type.
For example, referring to fig. 6, fig. 6 is a schematic diagram of an optional image frame type determination scene provided by an embodiment of the present application. As shown in fig. 6, the encoding device may perform object detection on the image 601 to be encoded; if no target object is detected, the image frame type of the image to be encoded is determined to be the predicted frame type. If a target object is detected, a first key region in the image 601 to be encoded and a second key region in the reconstructed image 602 are identified. For example, interpolation processing (such as cubic interpolation) may be performed on the image 601 to be encoded to reduce its resolution, and the first key region, which contains the image key points, extracted from the interpolated image, thereby reducing the amount of data to process; the second key region in the reconstructed image 602 may be obtained similarly. Further, a structural similarity (Structural Similarity, SSIM) analysis is performed on the first key region and the second key region; the SSIM index can be used to measure the generation quality of the reconstructed image 602, yielding the image difference degree between the reconstructed image and the image to be encoded. If the image difference degree is smaller than the image anomaly threshold, M consecutive image frames including the image to be encoded are acquired from the N image frames; if any of the M image frames has an image performance greater than or equal to the performance threshold, the image frame type of the image to be encoded is determined to be the conventional frame type; if the image performances of all the M image frames are smaller than the performance threshold, the image frame type of the image to be encoded is determined to be the predicted frame type. If the image difference degree is greater than or equal to the image anomaly threshold, the image frame type of the image to be encoded is determined to be the predicted frame type.
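The decision logic of fig. 6 can be summarized in the following sketch; using scikit-image's SSIM, taking the image difference degree as 1 − SSIM, and treating the thresholds as tunables are all assumptions:

```python
import numpy as np
from skimage.metrics import structural_similarity

def decide_frame_type(first_region: np.ndarray, second_region: np.ndarray,
                      window_performance: list,
                      anomaly_threshold: float,
                      performance_threshold: float) -> str:
    """first_region / second_region: grayscale key regions of the image to
    be encoded and of the reconstructed image; window_performance: the
    image performance values of the M consecutive frames containing the
    image to be encoded."""
    ssim = structural_similarity(
        first_region, second_region,
        data_range=float(first_region.max() - first_region.min()))
    difference = 1.0 - ssim  # higher means the reconstruction deviates more
    if difference >= anomaly_threshold:
        return "predicted"
    if any(p >= performance_threshold for p in window_performance):
        return "conventional"
    return "predicted"
```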
Step S302, if the image frame type is the key frame type, identifying the image key points of the image to be encoded, encoding the image to be encoded, generating encoded data corresponding to the image to be encoded, and packaging the image key points, the image frame type of the image to be encoded and the encoded data into an encoded code stream of the image to be encoded.
In the embodiment of the application, the key frame type is used to indicate that the image to be decoded is the first image frame of the video stream to be encoded, or an image frame type that cannot be generated based on the image key points.
Specifically, the image to be encoded can be encoded to generate the encoded data corresponding to the image to be encoded; the image key points of the image to be encoded are identified, and the code stream information of the image to be encoded generated based on the image key points and the image frame type of the image to be encoded; the encoded data and the code stream information of the image to be encoded are then encapsulated into the encoded code stream of the image to be encoded. Specifically, when generating the encoded data of the image to be encoded: in encoding mode (1), an adjacent image of the image to be encoded in the video stream to be encoded may be obtained, a first image residual between the image to be encoded and the adjacent image obtained, and the first image residual encoded to generate the encoded data corresponding to the image to be encoded; or, in encoding mode (2), a reference frame in the video stream to be encoded may be obtained, a second image residual between the image to be encoded and the reference frame obtained, and the second image residual encoded to generate the encoded data corresponding to the image to be encoded; or, in encoding mode (3), the image to be encoded itself may be encoded to generate the encoded data corresponding to the image to be encoded.
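A sketch of the three alternative ways of producing the coded data of a key frame follows; the `encode` callable stands in for any concrete image/residual encoder and is an assumption, not a specific codec API:

```python
from typing import Callable, Optional
import numpy as np

def key_frame_coded_data(image: np.ndarray, mode: int,
                         encode: Callable[[np.ndarray], bytes],
                         adjacent: Optional[np.ndarray] = None,
                         reference: Optional[np.ndarray] = None) -> bytes:
    if mode == 1:  # encoding mode (1): first image residual vs. adjacent image
        return encode(image.astype(np.int16) - adjacent.astype(np.int16))
    if mode == 2:  # encoding mode (2): second image residual vs. reference frame
        return encode(image.astype(np.int16) - reference.astype(np.int16))
    return encode(image)  # encoding mode (3): encode the image itself
```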
In step S303, if the image frame type is the normal frame type, the image key points of the image to be encoded are identified, and the image key points and the image frame type are encapsulated into the encoding code stream of the image to be encoded.
In the embodiment of the application, if the image frame type is the conventional frame type, the encoding device can directly identify the image key points of the image to be encoded, form the code stream information of the image to be encoded from the image key points and the image frame type, and take that code stream information as the encoded code stream of the image to be encoded. The image key points are used for joint decoding with the reference key points of the reference frame to obtain the decoded image corresponding to the image to be encoded. Specifically, the image key points are combined with the reference key points of the reference frame to determine the optical flow information of the image to be encoded relative to the reference frame, and that optical flow information is used to update the reference frame to obtain the decoded image corresponding to the image to be encoded. The reference frame may be the first image frame in the video stream to be encoded, or may be updated based on image frames of the key frame type in the video stream to be encoded.
For the above manner, reference may be made to fig. 7, which is a schematic diagram of a video stream encoding manner according to an embodiment of the present application. As shown in fig. 7, assume that an image frame whose image frame type is the first frame type is referred to as an I frame, an image frame whose image frame type is the conventional frame type as an AI frame, and an image frame whose image frame type is the predicted frame type as a P frame. Taking the case where the reference frame is always the first image frame in the video stream to be encoded as an example, the encoding device can encode the I frame to obtain the encoded data of the I frame, and encapsulate the code stream information and the encoded data of the I frame into the encoded code stream corresponding to the I frame. For an AI frame, the image key points of the AI frame can be obtained, the image key points and the image frame type of the AI frame formed into code stream information, and that code stream information used as the encoded code stream of the AI frame. For a P frame, the P frame can be encoded to obtain the encoded data of the P frame; or the residual of the P frame relative to the I frame can be encoded to obtain the encoded data of the P frame; the encoded data and the code stream information of the P frame are then encapsulated into the encoded code stream corresponding to the P frame.
Referring specifically to fig. 8, fig. 8 is a schematic flow chart of an image encoding scene according to an embodiment of the present application. As shown in fig. 8, the image frame type of the image to be encoded may first be acquired. If the image frame type is the first frame type, the code stream information of the image to be encoded is generated, the image to be encoded is encoded to obtain encoded data, and the code stream information and the encoded data are encapsulated into the encoded code stream of the image to be encoded. If the image frame type is the conventional frame type, the code stream information of the image to be encoded is generated and taken as the encoded code stream of the image to be encoded. If the image frame type is the predicted frame type, the code stream information of the image to be encoded is generated, the image to be encoded is encoded to obtain encoded data, and the code stream information and the encoded data are encapsulated into the encoded code stream of the image to be encoded. Specifically, referring to fig. 9, fig. 9 is a schematic diagram of a video stream encoding process according to an embodiment of the present application. As shown in fig. 9, the process may include the following steps:
Step S901, a video stream to be encoded is acquired, and the i-th image frame is determined as a reference frame.
In the embodiment of the application, the encoding device may acquire the video stream to be encoded, acquire the N image frames constituting the video stream to be encoded, initialize i (that is, set i to 1, with i a positive integer less than or equal to N), and determine the i-th image frame as the reference frame. Alternatively, the initial value of i may be another value, for example 0, in which case i is an integer less than N; or 2, in which case i is a positive integer less than or equal to (N+1); and so on.
In step S902, the i-th image frame is determined as the image to be encoded.
In the embodiment of the present application, the i-th image frame is determined as the image to be encoded, and step S903 is performed.
In step S903, the image frame type of the image to be encoded is detected.
In the embodiment of the present application, the specific description of step S301 in fig. 3 may be referred to, and will not be described herein. If the image frame type of the image to be encoded is the first frame type, step S904 is executed; if the image frame type of the image to be encoded is the normal frame type, step S905 is performed; if the image frame type of the image to be encoded is the predicted frame type, step S906 is performed.
Step S904, if the image frame type is the first frame type, identifying the image key point of the image to be encoded, encoding the image to be encoded, generating encoded data corresponding to the image to be encoded, and packaging the image key point, the image frame type of the image to be encoded and the encoded data into an encoded code stream of the image to be encoded.
In the embodiment of the application, if the image frame type is the first frame type, the image key points of the image to be encoded are identified, and the code stream information of the image to be encoded is generated based on the image key points and the image frame type. The image to be encoded is encoded to generate the encoded data corresponding to the image to be encoded, and the code stream information and the encoded data are encapsulated into the encoded code stream of the image to be encoded. The code stream information may be regarded as a header in the encoded code stream and may be carried as supplemental enhancement information (Supplemental Enhancement Information, SEI). The image key points may include the coordinate information of the key points in the image to be encoded, and, optionally, hexadecimal 0x00 may be used to represent the first frame type. Optionally, the code stream information may further include, but is not limited to, encoding parameters for the image to be encoded, such as a sequence parameter set (Sequence Parameter Set, SPS) and a picture parameter set (Picture Parameter Set, PPS), where the SPS is used to represent global parameters of the video stream to be encoded, including but not limited to the coding level and resolution, and the PPS is used to store the parameters required for decoding the encoded data. When the image to be encoded is encoded to generate the corresponding encoded data, any image encoding method may be adopted, such as one based on a color encoding format (YUV). Step S908 is then performed.
In step S905, if the image frame type is the normal frame type, the image key points of the image to be encoded are identified, and the image key points and the image frame type are encapsulated into the encoding code stream of the image to be encoded.
In the embodiment of the application, if the image frame type is the conventional frame type, the image key points of the image to be encoded are identified, the image key points and the image frame type are formed into code stream information, and the code stream information is directly encapsulated as the encoded code stream of the image to be encoded; optionally, hexadecimal 0x01 may be used to represent the conventional frame type. Step S908 is then performed.
Step S906, if the image frame type is the predicted frame type, identifying the image key point of the image to be encoded, encoding the image to be encoded, generating encoded data corresponding to the image to be encoded, and packaging the image key point, the image frame type of the image to be encoded and the encoded data into an encoded code stream of the image to be encoded.
In the embodiment of the application, if the image frame type is the predicted frame type, the image key points of the image to be encoded are identified, and the code stream information of the image to be encoded is generated based on the image key points and the image frame type. The image to be encoded is encoded to generate the corresponding encoded data, and the code stream information and the encoded data are encapsulated into the encoded code stream of the image to be encoded. For details, reference may be made to the encoding modes in step S302 of fig. 3, which are not repeated here. Optionally, hexadecimal 0x02 may be used to represent the predicted frame type. Optionally, in one case, the first image frame in the video stream to be encoded may always be used as the reference frame, in which case step S908 is performed next; in another case, the reference frame may be changed, in which case step S907 is performed next.
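The frame-type markers above (0x00 / 0x01 / 0x02) suggest the following encapsulation sketch; the exact byte layout of the code stream information is an illustrative assumption:

```python
FRAME_TYPE_MARKER = {"first": 0x00, "conventional": 0x01, "predicted": 0x02}

def pack_encoded_stream(frame_type: str, keypoints, coded_data: bytes = b"") -> bytes:
    """Code stream information (frame-type marker plus key point
    coordinates, in the spirit of an SEI-style header) followed by the
    coded data; for a conventional frame, coded_data stays empty."""
    header = bytes([FRAME_TYPE_MARKER[frame_type]])
    for x, y in keypoints:  # key point coordinate information
        header += int(x).to_bytes(2, "big") + int(y).to_bytes(2, "big")
    return header + coded_data
```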
In step S907, the image to be encoded is determined as a reference frame.
In the embodiment of the application, the image to be coded is determined as a reference frame, namely, the image to be coded is used as a current reference frame.
Step S908, detect whether i < N.
In the embodiment of the present application, taking the initial value of i as 1 as an example, detecting whether i is smaller than N, if i is smaller than N, indicating that there is still an uncoded image frame, step S909 may be performed; if i is greater than or equal to N, it indicates that the encoding of the image frames in the video stream to be encoded has been completed, step S910 may be performed.
Step S909, i++.
In the embodiment of the present application, i++ means moving on to the next image frame; the process then returns to step S902.
Step S910, obtain a video code stream of the video stream to be encoded.
In the embodiment of the application, the encoded code streams respectively corresponding to the N image frames form the video code stream of the video stream to be encoded.
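The loop of steps S901-S910 can be condensed into the following sketch; the callbacks and the optional reference-frame refresh of step S907 are modeled as parameters and are assumptions:

```python
def encode_video_stream(frames, detect_frame_type, encode_frame,
                        update_reference: bool = False):
    reference = frames[0]                 # S901: first frame as reference
    video_code_stream = []
    for image in frames:                  # S902 / S909: i from 1 to N
        ftype = detect_frame_type(image, reference)            # S903
        video_code_stream.append(encode_frame(image, ftype))   # S904-S906
        if update_reference and ftype == "predicted":
            reference = image             # S907: refresh the reference frame
    return video_code_stream              # S910: per-frame encoded code streams
```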
Optionally, when the video code stream of the video stream to be encoded is obtained, the encoding device may send the video code stream to the decoding device; alternatively, one or more encoded code streams may be sent to the decoding device at a time until the encoded code streams corresponding to the N image frames have all been sent, which is not limited here.
In the embodiment of the application, the encoding device can detect the image frame type of the image to be encoded. If the image frame type is the key frame type, the image key points of the image to be encoded are identified, the image to be encoded is encoded to generate the corresponding encoded data, and the image key points, the image frame type of the image to be encoded, and the encoded data are encapsulated into the encoded code stream of the image to be encoded; the key frame type is used to indicate that the image to be decoded is the first image frame of the video stream to be encoded, or an image frame type that cannot be generated based on the image key points. If the image frame type is the conventional frame type, the image key points of the image to be encoded are identified, and the image key points and the image frame type are encapsulated into the encoded code stream of the image to be encoded. Through the above process, the encoding and decoding of the image to be encoded can be realized based on its image frame type. In the common case (that is, the conventional frame type), only the image key points need to be encoded, which can achieve an order-of-magnitude code rate saving and requires less bandwidth at the same video quality. On this basis, when the image frame type of the image to be encoded is the key frame type, the image cannot be reconstructed directly from the image key points, so the image itself is encoded to compensate for possible encoding anomalies of the key-point-based approach; and since the number of key-frame-type images is generally small, the data amount to be processed for image encoding does not increase much, that is, the efficiency of image encoding is still improved while its accuracy is further improved.
Further, referring to fig. 10, fig. 10 is a schematic flow chart of an image decoding scene according to an embodiment of the present application. As shown in fig. 10, the decoding apparatus may acquire an encoded code stream and obtain the image frame type from it. If the image frame type is the first frame type, the encoded data is decoded to obtain the decoded image; if the image frame type is the conventional frame type, the code stream information is parsed to obtain the decoded image; and if the image frame type is the predicted frame type, the encoded data is decoded to obtain the decoded image. Referring specifically to fig. 11, fig. 11 is a schematic view of an image decoding scene according to an embodiment of the present application. As shown in fig. 11, the process may include the following steps:
Step S1101, obtaining the image frame type from the encoded code stream corresponding to the image to be encoded.

In the embodiment of the application, the image frame type can be obtained from the code stream information of the encoded code stream.
In step S1102, if the image frame type is the key frame type, the encoded data in the encoded code stream is obtained, and the encoded data is decoded to obtain a decoded image corresponding to the image to be encoded.
In the embodiment of the application, if the image frame type is a key frame type, the coding code stream is obtained by packaging the image key points, the image frame type and the coding data of the image to be coded; the key frame type is used to indicate that the image to be decoded is the first image frame of the video stream to be encoded, or an image frame type that cannot be generated based on the image key points.
In encoding mode (1), the encoded data can be decoded to obtain a first decoding residual, an adjacent decoded image adjacent to the encoded code stream obtained, and the first decoding residual fused onto the adjacent decoded image to generate the decoded image corresponding to the image to be encoded. Or, in encoding mode (2), the encoded data can be decoded to obtain a second decoding residual, the reference frame obtained, and the second decoding residual fused onto the reference frame to generate the decoded image corresponding to the image to be encoded. Or, in encoding mode (3), the encoded data can be decoded directly to obtain the decoded image corresponding to the image to be encoded.
In step S1103, if the image frame type is the normal frame type, a reference frame in the video stream to be encoded is obtained, an image key point in the encoded code stream is obtained, and a decoded image is generated based on the reference frame and the image key point.
In the embodiment of the application, the encoded code stream is obtained by encapsulation based on the image key points and the image frame type of the image to be encoded. Specifically, the reference key points of the reference frame can be obtained, and the optical flow information of the image to be encoded relative to the reference frame determined according to the reference key points and the image key points; the reference frame is then updated based on the optical flow information to obtain the decoded image. For the generation process of the decoded image, reference may be made to the generation process of the reconstructed image in step S301 of fig. 3, which is not repeated here.
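A sketch of this conventional-frame decoding path follows; the `keypoints` attribute of the parsed code stream and the network callables mirror the encoder-side sketches and are assumptions:

```python
def decode_conventional_frame(code_stream, reference, keypoint_net,
                              flow_net, generator):
    image_keypoints = code_stream.keypoints        # parsed from the header
    reference_keypoints = keypoint_net(reference)  # recomputed at the decoder
    # Optical flow of the image to be encoded relative to the reference frame.
    flow = flow_net(reference_keypoints, image_keypoints)
    return generator(reference, flow)              # decoded image
```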
Further, referring to fig. 12, fig. 12 is a schematic diagram of a video decoding process according to an embodiment of the present application. As shown in fig. 12, the process may include the steps of:
step S1201, an image frame type is acquired from the i-th encoded code stream.
In the embodiment of the present application, i is initialized; for the initialization of i, reference may be made to the initialization of i in fig. 9. The image frame type may then be obtained from the i-th encoded code stream. If the image frame type is the first frame type, step S1202 is executed; if the image frame type is the conventional frame type, step S1204 is executed; if the image frame type is the predicted frame type, step S1205 is executed.
Step S1202, if the image frame type is the first frame type, obtaining the encoded data in the encoded code stream, and performing decoding processing on the encoded data to obtain a decoded image i corresponding to the image to be encoded.
In the embodiment of the application, if the image frame type is the first frame type, the encoded data can be obtained from the encoded code stream, and the encoded data is decoded to obtain the decoded image i corresponding to the image to be encoded. Specifically, the coding parameters can be obtained from the code stream information of the coding code stream, and the coding parameters are adopted to decode the coding data, so as to obtain the decoded image i corresponding to the image to be coded.
Step S1203 determines the decoded image i as a reference frame.
In the embodiment of the present application, the decoded image i is determined as the reference frame, and step S1207 is further performed.
In step S1204, if the image frame type is the normal frame type, a reference frame in the video stream to be encoded is obtained, an image key point in the encoded code stream is obtained, and a decoded image i is generated based on the reference frame and the image key point.
In the embodiment of the present application, reference may be made to the specific description shown in step S1103 in fig. 11, which is not repeated here, and step S1207 is further performed.
Step S1205, if the image frame type is the predicted frame type, obtaining the coded data in the coded code stream, and decoding the coded data to obtain a decoded image i corresponding to the image to be coded.
In the embodiment of the present application, reference may be made to step S1102 in fig. 11, and the decoding process under each coding mode is not described herein. Optionally, if the reference frame is always the first image frame of the video stream to be encoded during encoding, step S1207 is performed; if the reference frame is changed during encoding, step S1206 is performed.
In step S1206, the decoded image i is determined as a reference frame.
In the embodiment of the present application, the decoded image i is determined as the reference frame, and step S1207 is further performed.
In step S1207, the decoding completion state is detected.
In the embodiment of the present application, the decoding completion status of the N encoded code streams, that is, of the video code stream, is detected. If decoding is completed, step S1209 is executed; if decoding is not completed, step S1208 is executed.
Step S1208, i++.
In the embodiment of the present application, i++, the process returns to step S1201, where the next encoded code stream is decoded.
Step S1209, a decoded video is obtained.
In the embodiment of the application, through the above process, the decoded images respectively corresponding to the N encoded code streams can be obtained, and the N decoded images combined into the decoded video, where the decoded video is the video obtained by encoding the video stream to be encoded and then decoding the result.
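Mirroring the encoder-side loop, steps S1201-S1209 can be sketched as follows; the per-type decode callbacks and the `frame_type` attribute are assumptions:

```python
def decode_video_stream(code_streams, decode_key, decode_conventional,
                        decode_predicted, update_reference: bool = False):
    reference = None
    decoded_video = []
    for cs in code_streams:                          # S1201 / S1208
        if cs.frame_type == "first":
            image = decode_key(cs)                   # S1202
            reference = image                        # S1203
        elif cs.frame_type == "conventional":
            image = decode_conventional(cs, reference)  # S1204
        else:
            image = decode_predicted(cs, reference)     # S1205
            if update_reference:
                reference = image                    # S1206
        decoded_video.append(image)
    return decoded_video                             # S1209: decoded video
```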
Through the above process, which corresponds to the above encoding process, since the image frames in the video stream to be encoded whose image frame type is the predicted frame type or the first frame type are few, while the more numerous image frames are of the conventional frame type, and the encoded code stream of a conventional-frame-type image frame comprises only the code stream information, which is orders of magnitude smaller, the encoded code streams obtained by the decoding device are small; this can improve the data transmission efficiency of image encoding and decoding and save storage space. In addition, image frames that are likely to be abnormal, such as those with picture occlusion or drastic change (that is, image frames whose image frame type is the predicted frame type), are encoded directly, so that little information is lost in the encoding process; and since such image frames are few, the data amount does not increase much, thereby improving both the efficiency and the accuracy of image processing.
Compared with a traditional encoder, the application can achieve a code rate saving of 60%-90% at the same image quality; equivalently, at the same code rate it can provide a clearer and more stable image quality experience. The application can be applied to single-person video call scenes, can be used to alleviate stuttering caused by weak network environments, and can also be used in all network environments to save bandwidth.
Further, referring to fig. 13, fig. 13 is a schematic diagram of an image encoding apparatus according to an embodiment of the application. The image encoding means may be a computer program (including program code etc.) running in a computer device, for example the image encoding means may be an application software; the device can be used for executing corresponding steps in the method provided by the embodiment of the application. As shown in fig. 13, the image encoding apparatus 1300 may be used in the computer device in the embodiment corresponding to fig. 3, and specifically, the apparatus may include: the device comprises a type detection module 11, a key point identification module 12, an image coding module 13 and a code stream encapsulation module 14.
A type detection module 11 for detecting an image frame type of an image to be encoded;
The key point identifying module 12 is configured to identify an image key point of the image to be encoded if the image frame type is a key frame type;
the image coding module 13 is used for coding the image to be coded to generate coded data corresponding to the image to be coded;
the code stream packaging module 14 is configured to package the image key point, the image frame type of the image to be encoded, and the encoded data into an encoded code stream of the image to be encoded; the key frame type is used for representing that the image to be decoded is the first image frame of the video stream to be encoded, or the image frame type which cannot be generated based on the image key points;
the code stream encapsulation module 14 is further configured to identify an image key point of the image to be encoded if the image frame type is a normal frame type, and encapsulate the image key point and the image frame type into an encoded code stream of the image to be encoded.
The video stream to be encoded comprises N image frames, and the key frame type comprises a first frame type and a predicted frame type; n is a positive integer;
this type detection module 11 includes:
a first frame determining unit 111, configured to determine that the image frame type of the image to be encoded is a first frame type if the image to be encoded is a first image frame in the video stream to be encoded;
A reference acquiring unit 112, configured to acquire a reference frame of the N image frames if the image to be encoded is not the first image frame in the video stream to be encoded;
a key point detection unit 113, configured to obtain a reference key point in a reference frame and an image key point in an image to be encoded;
an image reconstruction unit 114, configured to generate a reconstructed image based on the image key points and the reference key points; the reference frame is the first image frame, or an image frame among the N image frames whose image frame type is the predicted frame type and which is located before the image to be encoded;
a prediction determining unit 115, configured to determine that an image frame type of the image to be encoded is a predicted frame type if an image difference between the reconstructed image and the image to be encoded is greater than or equal to an image anomaly threshold;
the conventional determining unit 116 is configured to determine that the image frame type of the image to be encoded is a conventional frame type if the image difference between the reconstructed image and the image to be encoded is smaller than the image anomaly threshold.
Wherein the image reconstruction unit 114 comprises:
an optical flow determining sub-unit 1141, configured to determine optical flow information of the image to be encoded relative to the reference frame based on the image key point and the reference key point;
The image reconstruction subunit 1142 is configured to update the reference frame based on the optical flow information, and obtain a reconstructed image corresponding to the image to be encoded.
The image reconstruction subunit 1142 is specifically configured to:

perform convolution processing on the reference frame to obtain an image convolution feature, and perform feature conversion on the image convolution feature using the optical flow information to obtain a first image feature;

perform residual processing on the first image feature to obtain a second image feature;

and perform convolution processing on the second image feature to generate the reconstructed image corresponding to the image to be encoded (see the sketch after this list).
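A minimal PyTorch sketch of this convolution / flow-guided warping / residual pipeline follows; channel counts, block depth, and the use of a normalized sampling grid for the optical flow information are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpGenerator(nn.Module):
    def __init__(self, channels: int = 64, num_res_blocks: int = 2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.res_blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.ReLU(inplace=True),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(num_res_blocks))
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, reference: torch.Tensor, flow_grid: torch.Tensor) -> torch.Tensor:
        feat = self.head(reference)            # image convolution feature
        feat = F.grid_sample(feat, flow_grid,  # feature conversion by the flow
                             align_corners=False)
        for block in self.res_blocks:
            feat = feat + block(feat)          # residual processing -> second feature
        return self.tail(feat)                 # reconstructed image
```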
The keypoint detection unit 113 is specifically configured to:
respectively inputting the reference frame and the image to be encoded into a key point analysis network to analyze key points, and determining reference key points in the reference frame and image key points in the image to be encoded;
the image reconstruction unit 114 includes:
the optical flow analysis subunit 1143 is configured to input the reference key point and the image key point into an optical flow analysis model to perform optical flow analysis, so as to obtain optical flow information of the image to be encoded relative to the reference frame;
the network generating sub-unit 1144 is configured to input the reference frame and the optical flow information into the image generating network to generate a reconstructed image.
Wherein the apparatus 1300 further comprises:
the sample analysis module 15 is configured to input a first image sample and a second image sample into the initial keypoint analysis network respectively to perform keypoint analysis, and determine a first sample keypoint in the first image sample and a second sample keypoint in the second image sample;
the sample analysis module 15 is further configured to input the first sample key points and the second sample key points into an initial optical flow analysis model for optical flow analysis, to obtain sample optical flow information of the second image sample relative to the first image sample;
a sample reconstruction module 16, configured to input the first image sample and sample optical flow information into an initial image generation network, and generate a first sample reconstructed image;
a loss construction module 17, configured to construct a model loss according to the first sample reconstructed image and the second image sample;
the model training module 18 is configured to perform parameter adjustment on the initial key point analysis network, the initial optical flow analysis model, and the initial image generation network by using model loss until the parameters converge, so as to obtain the key point analysis network corresponding to the initial key point analysis network, the optical flow analysis model corresponding to the initial optical flow analysis model, and the image generation network corresponding to the initial image generation network.
Wherein model loss includes generation loss; the loss construction module 17 includes:
a quality determining unit 171, configured to input the first sample reconstructed image into the image discriminator for image detection, and determine the output image quality corresponding to the first sample reconstructed image;
a loss generation unit 172 for generating the generation loss based on the output image quality;
the apparatus 1300 further comprises:
the image discriminating module 19 is configured to obtain a second sample reconstructed image corresponding to the third image sample, input the second sample reconstructed image and the third image sample into the initial image discriminator respectively for image detection, and determine a first sample quality corresponding to the third image sample and a second sample quality corresponding to the second sample reconstructed image;
the discriminant training module 20 is configured to generate an image discriminant loss according to the first sample quality and the second sample quality, and perform parameter adjustment on the initial image discriminant by using the image discriminant loss until the image discriminant is obtained.
Wherein the apparatus 1300 further comprises:
the structure analysis module 21 is configured to obtain a first key region in the image to be encoded and a second key region in the reconstructed image, and perform structural similarity analysis on the first key region and the second key region to obtain an image difference between the reconstructed image and the image to be encoded;
The conventional determining unit 116 is specifically configured to:
if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image abnormality threshold, acquiring continuous M image frames comprising the image to be encoded from N image frames; m is a positive integer;
if the image expressions of the M image frames are smaller than the expression threshold value, determining that the image frame type of the image to be encoded is a conventional frame type.
Wherein the apparatus 1300 further comprises:
the object detection module 22 is configured to perform object detection on an image to be encoded, and if a target object is detected in the image to be encoded, perform a process of acquiring a reference key point in a reference frame and an image key point in the image to be encoded;
the prediction determining module 23 is configured to determine that the image frame type of the image to be encoded is a predicted frame type if the target object is not detected in the image to be encoded.
Wherein the image encoding module 13 comprises:
the residual encoding unit 131 is configured to obtain an adjacent image of the image to be encoded in the video stream to be encoded, obtain a first image residual between the image to be encoded and the adjacent image, and encode the first image residual to generate the encoded data corresponding to the image to be encoded; or,

the residual encoding unit 131 is further configured to obtain a reference frame in the video stream to be encoded, obtain a second image residual between the image to be encoded and the reference frame, and encode the second image residual to generate the encoded data corresponding to the image to be encoded; or,

the image encoding unit 132 is configured to encode the image to be encoded to generate the encoded data corresponding to the image to be encoded.
The embodiment of the application provides an image encoding apparatus which can detect the image frame type of an image to be encoded. If the image frame type is the key frame type, the apparatus identifies the image key points of the image to be encoded, encodes the image to be encoded to generate the corresponding encoded data, and encapsulates the image key points, the image frame type of the image to be encoded, and the encoded data into the encoded code stream of the image to be encoded; the key frame type is used to indicate that the image to be decoded is the first image frame of the video stream to be encoded, or an image frame type that cannot be generated based on the image key points. If the image frame type is the conventional frame type, the apparatus identifies the image key points of the image to be encoded and encapsulates the image key points and the image frame type into the encoded code stream of the image to be encoded. Through the above process, the encoding and decoding of the image to be encoded can be realized based on its image frame type. In the common case (that is, the conventional frame type), only the image key points need to be encoded, which can achieve an order-of-magnitude code rate saving and requires less bandwidth at the same video quality. On this basis, when the image frame type of the image to be encoded is the key frame type, the image cannot be reconstructed directly from the image key points, so the image itself is encoded to compensate for possible encoding anomalies of the key-point-based approach; and since the number of key-frame-type images is generally small, the data amount to be processed for image encoding does not increase much, that is, the efficiency of image encoding is still improved while its accuracy is further improved.
Further, referring to fig. 14, fig. 14 is a schematic diagram of an image decoding apparatus according to an embodiment of the application. The image decoding means may be a computer program (including program code, etc.) running in a computer device; for example, the image decoding means may be application software. The apparatus can be used to execute the corresponding steps in the method provided by the embodiments of the present application. As shown in fig. 14, the image decoding apparatus 1400 may be used in the computer device in the embodiment corresponding to fig. 11, and specifically the apparatus may include: a type obtaining module 31, a data obtaining module 32, a data decoding module 33, a reference obtaining module 34, a key point obtaining module 35, and a reference decoding module 36.
A type obtaining module 31, configured to obtain an image frame type from an encoding code stream corresponding to an image to be encoded;
a data acquisition module 32, configured to acquire encoded data in the encoded code stream if the image frame type is a key frame type;
the data decoding module 33 is configured to perform decoding processing on the encoded data to obtain a decoded image corresponding to the image to be encoded; the coding code stream is obtained by encapsulating the image key points, the image frame types and the coding data of the image to be coded; the key frame type is used for representing that the image to be decoded is the first image frame of the video stream to be encoded, or the image frame type which cannot be generated based on the image key points;
A reference acquisition module 34, configured to acquire a reference frame in the video stream to be encoded if the image frame type is a normal frame type;
a key point obtaining module 35, configured to obtain an image key point in the encoded code stream;
a reference decoding module 36 for generating a decoded image based on the reference frame and the image keypoints; the coding code stream is obtained by packaging based on the image key points and the image frame types of the images to be coded.
Wherein the data decoding module 33 comprises:
the residual decoding unit 331 is configured to decode the encoded data to obtain a first decoding residual, obtain an adjacent decoded image adjacent to the encoded code stream, and fuse the first decoding residual onto the adjacent decoded image to generate the decoded image corresponding to the image to be encoded; or,

the residual decoding unit 331 is further configured to decode the encoded data to obtain a second decoding residual, obtain the reference frame, and fuse the second decoding residual onto the reference frame to generate the decoded image corresponding to the image to be encoded; or,

the data decoding unit 332 is configured to decode the encoded data to obtain the decoded image corresponding to the image to be encoded.
Wherein the reference decoding module 36 comprises:
An optical flow determining unit 361, configured to obtain a reference key point of a reference frame, and determine optical flow information of an image to be encoded relative to the reference frame according to the reference key point and the image key point;
the optical flow decoding unit 362 is configured to update the reference frame based on the optical flow information, and obtain a decoded image.
Referring to fig. 15, fig. 15 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 15, the computer device in the embodiment of the present application may include: one or more processors 1501, memory 1502, and input-output interfaces 1503. The processor 1501, memory 1502 and input/output interface 1503 are connected via a bus 1504. The memory 1502 is used for storing a computer program including program instructions, the input output interface 1503 is used for receiving data and outputting data, such as for data interaction between a host and a computer device, or for data interaction between virtual machines in a host; the processor 1501 is used to execute program instructions stored in the memory 1502.
Wherein, when the processor 1501 is located in the image encoding apparatus, the following operations may be performed:
detecting an image frame type of an image to be encoded;
If the image frame type is the key frame type, identifying image key points of the image to be encoded, encoding the image to be encoded, generating encoded data corresponding to the image to be encoded, and packaging the image key points, the image frame type of the image to be encoded and the encoded data into an encoded code stream of the image to be encoded; the key frame type is used for representing that the image to be decoded is the first image frame of the video stream to be encoded, or the image frame type which cannot be generated based on the image key points;
if the image frame type is the conventional frame type, identifying the image key points of the image to be encoded, and packaging the image key points and the image frame type into an encoding code stream of the image to be encoded.
Wherein, when the processor 1501 is located in the image decoding apparatus, the following operations may be performed:
acquiring an image frame type from an encoding code stream corresponding to an image to be encoded;
if the image frame type is the key frame type, acquiring encoded data in the encoded code stream, and decoding the encoded data to obtain a decoded image corresponding to the image to be encoded; the coding code stream is obtained by encapsulating the image key points, the image frame types and the coding data of the image to be coded; the key frame type is used for representing that the image to be decoded is the first image frame of the video stream to be encoded, or the image frame type which cannot be generated based on the image key points;
If the image frame type is the conventional frame type, acquiring a reference frame in the video stream to be encoded, acquiring an image key point in the encoded code stream, and generating a decoded image based on the reference frame and the image key point; the coding code stream is obtained by packaging based on the image key points and the image frame types of the images to be coded.
In some possible implementations, the processor 1501 may be a central processing unit (central processing unit, CPU), which may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field-programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 1502 may include read only memory and random access memory and provide instructions and data to the processor 1501 and input output interface 1503. A portion of memory 1502 may also include non-volatile random access memory. For example, the memory 1502 may also store information of device type.
In a specific implementation, the computer device may execute, through each functional module built in the computer device, an implementation manner provided by each step in fig. 3 or fig. 11, and specifically, the implementation manner provided by each step in fig. 3 or fig. 11 may be referred to, which is not described herein again.
An embodiment of the present application provides a computer device, including a processor, an input/output interface, and a memory. The processor acquires the computer program from the memory and executes the steps of the method shown in fig. 3 to perform image encoding and decoding operations. The embodiment of the application thereby detects the image frame type of the image to be encoded; if the image frame type is the key frame type, identifies the image key points of the image to be encoded, encodes the image to be encoded to generate the corresponding encoded data, and encapsulates the image key points, the image frame type of the image to be encoded, and the encoded data into the encoded code stream of the image to be encoded, where the key frame type is used to indicate that the image to be decoded is the first image frame of the video stream to be encoded, or an image frame type that cannot be generated based on the image key points; and if the image frame type is the conventional frame type, identifies the image key points of the image to be encoded and encapsulates the image key points and the image frame type into the encoded code stream of the image to be encoded. Through the above process, the encoding and decoding of the image to be encoded can be realized based on its image frame type. In the common case (that is, the conventional frame type), only the image key points need to be encoded, which can achieve an order-of-magnitude code rate saving and requires less bandwidth at the same video quality. On this basis, when the image frame type of the image to be encoded is the key frame type, the image cannot be reconstructed directly from the image key points, so the image itself is encoded to compensate for possible encoding anomalies of the key-point-based approach; and since the number of key-frame-type images is generally small, the data amount to be processed for image encoding does not increase much, that is, the efficiency of image encoding is still improved while its accuracy is further improved.
The embodiment of the present application further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program is adapted to be loaded by the processor and execute the image encoding and decoding methods provided by each step in fig. 3 or fig. 11, and specifically refer to the implementation manner provided by each step in fig. 3 or fig. 11, which is not described herein again. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium according to the present application, please refer to the description of the method embodiments of the present application. As an example, a computer program may be deployed to be executed on one computer device or on multiple computer devices at one site or distributed across multiple sites and interconnected by a communication network.
The computer readable storage medium may be the image codec apparatus provided in any one of the foregoing embodiments or an internal storage unit of the computer device, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the computer device. Further, the computer-readable storage medium may also include both internal storage units and external storage devices of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the methods provided in the various alternative modes of fig. 3 or fig. 11. The image to be encoded can thus be encoded and decoded according to its image frame type: in the common case (the conventional frame type), only the image key points need to be encoded, which saves orders of magnitude of code rate and occupies less bandwidth at the same video quality; when the image frame type is the key frame type, the image cannot be reconstructed directly from the image key points, so the image itself is encoded to compensate for the possible anomalies of keypoint-based encoding. Because the number of key-frame-type images is generally small, the amount of data to be processed does not increase much, so encoding efficiency is still improved while encoding accuracy is further improved.
The terms "first", "second", and the like in the description, claims, and drawings of the embodiments of the present application are used to distinguish different objects, not to describe a particular order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed, or steps or modules inherent to that process, method, apparatus, article, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functions differently for each particular application, but such implementation decisions should not be regarded as departing from the scope of the present application.
The methods and related apparatus provided in the embodiments of the present application are described with reference to the flowcharts and/or structural schematic diagrams provided therein. Each flow and/or block of the flowcharts and/or structural schematic diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable image codec device to produce a machine, such that the instructions executed by the processor of the computer or other programmable image codec device create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematic diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable image codec device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematic diagrams. These computer program instructions may also be loaded onto a computer or other programmable image codec device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural schematic diagrams.
The order of the steps in the methods of the embodiments of the present application may be adjusted, and steps may be combined or deleted, according to actual needs.
The modules in the apparatuses of the embodiments of the present application may be combined, divided, or deleted according to actual needs.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (18)

1. An image encoding method, the method comprising:
detecting an image frame type of an image to be encoded;
if the image frame type is a key frame type, identifying image key points of the image to be encoded, encoding the image to be encoded to generate encoded data corresponding to the image to be encoded, and encapsulating the image key points, the image frame type of the image to be encoded, and the encoded data into an encoded code stream of the image to be encoded; the key frame type indicates that the image to be encoded is the first image frame of a video stream to be encoded, or an image frame that cannot be generated based on the image key points;
and if the image frame type is a conventional frame type, identifying the image key points of the image to be encoded, and encapsulating the image key points and the image frame type into an encoded code stream of the image to be encoded.
2. The method of claim 1, wherein the video stream to be encoded comprises N image frames, the key frame type comprising a first frame type and a predicted frame type; n is a positive integer;
the detecting the image frame type of the image to be encoded comprises:
if the image to be encoded is the first image frame in the video stream to be encoded, determining that the image frame type of the image to be encoded is the first frame type;
if the image to be encoded is not the first image frame in the video stream to be encoded, acquiring a reference frame among the N image frames, acquiring reference key points in the reference frame and image key points in the image to be encoded, and generating a reconstructed image based on the image key points and the reference key points; the reference frame is the first image frame, or an image frame among the N image frames whose image frame type is the predicted frame type and which is located before the image to be encoded;
if the image difference degree between the reconstructed image and the image to be encoded is greater than or equal to an image abnormality threshold, determining that the image frame type of the image to be encoded is the predicted frame type;
and if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image abnormality threshold, determining that the image frame type of the image to be encoded is the conventional frame type.
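As an illustrative sketch only, the detection of claim 2 reduces to a threshold test on the reconstruction error; `reconstruct` and `difference` are assumed helpers for the keypoint-based reconstruction and the image difference degree, and the type tags are hypothetical:

```python
FIRST_FRAME, PREDICTED_FRAME, CONVENTIONAL_FRAME = 0, 1, 2  # illustrative tags

def detect_frame_type(image, index, reference_frame,
                      reconstruct, difference, abnormality_threshold):
    """Frame-type detection in the manner of claim 2."""
    if index == 0:                       # first image frame of the stream
        return FIRST_FRAME
    reconstructed = reconstruct(reference_frame, image)
    if difference(reconstructed, image) >= abnormality_threshold:
        return PREDICTED_FRAME           # key points alone are not enough
    return CONVENTIONAL_FRAME            # key points reproduce the frame well
```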
3. The method of claim 2, wherein the generating a reconstructed image based on the image key points and the reference key points comprises:
determining optical flow information of the image to be encoded relative to the reference frame based on the image key points and the reference key points;
updating the reference frame based on the optical flow information to obtain a reconstructed image corresponding to the image to be encoded.
4. The method of claim 3, wherein updating the reference frame based on the optical flow information to obtain the reconstructed image corresponding to the image to be encoded comprises:
performing convolution processing on the reference frame to obtain an image convolution feature, and performing feature conversion on the image convolution feature by adopting the optical flow information to obtain a first image feature;
performing residual processing on the first image feature to obtain a second image feature;
and performing convolution processing on the second image feature to generate a reconstructed image corresponding to the image to be encoded.
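A minimal sketch of one plausible realization of the convolution, flow-based feature conversion, and residual processing of claims 3 and 4, assuming PyTorch; the layer sizes and the use of `grid_sample` for warping are assumptions, not elements of the claims:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    """Residual processing of an image feature."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)

class Generator(nn.Module):
    """Convolve the reference frame, warp the features with the optical
    flow, apply residual blocks, and convolve back to an image."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.enc = nn.Conv2d(3, ch, 3, padding=1)   # image convolution feature
        self.res = nn.Sequential(ResBlock(ch), ResBlock(ch))
        self.dec = nn.Conv2d(ch, 3, 3, padding=1)   # back to a 3-channel image

    def forward(self, reference: torch.Tensor, flow_grid: torch.Tensor) -> torch.Tensor:
        feat = self.enc(reference)
        # feature conversion: sample the features at flow-displaced
        # positions; flow_grid is an (N, H, W, 2) grid in [-1, 1]
        warped = F.grid_sample(feat, flow_grid, align_corners=True)
        return self.dec(self.res(warped))           # reconstructed image
```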
5. The method of claim 2, wherein the acquiring the reference key points in the reference frame and the image key points in the image to be encoded, and generating a reconstructed image based on the image key points and the reference key points, comprises:
inputting the reference frame and the image to be encoded into a key point analysis network respectively for key point analysis, and determining the reference key points in the reference frame and the image key points in the image to be encoded;
inputting the reference key points and the image key points into an optical flow analysis model for optical flow analysis to obtain optical flow information of the image to be encoded relative to the reference frame;
inputting the reference frame and the optical flow information into an image generation network to generate a reconstructed image.
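Illustratively, the three-network pipeline of claim 5 is a short chain; all handles below are assumed names:

```python
def reconstruct(kp_net, flow_net, gen_net, reference_frame, image):
    """Chain key point analysis, optical flow analysis, and image generation."""
    reference_kp = kp_net(reference_frame)   # reference key points
    image_kp = kp_net(image)                 # image key points
    flow = flow_net(reference_kp, image_kp)  # optical flow information
    return gen_net(reference_frame, flow)    # reconstructed image
```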
6. The method of claim 5, wherein the method further comprises:
respectively inputting a first image sample and a second image sample into an initial key point analysis network for key point analysis, and determining first sample key points in the first image sample and second sample key points in the second image sample;
inputting the first sample key points and the second sample key points into an initial optical flow analysis model for optical flow analysis to obtain sample optical flow information of the second image sample relative to the first image sample;
inputting the first image sample and the sample optical flow information into an initial image generation network to generate a first sample reconstructed image;
and constructing a model loss according to the first sample reconstructed image and the second image sample, and adjusting the parameters of the initial key point analysis network, the initial optical flow analysis model, and the initial image generation network using the model loss until the parameters converge, to obtain the key point analysis network corresponding to the initial key point analysis network, the optical flow analysis model corresponding to the initial optical flow analysis model, and the image generation network corresponding to the initial image generation network.
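A sketch, under PyTorch-style assumptions, of one joint parameter-adjustment step over the three initial networks of claim 6; `loss_fn` stands in for the model loss construction:

```python
def train_step(kp_net, flow_net, gen_net, optimizer, sample1, sample2, loss_fn):
    """One joint parameter-adjustment step over the three networks."""
    kp1, kp2 = kp_net(sample1), kp_net(sample2)  # first/second sample key points
    flow = flow_net(kp1, kp2)                    # sample optical flow information
    reconstruction = gen_net(sample1, flow)      # first sample reconstructed image
    loss = loss_fn(reconstruction, sample2)      # model loss
    optimizer.zero_grad()
    loss.backward()                              # adjust all three networks jointly
    optimizer.step()
    return loss.item()
```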
7. The method of claim 6, wherein the model loss comprises a generation loss; the constructing a model loss according to the first sample reconstructed image and the second image sample comprises:
inputting the first sample reconstructed image into an image discriminator for image detection, and determining the output image quality corresponding to the first sample reconstructed image;
generating the generation loss based on the output image quality;
the method further comprises the steps of:
acquiring a second sample reconstructed image corresponding to a third image sample, respectively inputting the second sample reconstructed image and the third image sample into an initial image discriminator for image detection, and determining a first sample quality corresponding to the third image sample and a second sample quality corresponding to the second sample reconstructed image;
and generating an image discrimination loss according to the first sample quality and the second sample quality, and adjusting the parameters of the initial image discriminator using the image discrimination loss until the image discriminator is obtained.
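As a hedged illustration, the generation loss and image discrimination loss of claim 7 can be written as standard adversarial losses; the binary cross-entropy formulation and the logit-output discriminator are assumptions:

```python
import torch
import torch.nn.functional as F

def generation_loss(discriminator, reconstructed):
    """Generation loss: the quality the discriminator assigns to the
    reconstructed sample, pushed toward the 'real' label."""
    quality = discriminator(reconstructed)
    return F.binary_cross_entropy_with_logits(quality, torch.ones_like(quality))

def image_discrimination_loss(discriminator, real_sample, reconstructed):
    """Image discrimination loss: real samples toward 1, reconstructions toward 0."""
    q_real = discriminator(real_sample)
    q_fake = discriminator(reconstructed.detach())
    return (F.binary_cross_entropy_with_logits(q_real, torch.ones_like(q_real))
            + F.binary_cross_entropy_with_logits(q_fake, torch.zeros_like(q_fake)))
```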
8. The method of claim 2, wherein the method further comprises:
acquiring a first key region in the image to be encoded and a second key region in the reconstructed image, and carrying out structural similarity analysis on the first key region and the second key region to obtain the image difference degree between the reconstructed image and the image to be encoded;
and the determining that the image frame type of the image to be encoded is the conventional frame type if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image abnormality threshold comprises:
if the image difference degree between the reconstructed image and the image to be encoded is smaller than the image abnormality threshold, acquiring M consecutive image frames comprising the image to be encoded from the N image frames; M is a positive integer;
and if an image frame with an image performance greater than or equal to a performance threshold exists among the M image frames, determining that the image frame type of the image to be encoded is the conventional frame type.
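For illustration, the structural similarity analysis of claim 8 might compute the image difference degree from pre-cropped key regions with scikit-image; mapping similarity to a difference degree as `1 - SSIM` is an assumption:

```python
import numpy as np
from skimage.metrics import structural_similarity

def image_difference_degree(key_region_encoded: np.ndarray,
                            key_region_reconstructed: np.ndarray) -> float:
    """Structural similarity analysis of two key regions; inputs are
    assumed to be pre-cropped grayscale arrays of equal shape."""
    ssim = structural_similarity(
        key_region_encoded, key_region_reconstructed,
        data_range=float(key_region_encoded.max() - key_region_encoded.min()))
    return 1.0 - ssim  # higher similarity -> lower difference degree
```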
9. The method of claim 2, wherein the method further comprises:
performing object detection on the image to be encoded, and if a target object is detected in the image to be encoded, executing the step of acquiring the reference key points in the reference frame and the image key points in the image to be encoded;
and if the target object is not detected in the image to be encoded, determining that the image frame type of the image to be encoded is the predicted frame type.
10. The method of claim 1, wherein the encoding the image to be encoded to generate encoded data corresponding to the image to be encoded comprises:
acquiring an adjacent image of the image to be encoded in the video stream to be encoded, acquiring a first image residual between the image to be encoded and the adjacent image, and encoding the first image residual to generate the encoded data corresponding to the image to be encoded; or
acquiring a reference frame in the video stream to be encoded, acquiring a second image residual between the image to be encoded and the reference frame, and encoding the second image residual to generate the encoded data corresponding to the image to be encoded; or
encoding the image to be encoded directly to generate the encoded data corresponding to the image to be encoded.
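A minimal sketch of the three alternatives of claim 10 as a single dispatch; `encode` stands in for any conventional still-image coder, and the widening to int16 before subtraction (to avoid wrap-around on uint8 inputs) is an implementation assumption:

```python
import numpy as np

def make_encoded_data(image, encode, adjacent=None, reference=None):
    """Produce the encoded data by one of the three claimed alternatives."""
    if adjacent is not None:    # first image residual vs. the adjacent image
        return encode(image.astype(np.int16) - adjacent.astype(np.int16))
    if reference is not None:   # second image residual vs. the reference frame
        return encode(image.astype(np.int16) - reference.astype(np.int16))
    return encode(image)        # code the picture directly
```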
11. An image decoding method, the method comprising:
acquiring an image frame type from an encoded code stream corresponding to an image to be encoded;
if the image frame type is a key frame type, acquiring encoded data in the encoded code stream, and decoding the encoded data to obtain a decoded image corresponding to the image to be encoded; the encoded code stream is obtained by encapsulating the image key points of the image to be encoded, the image frame type, and the encoded data; the key frame type indicates that the image to be encoded is the first image frame of a video stream to be encoded, or an image frame that cannot be generated based on the image key points;
if the image frame type is a conventional frame type, acquiring a reference frame in the video stream to be encoded, acquiring image key points in the encoded code stream, and generating the decoded image based on the reference frame and the image key points; the encoded code stream is obtained by encapsulating the image key points of the image to be encoded and the image frame type.
12. The method of claim 11, wherein decoding the encoded data to obtain a decoded image corresponding to the image to be encoded comprises:
decoding the encoded data to obtain a first decoding residual, acquiring an adjacent decoded image adjacent to the encoded code stream, and fusing the first decoding residual onto the adjacent decoded image to generate the decoded image corresponding to the image to be encoded; or
decoding the encoded data to obtain a second decoding residual, acquiring the reference frame, and fusing the second decoding residual onto the reference frame to generate the decoded image corresponding to the image to be encoded; or
decoding the encoded data directly to obtain the decoded image corresponding to the image to be encoded.
13. The method of claim 11, wherein the generating the decoded image based on the reference frame and the image keypoints comprises:
acquiring reference key points of the reference frame, and determining optical flow information of the image to be encoded relative to the reference frame according to the reference key points and the image key points;
updating the reference frame based on the optical flow information to obtain the decoded image.
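Illustratively, the decoding side of claims 11 to 13 mirrors the encoder-side sketch; every helper below is an assumed stand-in for the corresponding decoder component:

```python
KEY_FRAME = 0  # illustrative tag, matching the encoder-side sketch

def decode_frame(stream, decode_data, reference_frame,
                 extract_keypoints, flow_from_keypoints, warp):
    """Decoding dispatch on the frame type carried in the code stream."""
    if stream.frame_type == KEY_FRAME:
        # key frame: the fully coded picture travels in the stream
        return decode_data(stream.encoded_data)
    # conventional frame: rebuild from the reference frame and key points
    reference_kp = extract_keypoints(reference_frame)
    flow = flow_from_keypoints(reference_kp, stream.keypoints)
    return warp(reference_frame, flow)  # update the reference with the flow
```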
14. An image encoding apparatus, the apparatus comprising:
the type detection module is used for detecting the type of the image frame of the image to be encoded;
the key point identification module is used for identifying the image key points of the image to be encoded if the image frame type is the key frame type;
the image encoding module is used for encoding the image to be encoded and generating encoded data corresponding to the image to be encoded;
the code stream encapsulation module is used for encapsulating the image key points, the image frame type of the image to be encoded, and the encoded data into an encoded code stream of the image to be encoded; the key frame type indicates that the image to be encoded is the first image frame of a video stream to be encoded, or an image frame that cannot be generated based on the image key points;
and the code stream encapsulation module is further used for identifying the image key points of the image to be encoded if the image frame type is a conventional frame type, and encapsulating the image key points and the image frame type into the encoded code stream of the image to be encoded.
15. An image decoding apparatus, characterized in that the apparatus comprises:
the type acquisition module is used for acquiring an image frame type from an encoded code stream corresponding to an image to be encoded;
the data acquisition module is used for acquiring the coded data in the coded code stream if the image frame type is a key frame type;
the data decoding module is used for decoding the encoded data to obtain a decoded image corresponding to the image to be encoded; the encoded code stream is obtained by encapsulating the image key points of the image to be encoded, the image frame type, and the encoded data; the key frame type indicates that the image to be encoded is the first image frame of a video stream to be encoded, or an image frame that cannot be generated based on the image key points;
the reference acquisition module is used for acquiring a reference frame in the video stream to be encoded if the image frame type is a conventional frame type;
the key point acquisition module is used for acquiring image key points in the coded code stream;
a reference decoding module for generating the decoded image based on the reference frame and the image key points; the encoded code stream is obtained by encapsulating the image key points of the image to be encoded and the image frame type.
16. A computer device, comprising a processor, a memory, and an input-output interface;
the processor is connected to the memory and the input-output interface, respectively, wherein the input-output interface is used for receiving data and outputting data, the memory is used for storing a computer program, and the processor is used for calling the computer program to enable the computer device to execute the method of any one of claims 1-10 or execute the method of any one of claims 11-13.
17. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any one of claims 1-10 or to perform the method of any one of claims 11-13.
18. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any one of claims 1-10 or perform the method of any one of claims 11-13.
CN202310582517.5A 2023-05-22 2023-05-22 Image encoding/decoding method, device, computer, storage medium, and program product Pending CN116980589A (en)

Priority Applications (1)

Application Number: CN202310582517.5A (published as CN116980589A) | Priority Date: 2023-05-22 | Filing Date: 2023-05-22 | Title: Image encoding/decoding method, device, computer, storage medium, and program product

Publications (1)

Publication Number: CN116980589A (en) | Publication Date: 2023-10-31

Family

ID=88475616

Family Applications (1)

Application Number: CN202310582517.5A | Title: Image encoding/decoding method, device, computer, storage medium, and program product | Priority Date: 2023-05-22 | Filing Date: 2023-05-22

Country Status (1)

Country Link
CN (1) CN116980589A (en)

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40098413; country of ref document: HK)