WO2024012574A1 - Image encoding method, decoding method, apparatus, readable medium and electronic device - Google Patents

Image encoding method, decoding method, apparatus, readable medium and electronic device Download PDF

Info

Publication number
WO2024012574A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
blocks
block
encoding
important area
Prior art date
Application number
PCT/CN2023/107504
Other languages
English (en)
French (fr)
Other versions
WO2024012574A9 (zh)
Inventor
韩韬
张园
杨明川
王翰铭
王泽琨
Original Assignee
中国电信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国电信股份有限公司 filed Critical 中国电信股份有限公司
Publication of WO2024012574A1 publication Critical patent/WO2024012574A1/zh
Publication of WO2024012574A9 publication Critical patent/WO2024012574A9/zh

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/65Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience
    • H04N19/66Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using error resilience involving data partitioning, i.e. separation of data into packets or partitions according to importance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process

Definitions

  • the present disclosure belongs to the field of artificial intelligence technology, and specifically relates to an image encoding method, a decoding method, a device, a readable medium and an electronic device.
  • an image encoding method includes: acquiring an original image and performing block partitioning to obtain multiple image blocks; calculating the gradient values of the pixels in each image block and screening important area blocks from the multiple image blocks according to those gradient values; and inputting the important area blocks, together with their position information in the original image, into a visual transformation model for encoding to generate a bitstream.
  • an image encoding device includes: an acquisition module for acquiring an original image and performing block partitioning to obtain multiple image blocks; a calculation module for calculating the gradient values of the pixels in each image block and screening important area blocks from the multiple image blocks according to those gradient values; and an encoding module for inputting the important area blocks, together with their position information in the original image, into a visual transformation model for encoding to generate a bitstream.
  • the calculation module is further configured to calculate the gradient values of the pixels in each image block and to compute the gradient average of each image block from those values; the plurality of image blocks are sorted according to their gradient averages, and the image blocks whose gradient average is not less than a preset value are determined to be the important area blocks.
  • the encoding module is further configured to input the important area blocks into the visual transformation model and output encoded visible patches and mask tokens; to generate image tokens according to the encoded visible patches, the mask tokens and the position information of the important area blocks in the original image; and to generate the bitstream according to the image tokens.
  • the acquisition module is further configured to acquire an n×n original image, where n is a positive integer, and to evenly divide the n×n original image into m×m non-overlapping image blocks, each of size (n/m)×(n/m), where m is a positive integer and n>m.
  • the calculation module is further configured to discard the image blocks whose gradient average is less than the preset value; the preset value is set such that the number of discarded image blocks and the preset compression ratio α of the image satisfy a preset formula, where p is the number of discarded image blocks.
  • an image decoding method is provided to decode the encoding performed by the image encoding method as described above.
  • the image decoding method includes: receiving a bitstream generated by the encoding; decoding the bitstream; and processing the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron to output the reconstructed image.
  • an image decoding device includes: a receiving module for receiving a bitstream generated by encoding; and a decoding module for decoding the bitstream, processing the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron, and outputting the reconstructed image.
  • a computer-readable medium on which a computer program is stored.
  • when the computer program is executed by a processor, the image encoding method or the image decoding method of the above technical solutions is implemented.
  • an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the image encoding method or the image decoding method of the above technical solutions via execution of the executable instructions.
  • a computer program product or computer program including computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the image encoding method or the image decoding method in the above technical solution.
  • Figure 1 schematically shows an exemplary image encoding and decoding system architecture block diagram.
  • Figure 2 schematically shows an exemplary system architecture block diagram applying the technical solution of the present disclosure.
  • Figure 3 schematically shows the step flow of an image encoding method provided by an embodiment of the present disclosure.
  • Figure 4 schematically shows a picture block diagram applying the technical solution of the present disclosure.
  • Figure 5 schematically shows a schematic diagram of the gradient average of each block applying the technical solution of the present disclosure.
  • Figure 6 schematically shows a schematic diagram of an encoder module applying the technical solution of the present disclosure.
  • Figure 7 schematically shows a schematic diagram of a decoder module applying the technical solution of the present disclosure.
  • Figure 8 schematically shows a schematic diagram of the compilation and decoding process using the technical solution of the present disclosure.
  • Figure 9 schematically shows a structural block diagram of an image encoding device provided by an embodiment of the present disclosure.
  • FIG. 10 schematically shows a structural block diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure.
  • Example embodiments will now be described more fully with reference to the accompanying drawings.
  • Example embodiments may, however, be embodied in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art.
  • the present disclosure provides an image encoding method, decoding method, device, readable medium and electronic equipment.
  • the original image is first divided into blocks, and block-region gradient calculation is combined with a visual transformation model; in this way, the method compresses the different information areas of the image selectively and controllably, retaining the information-dense key areas of the image/video as much as possible and compressing them as little as possible, while compressing the information-sparse non-key areas as much as possible, improving image compression efficiency and achieving flexible rate control under a unified scheme, thereby improving image compression efficiency to a certain extent.
  • Figure 1 schematically shows an exemplary image encoding and decoding system architecture block diagram.
  • the system includes a data collection module 101, an encoder module 102, and a decoder module 103.
  • the data collection module 101 is used to collect images/videos 1001 and transmit them to the encoder module 102; in the encoder module 102, a convolutional neural network can be used to encode the image/video 1001 into a bitstream 1002, which is transmitted to the decoder module 103 at the other end; in the decoder module 103, a convolutional neural network can likewise be used to reconstruct the bitstream into an image/video 1003; the reconstructed image/video 1003 is then used as the input of the human vision task 104; finally, after the human vision task computation, the result 1004 is obtained.
  • this approach has the following technical problems: the encoder and decoder encode all areas of the entire picture uniformly and cannot distinguish the key areas of the image itself from the non-key areas; after the entire picture is encoded uniformly, the key areas of the image are compressed heavily and important information is lost; the current method cannot selectively compress individual areas of the image, i.e. the current image encoding/decoding method cannot discard non-important image blocks during encoding, so compression-ratio control in these encoders built on deep convolutional neural network structures is inflexible.
  • in addition, the current encoding/decoding system is oriented to human vision tasks; when oriented to machine vision tasks, it cannot perform machine vision intelligent analysis tasks well.
  • the present disclosure redesigns the encoder and decoder modules in the encoding system oriented to machine vision intelligent analysis tasks.
  • a transformer image encoder based on regional gradient information is proposed.
  • a decoder based on Transformer module (Block) is proposed.
  • Figure 2 schematically shows an exemplary system architecture block diagram applying the technical solution of the present disclosure.
  • the system architecture includes a data collection module for collecting (S201) images/videos to obtain the original image 2001.
  • the original image is input to the encoder module 2002, where it is sequentially subjected to block partitioning (S202), gradient calculation (S203), important area calculation (S204) and encoding by a vision Transformer (S205), and a bitstream is output; the bitstream output by the encoder module 2002 is reconstructed (S206) into an image/video by the decoder module 2003 built from converter modules, and the reconstructed image/video is used as the input of the machine vision task; finally, after the machine vision task computation (S207), the result 2004 is obtained.
  • the encoder designed in this disclosure is a method that combines regional gradient calculation and converter modules.
  • the image decoder, by contrast, is designed from converter modules only.
  • when encoding an image, the picture is divided into blocks; the gradient value is then calculated for the image pixels in each block area, and the average gradient value of each area is computed. All the image blocks are sorted according to this average value, and the lower-ranked blocks are discarded. The top-ranked image blocks are input to the subsequent converter module, while the pictures in the other image block areas are discarded directly; by controlling the proportion of discarded pictures, the compression rate can be flexibly controlled.
  • Figure 3 schematically shows the step flow of an image encoding method provided by an embodiment of the present disclosure.
  • the image encoding method may be executed by the controller and may mainly include the following steps S301 to S303.
  • Step S301 Obtain the original image and perform block processing to obtain multiple image blocks.
  • the image/video can be acquired through the data acquisition module to obtain the original image, and then the original image can be processed into blocks.
  • for example, the original image size is n×n; the n×n image is evenly divided into m×m image blocks according to non-overlapping areas, so each image block has size (n/m)×(n/m). Referring to Figure 4, which schematically shows picture blocking using the technical solution of the present disclosure: taking the 28×28 original image 4002 as an example, it is evenly divided into 4×4 image blocks according to non-overlapping areas (S402) to obtain the blocking result 4004, in which each image block has size 7×7. Dividing the original image into blocks in this way helps to subsequently determine the important area blocks.
  • Step S302 Calculate the gradient value of the pixels in each image block, and select important area blocks from multiple image blocks based on the gradient values of the pixels.
  • by calculating the gradient values of the pixels in each block, the important area blocks can be screened according to those values. In this way, the different information areas of the compressed image can be compressed selectively and controllably: the information-dense key areas of the image/video are retained as much as possible and compressed as little as possible, while the information-sparse non-key areas are compressed as much as possible, improving image compression efficiency and achieving flexible bit-rate control under a unified solution.
  • Step S303 Input the important area blocks and the position information of the important area blocks in the original image into the visual conversion model for encoding to generate a bit stream.
  • in the technical solution provided by the embodiments of the present disclosure, the original image is first divided into blocks, and block-area gradient calculation is combined with a visual transformation model. In this way, the method compresses the different information areas of the compressed image selectively and controllably, retaining the information-dense key areas of the image/video as much as possible and compressing them as little as possible, while compressing the information-sparse non-key areas as much as possible, improving image compression efficiency and achieving flexible rate control under a unified solution.
  • calculating the gradient values of the pixels in each image block and screening important area blocks from the multiple image blocks according to those values may include: calculating the gradient values of the pixels in each image block and computing the gradient average of each image block from those values; sorting the multiple image blocks according to their gradient averages; and determining the image blocks whose gradient average is not less than the preset value as important area blocks.
  • where g(x,y) is the gradient value of pixel (x,y); the gradient values of all pixels in the block area are then averaged according to formula (4), in which d(i,j) is the average gradient of all pixels in each block and i and j both range from 0 to m−1.
  • Figure 5 schematically shows a schematic diagram of the gradient average value of each image block applying the technical solution of the present disclosure.
  • the original image 5002 of size n×n is evenly divided (S502) into m×m blocks 504 according to non-overlapping areas, so each image block has size (n/m)×(n/m). All image blocks are then sorted from large to small by the value of d(i,j), i.e. in the order {d(2,2), d(1,2), d(2,1), d(1,1), ...}. The p image blocks with the smallest d(i,j) values at the bottom of the ranking are discarded, leaving m×m−p image blocks, and these remaining blocks are taken as the important area blocks.
  • in this way, the method compresses the different information areas of the compressed image selectively and controllably, retaining the information-dense key areas of the image/video as much as possible and compressing them as little as possible, while compressing the information-sparse non-key areas as much as possible, improving image compression efficiency and achieving flexible bit-rate control under a unified solution.
  • inputting the important area blocks and the position information of the important area blocks in the original image into the visual transformation model for encoding to generate a bitstream includes: inputting the important area blocks into the visual transformation model and outputting encoded visible patches and mask tokens; generating image tokens based on the encoded visible patches, the mask tokens and the position information of the important area blocks in the original image; and generating the bitstream based on the image tokens.
  • FIG 6 schematically shows a schematic diagram of an encoder module applying the technical solution of the present disclosure.
  • after computation by the multiple encoder modules of the visual transformation model, p pieces of patch information, the same number as the input, are obtained together with their position information; these are then rearranged according to the position information into d×d image blocks matching the original image size. The patches corresponding to the important area blocks that were not discarded are called encoded visible patches, and the remaining positions filled in during rearrangement are the mask tokens.
  • the video coding system methods of related technical solutions are all oriented to human vision tasks, and when oriented to machine vision tasks, they cannot complete machine vision intelligent analysis tasks well.
  • the technical solution of this embodiment, by contrast, is oriented to machine vision tasks and can better complete machine vision intelligent analysis tasks.
  • the preset value may be set such that the number of discarded image blocks and the preset compression ratio α of the image satisfy formula (5), where p is the number of discarded image blocks.
  • the compression rate can be flexibly controlled by controlling the rate of discarding image blocks after the image is divided into blocks.
  • an image decoding method is provided to decode the encoding performed by the image encoding method as described above.
  • the image decoding method includes: receiving a bitstream generated by the encoding; decoding the bitstream; and processing the decoding results through normalization, a multi-head attention mechanism and a multi-layer perceptron to output the reconstructed image.
  • FIG 7 schematically shows a schematic diagram of a decoder module applying the technical solution of the present disclosure.
  • the converter module 7004 consists of normalization layers 70042 and 70046, a multi-head self-attention layer 70044, and a multi-layer perceptron (MLP) module 70048.
  • the weight matrix W_t in the multi-head self-attention layer 70044 and the attention weight matrices of each head are randomly initialized; the attention of each head for an image block vector t is then calculated as shown in formulas (7) and (8), where h_t is the attention of image block vector t for each attention head, the scaling term is the dimension of the matrix K_t, δ(Q_t, K_t, V_t) is the function that computes attention, and the normalization is the Softmax logistic regression function.
  • FIG. 8 schematically shows a schematic diagram of the compiling and decoding process using the technical solution of the present disclosure.
  • at the encoding end, in step S801, block partitioning is performed on the original image (or video) 8002, and the n×n image is evenly divided into m×m blocks according to non-overlapping areas.
  • Step S802 For each pixel (x, y) in the image block, calculate its x-direction gradient and y-direction gradient using formula (1) and formula (2) respectively.
  • Step S803 use formula (3) to calculate the gradient calculation value of pixel (x, y).
  • Step S804 use formula (4) to calculate the average gradient of all pixels in each image block.
  • Step S805 Sort all image blocks according to the value of d(i, j). The p blocks with smaller d(i,j) values at the bottom of the order are discarded, and the calculation of the compression ratio ⁇ satisfies formula (5).
  • Step S806 Generate image tokens according to the results of the discard operation, including, for example, encoding visible patches and mask tokens.
  • Step S807 Generate bit stream 8004 according to the image token.
  • At the decoding end, in step S808, the encoded visible patches, mask tokens and position embedding information are obtained from the bitstream 8006.
  • Step S809: Normalize the data obtained by position-embedding the encoded visible patches and mask tokens.
  • Step S810 Calculate the multi-head self-attention using equations (6) to (8).
  • Step S811 Normalize the multi-head self-attention calculation results.
  • Step S812 Perform multi-layer perceptron calculation on the normalized result.
  • Step S813 Output the reconstructed picture/video 8008.
  • in image encoding and decoding, this disclosure designs the image codec using a method based on regional gradient calculation, i.e. selective compression of image content information; it proposes computing gradients, gradient values and gradient averages on the block images and screening out the important information blocks according to the gradient averages; it proposes the idea of computing key areas from the gradient-average information, sorting and screening out the important areas where information is concentrated, and then discarding the non-important areas of the image to achieve selective compression of the image; by controlling the ratio of image blocks discarded after blocking, the compression rate can be flexibly controlled. In addition, the video coding systems of related technical solutions are all oriented to human vision tasks and cannot perform machine vision intelligent analysis tasks well, whereas the system proposed in this disclosure is oriented to machine vision tasks and can perform them better.
  • an image encoding device 900 may include an acquisition module 901 , a calculation module 902 and an encoding module 903 .
  • the acquisition module 901 can be used to acquire the original image and perform block processing to obtain multiple image blocks;
  • the calculation module 902 can be used to calculate the gradient value of the pixels in each image block, and filter important area blocks from multiple image blocks according to the gradient value of the pixel;
  • the encoding module 903 can be used to input the important area blocks and the position information of the important area blocks in the original image into the visual conversion model for encoding to generate a bit stream.
  • the calculation module 902 can also be used to calculate the gradient values of the pixels in each image block, compute the gradient average of each image block from those values, sort the multiple image blocks according to their gradient averages, and determine the image blocks whose gradient average is not less than the preset value as important area blocks.
  • the encoding module 903 can also be used to input the important area blocks into the visual transformation model and output encoded visible patches and mask tokens; to generate image tokens according to the encoded visible patches, the mask tokens and the position information of the important area blocks in the original image; and to generate the bitstream according to the image tokens.
  • the acquisition module 901 can also be used to acquire an n×n original image, where n is a positive integer, and to evenly divide the n×n original image into m×m non-overlapping image blocks, each of size (n/m)×(n/m), where m is a positive integer and n>m.
  • the calculation module may also be configured to discard the image blocks whose gradient average is less than the preset value, the preset value being set such that the number of discarded image blocks and the preset compression ratio α of the image satisfy formula (5), where p is the number of discarded image blocks.
  • an image decoding device may include: a receiving module that can be used to receive a bitstream generated by encoding; and a decoding module that can be used to decode the bitstream, process the decoding result through normalization, a multi-head attention mechanism and a multi-layer perceptron, and output the reconstructed image.
  • FIG. 10 schematically shows a computer system structural block diagram of an electronic device for implementing an embodiment of the present disclosure.
  • the computer system 1000 includes a central processing unit (CPU) 1001, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from the storage section 1008 into a random access memory (RAM) 1003. Various programs and data required for system operation are also stored in the random access memory 1003.
  • the central processing unit 1001, the read-only memory 1002 and the random access memory 1003 are connected to each other through a bus 1004.
  • an input/output interface 1005 (I/O interface) is also connected to the bus 1004.
  • the following components are connected to the input/output interface 1005: an input part 1006 including a keyboard, a mouse, etc.; an output part 1007 including a cathode ray tube (Cathode Ray Tube, CRT), a liquid crystal display (Liquid Crystal Display, LCD), etc., and a speaker, etc. ; a storage part 1008 including a hard disk, etc.; and a communication part 1009 including a network interface card such as a LAN card, a modem, etc.
  • the communication section 1009 performs communication processing via a network such as the Internet.
  • Driver 1010 is also connected to input/output interface 1005 as needed.
  • Removable media 1011 such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on the drive 1010 as needed, so that a computer program read therefrom is installed into the storage portion 1008 as needed.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication portion 1009 and/or installed from removable media 1011.
  • this computer program is executed by the central processor 1001, various functions defined in the system of the present disclosure are performed.
  • the computer-readable medium shown in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than computer-readable storage media that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block in the block diagrams or flowchart illustrations, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
  • the example embodiments described here can be implemented by software, or can be implemented by software combined with necessary hardware. Therefore, the technical solution according to the embodiment of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, U disk, mobile hard disk, etc.) or on the network , including several instructions to cause a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure belongs to the field of artificial intelligence technology, and specifically relates to an image encoding method, a decoding method, an apparatus, a readable medium and an electronic device. The method includes: acquiring an original image and performing block partitioning to obtain multiple image blocks; calculating the gradient values of the pixels in each image block and screening important area blocks according to those gradient values; and inputting the important area blocks, together with their position information in the original image, into a visual transformation model for encoding to generate a bitstream. In this way, the method compresses the different information areas of the image selectively and controllably: the information-dense key areas of the image are retained as much as possible and compressed as little as possible, while the information-sparse non-key areas are compressed as much as possible, improving image compression efficiency and achieving flexible rate control under a unified scheme.

Description

Image encoding method, decoding method, apparatus, readable medium and electronic device
The present disclosure is based on, and claims priority to, the Chinese patent application with application number 202210837739.2, filed on July 15, 2022 and entitled "图像编码方法、解码方法、装置、可读介质及电子设备" (Image encoding method, decoding method, apparatus, readable medium and electronic device), the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure belongs to the field of artificial intelligence technology, and specifically relates to an image encoding method, a decoding method, an apparatus, a readable medium and an electronic device.
Background
Traditional image/video coding is oriented to human vision tasks and is mostly used for entertainment, emphasizing fidelity, high frame rate and clarity of the video data signal. With the rapid development of 5G, big data and artificial intelligence, and against the background of image/video big-data applications, media content such as images and videos is widely used in intelligent vision tasks such as object detection, object tracking, image classification, image segmentation and person re-identification; these intelligent vision tasks are also called machine-vision-oriented intelligent tasks.
It should be noted that the information disclosed in the above Background section is only intended to enhance understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
Summary
Other features and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the present disclosure.
According to one aspect of the embodiments of the present disclosure, an image encoding method is provided, including: acquiring an original image and performing block partitioning to obtain multiple image blocks; calculating the gradient values of the pixels in each image block and screening important area blocks from the multiple image blocks according to those gradient values; and inputting the important area blocks, together with their position information in the original image, into a visual transformation model for encoding to generate a bitstream.
According to one aspect of the embodiments of the present disclosure, an image encoding apparatus is provided, including: an acquisition module for acquiring an original image and performing block partitioning to obtain multiple image blocks; a calculation module for calculating the gradient values of the pixels in each image block and screening important area blocks from the multiple image blocks according to those gradient values; and an encoding module for inputting the important area blocks, together with their position information in the original image, into a visual transformation model for encoding to generate a bitstream.
In some embodiments of the present disclosure, the calculation module is further configured to calculate the gradient values of the pixels in each image block and compute the gradient average of each image block from those values, to sort the multiple image blocks according to their gradient averages, and to determine the image blocks whose gradient average is not less than a preset value as the important area blocks.
In some embodiments of the present disclosure, the encoding module is further configured to input the important area blocks into the visual transformation model and output encoded visible patches and mask tokens, to generate image tokens according to the encoded visible patches, the mask tokens and the position information of the important area blocks in the original image, and to generate the bitstream according to the image tokens.
In some embodiments of the present disclosure, the acquisition module is further configured to acquire an n×n original image, where n is a positive integer, and to evenly divide the n×n original image into m×m non-overlapping image blocks, each of size (n/m)×(n/m), where m is a positive integer and n>m.
In some embodiments of the present disclosure, the calculation module is further configured to discard the image blocks whose gradient average is less than the preset value, the preset value being set such that the number of discarded image blocks and the preset compression ratio α of the image satisfy a preset formula, where p is the number of discarded image blocks.
According to one aspect of the embodiments of the present disclosure, an image decoding method is provided for decoding the encoding performed by the image encoding method described above, the image decoding method including: receiving a bitstream generated by the encoding; decoding the bitstream; and processing the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron to output the reconstructed image.
According to one aspect of the embodiments of the present disclosure, an image decoding apparatus is provided, including: a receiving module for receiving a bitstream generated by encoding; and a decoding module for decoding the bitstream, processing the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron, and outputting the reconstructed image.
According to one aspect of the embodiments of the present disclosure, a computer-readable medium is provided on which a computer program is stored; when the computer program is executed by a processor, the image encoding method or the image decoding method of the above technical solutions is implemented.
According to one aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor, wherein the processor is configured to execute the image encoding method or the image decoding method of the above technical solutions by executing the executable instructions.
According to one aspect of the embodiments of the present disclosure, a computer program product or computer program is provided, including computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the image encoding method or the image decoding method of the above technical solutions.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure. Obviously, the drawings described below are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Figure 1 schematically shows an exemplary block diagram of an image encoding and decoding system architecture.
Figure 2 schematically shows an exemplary block diagram of a system architecture applying the technical solution of the present disclosure.
Figure 3 schematically shows the flow of steps of an image encoding method provided by an embodiment of the present disclosure.
Figure 4 schematically shows a picture-blocking diagram applying the technical solution of the present disclosure.
Figure 5 schematically shows the gradient average of each block applying the technical solution of the present disclosure.
Figure 6 schematically shows an encoder module applying the technical solution of the present disclosure.
Figure 7 schematically shows a decoder module applying the technical solution of the present disclosure.
Figure 8 schematically shows the encoding and decoding flow using the technical solution of the present disclosure.
Figure 9 schematically shows a structural block diagram of an image encoding apparatus provided by an embodiment of the present disclosure.
Figure 10 schematically shows a structural block diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art.
Furthermore, the described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced without one or more of the specific details, or that other methods, components, apparatuses, steps, etc. may be employed. In other cases, well-known methods, apparatuses, implementations or operations are not shown or described in detail so as not to obscure aspects of the present disclosure.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are only illustrative; they do not necessarily include all contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps may be decomposed, while others may be combined or partially combined, so the actual execution order may change according to the actual situation.
With the popularization of machine-vision intelligent tasks and the rapid development of, for example, image classification, video object detection, object tracking, image segmentation and person re-identification, the image/video encoding and decoding techniques of current related solutions based on convolutional neural networks encode all areas of the whole picture uniformly, which is unfavorable for image encoding/decoding.
In view of this, the present disclosure provides an image encoding method, a decoding method, an apparatus, a readable medium and an electronic device. In the technical solution provided by the embodiments of the present disclosure, the original image is first divided into blocks, and block-area gradient calculation is combined with a visual transformation model. In this way, the method compresses the different information areas of the image selectively and controllably: the information-dense key areas of the image/video are retained as much as possible and compressed as little as possible, while the information-sparse non-key areas are compressed as much as possible, improving image compression efficiency and achieving flexible rate control under a unified scheme, thereby improving image compression efficiency to a certain extent.
Referring to Figure 1, Figure 1 schematically shows an exemplary block diagram of an image encoding and decoding system architecture.
The system includes a data collection module 101, an encoder module 102 and a decoder module 103. The data collection module 101 collects an image/video 1001 and transmits it to the encoder module 102; in the encoder module 102, a convolutional neural network can be used to encode the image/video 1001 into a bitstream 1002, which is transmitted to the decoder module 103 at the other end; in the decoder module 103, a convolutional neural network can likewise be used to reconstruct the bitstream into an image/video 1003; the reconstructed image/video 1003 is then used as the input of the human vision task 104; finally, after the human vision task computation, the result 1004 is obtained.
This approach has the following technical problems. The encoder and decoder encode all areas of the whole picture uniformly and cannot distinguish the key areas of the image itself from the non-key areas; after the whole picture is encoded uniformly, the key areas of the image are compressed heavily and important information of the image is lost; the current method cannot selectively compress each area of the image, that is, the current image encoding/decoding method cannot discard non-important-area image blocks of the picture during image encoding. Therefore, compression-ratio control of these current encoders designed on deep convolutional neural network structures is inflexible. In addition, the current encoding/decoding system method is oriented to human vision tasks; when oriented to machine vision tasks, the system cannot perform machine-vision intelligent analysis tasks well.
To solve the above problems, the present disclosure redesigns the encoder and decoder modules in an encoding system oriented to machine-vision intelligent analysis tasks. In the encoder design, a Transformer image encoder based on regional gradient information is proposed. In the decoder design, a decoder based on the Transformer module (Block) is proposed. Referring to Figure 2, Figure 2 schematically shows an exemplary block diagram of a system architecture applying the technical solution of the present disclosure. The system architecture includes a data collection module for collecting (S201) an image/video to obtain the original image 2001. The original image is then input to the encoder module 2002, where it is sequentially subjected to block partitioning (S202), gradient calculation (S203), important-area calculation (S204) and encoding by a vision Transformer (S205), and a bitstream is output. The bitstream output by the encoder module 2002 is reconstructed (S206) into an image/video by the decoder module 2003 built from converter modules, and the reconstructed image/video is used as the input of the machine vision task; finally, after the machine vision task computation (S207), the result 2004 is obtained.
To compress different areas of the image/video selectively, when encoding image/video data the encoder designed in the present disclosure combines regional gradient calculation with converter modules, while the image decoder is designed from converter modules only. When encoding an image, the picture is partitioned into blocks; the gradient values of the image pixels in each block area are then calculated, and the average gradient value of each area is computed; all image blocks are sorted according to this average, and the lower-ranked blocks are discarded. The top-ranked image blocks are input to the subsequent converter module, while the pictures in the other image-block areas are discarded directly; by controlling the proportion of discarded pictures, the compression rate can be flexibly controlled.
The image encoding method, decoding method, apparatus, readable medium and electronic device provided by the present disclosure are described in detail below with reference to specific embodiments.
Referring to Figure 3, Figure 3 schematically shows the flow of steps of an image encoding method provided by an embodiment of the present disclosure. The image encoding method may be executed by a controller and may mainly include the following steps S301 to S303.
Step S301: acquire the original image and perform block partitioning to obtain multiple image blocks.
In some embodiments, the image/video can be acquired through the data collection module to obtain the original image, and the original image is then partitioned into blocks. For example, the original image size is n×n; the n×n image is evenly divided into m×m image blocks according to non-overlapping areas, each image block having size (n/m)×(n/m). Referring to Figure 4, which schematically shows picture blocking using the technical solution of the present disclosure: taking the 28×28 original image 4002 as an example, it is evenly divided into 4×4 image blocks according to non-overlapping areas (S402) to obtain the blocking result 4004, in which each image block has size 7×7. Dividing the original image into blocks in this way helps to subsequently determine the important-area blocks.
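As a concrete, non-limiting illustration of this block partitioning step, the following sketch expresses it in NumPy; the array layout and the `block_partition` helper name are assumptions of this sketch, not part of the disclosure.

```python
import numpy as np

def block_partition(image: np.ndarray, m: int) -> np.ndarray:
    """Split an n x n image into m x m non-overlapping blocks of size (n/m) x (n/m)."""
    n = image.shape[0]
    assert image.shape[0] == image.shape[1] and n % m == 0, "n must be divisible by m"
    s = n // m                                      # side length of each block
    # Reshape to (m, s, m, s) and reorder to (m, m, s, s): blocks[i, j] is block (i, j).
    return image.reshape(m, s, m, s).transpose(0, 2, 1, 3)

# Example matching Figure 4: a 28 x 28 image split into 4 x 4 blocks of size 7 x 7.
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)
blocks = block_partition(image, m=4)
print(blocks.shape)  # (4, 4, 7, 7)
```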
Step S302: calculate the gradient values of the pixels in each image block, and screen important-area blocks from the multiple image blocks according to those gradient values.
In some embodiments, calculating the gradient values of the pixels in each block makes it possible to screen important-area blocks according to those values. In this way, the different information areas of the image can be compressed selectively and controllably: the information-dense key areas of the image/video are retained as much as possible and compressed as little as possible, while the information-sparse non-key areas are compressed as much as possible, improving image compression efficiency and achieving flexible rate control under a unified scheme.
Step S303: input the important-area blocks and their position information in the original image into the visual transformation model for encoding to generate a bitstream.
In the technical solution provided by the embodiments of the present disclosure, the original image is first divided into blocks, and block-area gradient calculation is combined with a visual transformation model. In this way, the method compresses the different information areas of the image selectively and controllably: the information-dense key areas of the image/video are retained as much as possible and compressed as little as possible, while the information-sparse non-key areas are compressed as much as possible, improving image compression efficiency and achieving flexible rate control under a unified scheme.
In some embodiments of the present disclosure, calculating the gradient values of the pixels in each image block and screening important-area blocks from the multiple image blocks according to those values may include: calculating the gradient values of the pixels in each image block and computing the gradient average of each image block from those values; sorting the multiple image blocks according to their gradient averages; and determining the image blocks whose gradient average is not less than a preset value as important-area blocks.
In this way, the important areas where information is concentrated are sorted and screened out according to the gradient averages, and the non-important-area blocks of the image can be discarded to achieve compression of the image.
In some embodiments, when selecting the important-area blocks, for each pixel (x, y) in each image block, its gradients in the x direction and the y direction are first calculated separately. The x-direction gradient is calculated as shown in formula (1).
The y-direction gradient is calculated as shown in formula (2).
The x-direction gradient value gx and the y-direction gradient value gy of pixel (x, y) are then combined as in formula (3).
Here g(x, y) is the gradient value of (x, y). The gradient values of all pixels in the block area are then averaged as in formula (4).
In formula (4), d(i, j) is the average gradient of all pixels in each block, and i and j both range from 0 to m−1.
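The following sketch illustrates one plausible reading of formulas (1) to (4), whose exact operators are not reproduced in the text; the forward-difference gradients and the L2 magnitude used here are assumptions.

```python
import numpy as np

def block_gradient_means(blocks: np.ndarray) -> np.ndarray:
    """Return d(i, j): the mean per-pixel gradient magnitude of each block.

    The exact operators of formulas (1)-(3) are not reproduced in the text; this
    sketch assumes forward differences in x and y and an L2 gradient magnitude.
    """
    m = blocks.shape[0]
    d = np.zeros((m, m), dtype=np.float32)
    for i in range(m):
        for j in range(m):
            f = blocks[i, j]
            gx = np.zeros_like(f)
            gy = np.zeros_like(f)
            gx[:, :-1] = f[:, 1:] - f[:, :-1]      # x-direction gradient, formula (1) analogue
            gy[:-1, :] = f[1:, :] - f[:-1, :]      # y-direction gradient, formula (2) analogue
            g = np.sqrt(gx ** 2 + gy ** 2)         # gradient magnitude, formula (3) analogue
            d[i, j] = g.mean()                     # block average, formula (4) analogue
    return d
```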
Referring to Figure 5, Figure 5 schematically shows the gradient average of each image block using the technical solution of the present disclosure. The original image 5002 of size n×n is evenly divided (S502) into m×m blocks 504 according to non-overlapping areas, each image block having size (n/m)×(n/m). All image blocks are then sorted from large to small by the value of d(i, j), i.e. in the order {d(2,2), d(1,2), d(2,1), d(1,1), ...}. The p image blocks with the smallest d(i, j) values at the bottom of the ranking are discarded, leaving m×m−p image blocks, which are taken as the important-area blocks.
In this way, the method compresses the different information areas of the image selectively and controllably: the information-dense key areas of the image/video are retained as much as possible and compressed as little as possible, while the information-sparse non-key areas are compressed as much as possible, improving image compression efficiency and achieving flexible rate control under a unified scheme.
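A minimal sketch of this sorting-and-discarding step might look as follows; the `select_important_blocks` helper and its return values are illustrative assumptions.

```python
import numpy as np

def select_important_blocks(d: np.ndarray, p: int):
    """Sort blocks by d(i, j) in descending order and drop the p smallest."""
    m = d.shape[0]
    order = np.argsort(-d.ravel())                       # flat indices, largest d(i, j) first
    kept = order[: m * m - p]                            # the m*m - p important-area blocks
    dropped = order[m * m - p:]
    positions = [(idx // m, idx % m) for idx in kept]    # (i, j) positions in the original image
    return kept, dropped, positions
```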
In an embodiment of the present disclosure, inputting the important-area blocks and their position information in the original image into the visual transformation model for encoding to generate a bitstream includes: inputting the important-area blocks into the visual transformation model and outputting encoded visible patches and mask tokens; generating image tokens according to the encoded visible patches, the mask tokens and the position information of the important-area blocks in the original image; and generating the bitstream according to the image tokens.
Referring to Figure 6, Figure 6 schematically shows an encoder module applying the technical solution of the present disclosure. After the non-overlapping block-partitioning result 6002 of the original image is obtained, gradient calculation (S602) and important-area calculation (S604) are performed to determine the important-area blocks, and the important-area blocks that have not been discarded, together with their position information in the original image, are input into the Vision Transformer model 6004. Specifically, the patch embeddings and positional embeddings 60042 of the important-area blocks are input into the Encoder module 60044 of the visual transformation model 6004.
After computation by the multiple encoder modules of the visual transformation model 6004, p pieces of patch information, the same number as the input, are obtained together with their position information; these are then rearranged according to the position information into d×d image blocks matching the original image size. Among these d×d image blocks, those obtained by the visual transformation model 6004 from the patch information of the important-area blocks that were not discarded are called encoded visible patches (Encoded Visible Patches); the remaining ones, filled in during rearrangement according to the position information, are called mask tokens (Mask Tokens).
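For illustration only, one possible arrangement of this encoder is sketched below in PyTorch; the layer sizes, the use of `nn.TransformerEncoder`, the learned mask token and the other implementation details are assumptions, not the disclosed implementation.

```python
import torch
import torch.nn as nn

class RegionGradientViTEncoder(nn.Module):
    """Sketch of the encoder of Figure 6: only the important-area blocks are embedded
    and encoded; the remaining positions are filled with a shared mask token."""

    def __init__(self, patch_dim: int, m: int, d_model: int = 128, depth: int = 4, nhead: int = 4):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, d_model)              # patch embeddings
        self.pos_embed = nn.Parameter(torch.zeros(m * m, d_model))    # positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.m = m

    def forward(self, kept_patches: torch.Tensor, kept_idx: torch.Tensor) -> torch.Tensor:
        # kept_patches: (B, K, patch_dim) flattened important-area blocks
        # kept_idx:     (K,) flat positions of those blocks in the m x m grid
        x = self.patch_embed(kept_patches) + self.pos_embed[kept_idx]   # (B, K, d_model)
        visible = self.encoder(x)                                       # encoded visible patches
        B = visible.shape[0]
        tokens = self.mask_token.expand(B, self.m * self.m, -1).clone() # mask tokens everywhere
        tokens[:, kept_idx] = visible                                   # rearrange by position
        return tokens                                                   # image tokens -> bitstream
```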
Whereas the video coding system methods of related technical solutions are all oriented to human vision tasks and cannot perform machine-vision intelligent analysis tasks well when oriented to machine vision tasks, the technical solution of this embodiment is oriented to machine vision tasks and can perform machine-vision intelligent analysis tasks better.
In some embodiments of the present disclosure, the preset value may be set such that the number of discarded image blocks and the preset compression ratio α of the image satisfy formula (5), where p is the number of discarded image blocks.
In this way, by controlling the ratio of discarded image blocks after the image is partitioned, the compression rate can be flexibly controlled.
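Because formula (5) itself is not reproduced in the text, the following sketch assumes one plausible relation, namely that α is the percentage of discarded blocks among the m×m blocks; this is an assumption for illustration, not the formula of the disclosure.

```python
def discard_count_from_ratio(alpha_percent: float, m: int) -> int:
    """Number of blocks p to discard for a preset compression ratio alpha.

    Assumes formula (5) relates them as alpha = p / (m * m) * 100%, which is one
    plausible reading; the exact formula is not reproduced in the text.
    """
    return round(alpha_percent / 100.0 * m * m)

# e.g. alpha = 25% on a 4 x 4 grid discards 4 of the 16 blocks.
print(discard_count_from_ratio(25.0, m=4))  # 4
```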
According to one aspect of the embodiments of the present disclosure, an image decoding method is provided for decoding the encoding performed by the image encoding method described above; the image decoding method includes: receiving a bitstream generated by the encoding; decoding the bitstream; and processing the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron to output the reconstructed image.
Referring to Figure 7, Figure 7 schematically shows a decoder module applying the technical solution of the present disclosure. With reference to Figure 6, after the encoded visible patches 70022 and mask tokens 70024 output by the visual transformation model based on regional gradient information are obtained, these two parts are combined with the position information of the original image, position-embedded (S702) and added together, and the summed result is input into a Decoder built from converter modules 7004 for decoding (S704). In the decoder, the converter module 7004 consists of normalization (Normalize) layers 70042 and 70046, a multi-head self-attention (Multi-head Self Attention) layer 70044 and a multi-layer perceptron (Multi-Layer Perceptron, MLP) module 70048.
After the information of the image block whose vector output by the normalization layer 70042 is t is input into the multi-head self-attention layer 70044, the weight matrix W_t in the multi-head self-attention layer 70044 and the attention weight matrix of each head are randomly initialized.
The image-block vector t is then multiplied by the attention weight matrix of each head to obtain the three matrices Q_t, K_t, V_t corresponding to this image-block vector, as given in formula (6).
The attention of each head for each image-block vector t is then calculated as shown in formulas (7) and (8).
In the formulas, h_t is the attention of image-block vector t for each attention head; the scaling term is the dimension of the matrix K_t; δ(Q_t, K_t, V_t) is the function that computes attention; and the normalization is the Softmax logistic regression function.
In formula (8), a Concatenate (connection) function and a parameter matrix are applied, and the calculation result r represents the multi-head value. The output of the decoder is the reconstructed image 7006.
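A possible realization of such a converter (Transformer) block is sketched below; since formulas (6) to (8) are not reproduced in the text, the scaled dot-product attention, the residual connections and the hidden sizes used here are assumptions.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Sketch of the Transformer block of Figure 7: Normalize -> multi-head
    self-attention (formulas (6)-(8) analogue) -> Normalize -> MLP."""

    def __init__(self, d_model: int = 128, nhead: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)            # normalization layer 70042
        self.norm2 = nn.LayerNorm(d_model)            # normalization layer 70046
        self.nhead, self.dk = nhead, d_model // nhead
        self.w_qkv = nn.Linear(d_model, 3 * d_model)  # weight matrices producing Q_t, K_t, V_t (formula (6) analogue)
        self.w_out = nn.Linear(d_model, d_model)      # parameter matrix applied after concatenation (formula (8) analogue)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))  # multi-layer perceptron 70048

    def forward(self, t: torch.Tensor) -> torch.Tensor:   # t: (B, N, d_model) image-token vectors
        B, N, _ = t.shape
        q, k, v = self.w_qkv(self.norm1(t)).chunk(3, dim=-1)
        # reshape to (B, nhead, N, dk) so each head attends separately
        q, k, v = (x.view(B, N, self.nhead, self.dk).transpose(1, 2) for x in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)   # attention weights (formula (7) analogue)
        h = (attn @ v).transpose(1, 2).reshape(B, N, -1)   # concatenate the heads (formula (8) analogue)
        t = t + self.w_out(h)                              # residual connection (assumed)
        return t + self.mlp(self.norm2(t))                 # MLP on the normalized result
```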
To facilitate understanding of the technical solution of the present disclosure, refer to Figure 8, which schematically shows the encoding and decoding flow using the technical solution of the present disclosure.
At the encoding end: Step S801, block partitioning is performed on the original image (or video) 8002, and the n×n image is evenly divided into m×m blocks according to non-overlapping areas.
Step S802: for each pixel (x, y) in an image block, calculate its x-direction gradient and y-direction gradient using formula (1) and formula (2), respectively.
Step S803: calculate the gradient value of pixel (x, y) using formula (3).
Step S804: calculate the average gradient of all pixels in each image block using formula (4).
Step S805: sort all image blocks by the value of d(i, j); discard the p lower-ranked blocks with smaller d(i, j) values, the compression ratio α being computed so as to satisfy formula (5).
Step S806: generate image tokens according to the result of the discard operation, including for example the encoded visible patches and mask tokens.
Step S807: generate the bitstream 8004 according to the image tokens.
At the decoding end: Step S808, the encoded visible patches, mask tokens and position embedding information are obtained from the bitstream 8006.
Step S809: normalize the data obtained by position-embedding the encoded visible patches and mask tokens.
Step S810: compute the multi-head self-attention using formulas (6) to (8).
Step S811: normalize the multi-head self-attention calculation result.
Step S812: apply the multi-layer perceptron to the normalized result.
Step S813: output the reconstructed picture/video 8008.
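Tying the steps together, a purely illustrative end-to-end pass over this flow, reusing the sketches given earlier in this description, might read as follows; all helper names, layer sizes and the 25% ratio are assumptions for demonstration only.

```python
import numpy as np
import torch
import torch.nn as nn

# Reuses block_partition, block_gradient_means, select_important_blocks,
# discard_count_from_ratio, RegionGradientViTEncoder and DecoderBlock from the
# sketches above; none of these are part of the disclosure itself.
image = np.random.rand(28, 28).astype(np.float32)
m = 4
blocks = block_partition(image, m)                               # S801: 4 x 4 blocks of 7 x 7
d = block_gradient_means(blocks)                                 # S802-S804: d(i, j) per block
p = discard_count_from_ratio(25.0, m)                            # S805: blocks to discard for alpha = 25%
kept, dropped, _ = select_important_blocks(d, p)

with torch.no_grad():
    patches = torch.from_numpy(blocks.reshape(m * m, -1)[kept]).unsqueeze(0)  # (1, K, 49)
    encoder = RegionGradientViTEncoder(patch_dim=patches.shape[-1], m=m)
    tokens = encoder(patches, torch.as_tensor(kept))             # S806-S807: image tokens for the bitstream
    decoder = nn.Sequential(DecoderBlock(), DecoderBlock())      # S808-S812: Transformer-block decoder
    out = decoder(tokens)                                        # S813: token grid to be projected back to pixels (projection not shown)
print(out.shape)  # torch.Size([1, 16, 128])
```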
In image encoding and decoding, the present disclosure designs the image codec using a method based on regional gradient calculation, i.e. selective compression of image content information. It proposes computing gradients, gradient values and gradient averages on the block images and screening out the important information blocks according to the gradient averages; it proposes the idea of computing key areas from the gradient-average information, sorting and screening out the important areas where information is concentrated, and then discarding the non-important areas of the image to achieve selective compression of the image. By controlling the ratio of picture blocks discarded after blocking, the compression rate can be flexibly controlled. In addition, the video coding system methods of related technical solutions are all oriented to human vision tasks and cannot perform machine-vision intelligent analysis tasks well when oriented to machine vision tasks, whereas the system proposed in the present disclosure is oriented to machine vision tasks and can perform them better.
It should be noted that although the steps of the method of the present disclosure are described in a specific order in the drawings, this does not require or imply that the steps must be executed in that specific order, or that all of the illustrated steps must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps, etc.
The following describes apparatus embodiments of the present disclosure, which can be used to execute the image encoding method or the image decoding method of the above embodiments of the present disclosure. Figure 9 schematically shows a structural block diagram of an image encoding apparatus provided by an embodiment of the present disclosure. As shown in Figure 9, an image encoding apparatus 900 may include an acquisition module 901, a calculation module 902 and an encoding module 903.
The acquisition module 901 can be used to acquire the original image and perform block partitioning to obtain multiple image blocks;
the calculation module 902 can be used to calculate the gradient values of the pixels in each image block and screen important-area blocks from the multiple image blocks according to those values;
the encoding module 903 can be used to input the important-area blocks and their position information in the original image into the visual transformation model for encoding to generate a bitstream.
In some embodiments of the present disclosure, the calculation module 902 can also be used to calculate the gradient values of the pixels in each image block, compute the gradient average of each image block from those values, sort the multiple image blocks according to their gradient averages, and determine the image blocks whose gradient average is not less than the preset value as important-area blocks.
In some embodiments of the present disclosure, the encoding module 903 can also be used to input the important-area blocks into the visual transformation model and output encoded visible patches and mask tokens, to generate image tokens according to the encoded visible patches, the mask tokens and the position information of the important-area blocks in the original image, and to generate the bitstream according to the image tokens.
In some embodiments of the present disclosure, based on the above technical solutions, the acquisition module 901 can also be used to acquire an n×n original image, where n is a positive integer, and to evenly divide the n×n original image into m×m non-overlapping image blocks, each of size (n/m)×(n/m), where m is a positive integer and n>m.
In some embodiments of the present disclosure, the calculation module can also be used to discard the image blocks whose gradient average is less than the preset value, the preset value being set such that the number of discarded image blocks and the preset compression ratio α of the image satisfy a preset formula, where p is the number of discarded image blocks.
According to one aspect of the embodiments of the present disclosure, an image decoding apparatus is provided, which may include: a receiving module that can be used to receive a bitstream generated by encoding; and a decoding module that can be used to decode the bitstream, process the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron, and output the reconstructed image.
The specific details of the image encoding apparatus or image decoding apparatus provided in the embodiments of the present disclosure have been described in detail in the corresponding method embodiments and are not repeated here.
Figure 10 schematically shows a structural block diagram of a computer system of an electronic device for implementing an embodiment of the present disclosure.
It should be noted that the computer system 1000 of the electronic device shown in Figure 10 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Figure 10, the computer system 1000 includes a central processing unit (CPU) 1001, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1002 or a program loaded from the storage section 1008 into a random access memory (RAM) 1003. Various programs and data required for system operation are also stored in the random access memory 1003. The central processing unit 1001, the read-only memory 1002 and the random access memory 1003 are connected to one another through a bus 1004. An input/output interface 1005 (I/O interface) is also connected to the bus 1004.
The following components are connected to the input/output interface 1005: an input section 1006 including a keyboard, a mouse, etc.; an output section 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., as well as a speaker, etc.; a storage section 1008 including a hard disk, etc.; and a communication section 1009 including a network interface card such as a LAN card, a modem, etc. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the input/output interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 1010 as needed, so that a computer program read therefrom is installed into the storage section 1008 as needed.
In particular, according to the embodiments of the present disclosure, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed by the central processing unit 1001, the various functions defined in the system of the present disclosure are executed.
It should be noted that the computer-readable medium shown in the embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wired, etc., or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by combinations of special-purpose hardware and computer instructions.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by multiple modules or units.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Those skilled in the art will readily think of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present disclosure is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the technical field not disclosed by the present disclosure.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

  1. An image encoding method, wherein the image encoding method comprises:
    acquiring an original image and performing block partitioning to obtain a plurality of image blocks;
    calculating gradient values of pixels in each image block, and screening important area blocks from the plurality of image blocks according to the gradient values of the pixels;
    inputting the important area blocks and position information of the important area blocks in the original image into a visual transformation model for encoding, to generate a bitstream.
  2. The image encoding method according to claim 1, wherein calculating the gradient values of the pixels in each image block and screening the important area blocks from the plurality of image blocks according to the gradient values of the pixels comprises:
    calculating the gradient values of the pixels in each image block, and calculating a gradient average of each image block according to the gradient values of the pixels;
    sorting the plurality of image blocks according to the gradient averages, and determining the image blocks, among the plurality of image blocks, whose gradient average is not less than a preset value as the important area blocks.
  3. The image encoding method according to claim 1 or 2, wherein inputting the important area blocks and the position information of the important area blocks in the original image into the visual transformation model for encoding to generate the bitstream comprises:
    inputting the important area blocks into the visual transformation model, and outputting encoded visible patches and mask tokens;
    generating image tokens according to the encoded visible patches, the mask tokens and the position information of the important area blocks in the original image, and generating the bitstream according to the image tokens.
  4. The image encoding method according to claim 2, wherein acquiring the original image and performing block partitioning to obtain the plurality of image blocks comprises:
    acquiring an n×n original image, where n is a positive integer;
    evenly dividing the n×n original image into m×m blocks according to non-overlapping areas, each image block having a size of (n/m)×(n/m), where m is a positive integer and n>m.
  5. The image encoding method according to claim 4, wherein calculating the gradient values of the pixels in each image block and screening the important area blocks from the plurality of image blocks according to the gradient values of the pixels further comprises:
    discarding the image blocks, among the plurality of image blocks, whose gradient average is less than the preset value;
    wherein the preset value is set such that the number of discarded image blocks and a preset compression ratio α of the image satisfy a formula,
    where p is the number of discarded image blocks.
  6. An image decoding method, wherein the encoding performed by the image encoding method according to any one of claims 1 to 5 is decoded, the image decoding method comprising:
    receiving a bitstream generated by the encoding;
    decoding the bitstream, processing the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron, and outputting a reconstructed image.
  7. An image encoding apparatus, wherein the image encoding apparatus comprises:
    an acquisition module, configured to acquire an original image and perform block partitioning to obtain a plurality of image blocks;
    a calculation module, configured to calculate gradient values of pixels in each image block and screen important area blocks from the plurality of image blocks according to the gradient values of the pixels;
    an encoding module, configured to input the important area blocks and position information of the important area blocks in the original image into a visual transformation model for encoding, to generate a bitstream.
  8. An image decoding apparatus, wherein the image decoding apparatus comprises:
    a receiving module, configured to receive a bitstream generated by encoding;
    a decoding module, configured to decode the bitstream, process the decoded result through normalization, a multi-head attention mechanism and a multi-layer perceptron, and output a reconstructed image.
  9. A computer-readable medium, wherein a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the image encoding method according to any one of claims 1 to 5 or the image decoding method according to claim 6 is implemented.
  10. An electronic device, wherein the electronic device comprises:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to execute, via execution of the executable instructions, the image encoding method according to any one of claims 1 to 5, or the image decoding method according to claim 6.
PCT/CN2023/107504 2022-07-15 2023-07-14 图像编码方法、解码方法、装置、可读介质及电子设备 WO2024012574A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210837739.2 2022-07-15
CN202210837739.2A CN115514976A (zh) 2022-07-15 2022-07-15 图像编码方法、解码方法、装置、可读介质及电子设备

Publications (2)

Publication Number Publication Date
WO2024012574A1 true WO2024012574A1 (zh) 2024-01-18
WO2024012574A9 WO2024012574A9 (zh) 2024-02-29

Family

ID=84502698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/107504 WO2024012574A1 (zh) 2022-07-15 2023-07-14 图像编码方法、解码方法、装置、可读介质及电子设备

Country Status (2)

Country Link
CN (1) CN115514976A (zh)
WO (1) WO2024012574A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115514976A (zh) * 2022-07-15 2022-12-23 中国电信股份有限公司 图像编码方法、解码方法、装置、可读介质及电子设备
CN117649569A (zh) * 2022-08-19 2024-03-05 中国电信股份有限公司 图像特征处理方法和装置、存储介质
CN116132818B (zh) * 2023-02-01 2024-05-24 辉羲智能科技(上海)有限公司 用于自动驾驶的图像处理方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5144688A (en) * 1990-03-23 1992-09-01 Board Of Regents, The University Of Texas System Method and apparatus for visual pattern image coding
JPH06350992A (ja) * 1993-06-08 1994-12-22 Sony Corp データ圧縮回路
US20070201751A1 (en) * 2006-02-24 2007-08-30 Microsoft Corporation Block-Based Fast Image Compression
CN114428866A (zh) * 2022-01-26 2022-05-03 杭州电子科技大学 一种基于面向对象的双流注意力网络的视频问答方法
CN115514976A (zh) * 2022-07-15 2022-12-23 中国电信股份有限公司 图像编码方法、解码方法、装置、可读介质及电子设备

Also Published As

Publication number Publication date
WO2024012574A9 (zh) 2024-02-29
CN115514976A (zh) 2022-12-23

Similar Documents

Publication Publication Date Title
WO2024012574A1 (zh) 图像编码方法、解码方法、装置、可读介质及电子设备
Duan et al. Video coding for machines: A paradigm of collaborative compression and intelligent analytics
Liu et al. Learned image compression with mixed transformer-cnn architectures
Cai et al. End-to-end optimized roi image compression
Emmons et al. Cracking open the dnn black-box: Video analytics with dnns across the camera-cloud boundary
CN111405283A (zh) 基于深度学习的端到端视频压缩方法、系统及存储介质
Wu et al. Learned block-based hybrid image compression
WO2023016155A1 (zh) 图像处理方法、装置、介质及电子设备
US11544606B2 (en) Machine learning based video compression
Tang et al. Joint graph attention and asymmetric convolutional neural network for deep image compression
Zhang et al. Attention-guided image compression by deep reconstruction of compressive sensed saliency skeleton
Yan et al. Semantically scalable image coding with compression of feature maps
Fu et al. Learned image compression with generalized octave convolution and cross-resolution parameter estimation
Löhdefink et al. Focussing learned image compression to semantic classes for V2X applications
WO2023174256A1 (zh) 一种数据压缩方法以及相关设备
WO2022100140A1 (zh) 一种压缩编码、解压缩方法以及装置
Jilani et al. JPEG image compression using FPGA with Artificial Neural Networks
Lin et al. DeepSVC: Deep scalable video coding for both machine and human vision
CN115661276A (zh) 图像数据的编码方法、装置、设备、介质及程序
WO2023177318A1 (en) Neural network with approximated activation function
WO2023177317A1 (en) Operation of a neural network with clipped input data
Zheng et al. End-to-end rgb-d image compression via exploiting channel-modality redundancy
WO2023085962A1 (en) Conditional image compression
WO2023050433A1 (zh) 视频编解码方法、编码器、解码器及存储介质
CN114222124B (zh) 一种编解码方法及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23839062

Country of ref document: EP

Kind code of ref document: A1