WO2021022686A1 - Video compression method and apparatus, and terminal device - Google Patents

Video compression method and apparatus, and terminal device

Publication number
WO2021022686A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
compression
network corresponding
video
reconstruction
Prior art date
Application number
PCT/CN2019/114953
Other languages
French (fr)
Chinese (zh)
Inventor
周雷
Original Assignee
合肥图鸭信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 合肥图鸭信息科技有限公司
Publication of WO2021022686A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters

Definitions

  • This application relates to the technical field of video compression, and in particular to a video compression method, device and terminal equipment.
  • the most common way to store video data is to simply store the original video data for the entire period. This method saves a lot of background information, that is, there is a lot of redundancy in the saved information, which consumes huge storage costs, and it is difficult to save long-term video records. Therefore, the original video data must be compressed and stored to reduce storage space and extend storage time.
  • many video compression algorithms and technologies have emerged. For example, the MPEG compression algorithm gives relatively clear image quality after decoding, but its compression rate is low and it occupies a lot of bandwidth; the H.264 compression technology has a high compression rate, but the picture quality after decoding is relatively poor, and the occupied bandwidth varies greatly with the complexity of the picture motion.
  • One of the objectives of the embodiments of the present application is to provide a video compression method, device, and terminal equipment, aiming to solve the problem of low video compression performance.
  • a video compression method including:
  • the weight parameters of the compression network corresponding to each of the stored frames are formed into a compressed video in the order of the frame sequence.
  • a video compression device including:
  • the frame sequence module is used to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;
  • the network construction module is used to construct a corresponding compression network for each frame in the frame sequence
  • the reconstruction module is used to train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame from the trained compression network corresponding to the m-th frame, and at the same time store the weight parameters of the compression network corresponding to the m-th frame, where 1 ≤ m ≤ N and m is a positive integer;
  • the compressed video forming module is used for forming the compressed video according to the order of the frame sequence by the weight parameters of the compression network corresponding to each of the stored frames.
  • a video compression terminal device including a memory, a processor, and a computer program stored in the above-mentioned memory and running on the above-mentioned processor.
  • when the above-mentioned processor executes the above-mentioned computer program, the steps of the above-mentioned method are implemented.
  • this embodiment constructs a new video compression framework: it first constructs a compression network corresponding to each frame, then trains the compression network of the next frame using the reconstructed frame of the previous frame and reconstructs the next frame.
  • the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.
  • a compression network corresponding to each frame is constructed through the network construction module, and then the reconstruction module trains the compression network of the next frame according to the reconstructed frame of the previous frame and reconstructs the next frame.
  • the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.
  • this embodiment constructs a new video compression framework: it first constructs a compression network corresponding to each frame, then trains the compression network of the next frame using the reconstructed frame of the previous frame and reconstructs the next frame.
  • the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.
  • FIG. 1 is a schematic diagram of the implementation process of a video compression method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the implementation process of training a compression network provided by another embodiment of the present application.
  • FIG. 3 is a schematic diagram of a compression network structure provided by another embodiment of the present application.
  • FIG. 4 is a schematic diagram of an implementation process of generating a reconstructed frame provided by another embodiment of the present application.
  • FIG. 5 is a schematic diagram of a video compression device provided by another embodiment of the present application.
  • FIG. 6 is a schematic diagram of a video compression terminal device provided by another embodiment of the present application.
  • Fig. 1 shows the implementation process of the video compression method provided in the first embodiment of the present invention.
  • the execution subject of the method may be a terminal device. The details are as follows:
  • In step S101, the video to be compressed is represented as a frame sequence containing N frames, where N is a positive integer.
  • When continuous images change at more than 24 frames per second, then by the principle of persistence of vision the human eye cannot distinguish the individual static images, and the result looks smooth and continuous.
  • Such continuous images are called video. A video is therefore composed of several frames; that is, a video is a frame sequence containing N frames, where N is a positive integer.
  • In step S102, a corresponding compression network is constructed for each frame in the aforementioned frame sequence.
  • a compression network is constructed for each frame in the aforementioned frame sequence, where the compression network may be a convolutional neural network.
  • the compression network may be constructed manually, by a network search method, or by a combination of the two, which is not limited here.
  • In step S103, the compression network corresponding to the m-th frame is trained according to the reconstructed frame of the (m-1)-th frame, the reconstructed frame of the m-th frame is obtained from the trained compression network corresponding to the m-th frame, and the weight parameters of the compression network corresponding to the m-th frame are stored, where 1 ≤ m ≤ N and m is a positive integer.
  • When m = 1, the (m-1)-th frame is a pre-input data matrix. This data matrix may be a randomly generated noise matrix or a matrix whose values are all 1; the data matrix is not limited here.
  • FIG. 2 shows a flowchart of training a compression network.
  • the training of the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame includes:
  • S201: the reconstructed frame of the (m-1)-th frame is input into the compression network corresponding to the m-th frame to obtain the reconstructed image of the m-th frame.
  • the compression network may be a convolutional neural network, and the convolutional neural network may include at least one convolutional layer.
  • the above-mentioned convolutional layer may include a convolution kernel; the image input to the convolutional layer undergoes a convolution operation with the convolution kernel, which removes redundant image information and outputs an image containing feature information. If the size of the convolution kernel is greater than 1×1, the convolutional layer can output multiple feature maps whose size is smaller than the input image.
  • the image input to the convolutional neural network is thus reduced in size over multiple levels, yielding multiple feature maps smaller than the image input to the network.
  • inputting the reconstructed frame of the (m-1)-th frame into the compression network to generate the corresponding reconstructed image of the m-th frame may be a deconvolution operation; the deconvolution operation is the reverse of the process described above, in which redundant information is removed from the input image to generate feature maps.
  • the compression network may be a convolutional neural network including four convolution or deconvolution layers, whose weight parameter matrices have sizes $H_1 \times W_1 \times N_1 \times C_1$, $H_2 \times W_2 \times N_2 \times C_2$, $H_3 \times W_3 \times N_3 \times C_3$, $H_4 \times W_4 \times N_4 \times C_4$, and whose strides are $S_1$, $S_2$, $S_3$, $S_4$ respectively.
  • H is the height of the weight parameter matrix
  • W is the width of the weight parameter matrix
  • N is the number of output channels of the weight parameter matrix
  • C is the number of input channels of the weight parameter matrix.
  • the number of input channels of the weight parameters of each of the four convolution or deconvolution layers is tied to the number of output channels of the weight parameters of the previous layer:
  • $C_1 = C$
  • $C_2 = N_1$
  • $C_3 = N_2$
  • $C_4 = N_3$.
  • $H' = H \cdot S_1 \cdot S_2 \cdot S_3 \cdot S_4$
  • $W' = W \cdot S_1 \cdot S_2 \cdot S_3 \cdot S_4$
  • $C' = C_4$.
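  • The size relations above can be checked with a short sketch. It assumes the four layers are stride-S deconvolution layers that each multiply height and width by their stride, as the relations for H' and W' suggest; the function name `spatial_output_size` and the tuple layout are hypothetical and not taken from the application.

```python
def spatial_output_size(height, width, in_channels, layers):
    """Propagate an input of size height x width x in_channels through the layers.

    `layers` is a list of (H_k, W_k, N_k, C_k, S_k) tuples: kernel height,
    kernel width, output channels, input channels, and stride of layer k.
    Assumption: each layer upsamples height and width by its stride.
    """
    expected_in = in_channels
    for index, (_, _, n_out, c_in, stride) in enumerate(layers, start=1):
        # C_1 must equal C, and C_k must equal N_(k-1) for the later layers.
        if c_in != expected_in:
            raise ValueError(
                f"layer {index}: C_{index}={c_in}, expected {expected_in}"
            )
        height *= stride
        width *= stride
        expected_in = n_out
    return height, width


# Hypothetical layer sizes chosen only for illustration.
example_layers = [(3, 3, 64, 3, 2), (3, 3, 32, 64, 2),
                  (3, 3, 16, 32, 2), (3, 3, 3, 16, 2)]
print(spatial_output_size(4, 6, 3, example_layers))
```

  • With all four strides equal to 2, a 4 × 6 input grows by a factor of 16 in each spatial dimension, which matches H' = H·S1·S2·S3·S4 and W' = W·S1·S2·S3·S4.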
  • S202: Calculate the loss function between the m-th frame reconstructed image and the m-th frame, perform a gradient update according to this loss function, and adjust the weight parameters of the compression network corresponding to the m-th frame;
  • the loss function between the reconstructed image of the m-th frame and the m-th frame may use the MSE (Mean Square Error).
  • the formula of MSE is shown in formula (1):
  • $\mathrm{MSE}(X', X) = \frac{1}{H \cdot W \cdot C} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{C} \left( X'_{i,j,k} - X_{i,j,k} \right)^2 \quad (1)$
  • H is the height of the m-th frame reconstructed image
  • W is the width of the m-th frame reconstructed image
  • C is the number of channels of the m-th frame reconstructed image
  • X' represents the m-th frame reconstructed image
  • X represents the m-th frame
  • $X'_{i,j,k}$ represents the value at the i-th row and j-th column of the k-th channel in the m-th frame reconstructed image
  • $X_{i,j,k}$ represents the value at the i-th row and j-th column of the k-th channel in the m-th frame.
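  • As a minimal sketch of the MSE loss described above, the following pure-Python function averages the squared difference over all rows, columns, and channels. Representing a frame as nested lists indexed `[row][column][channel]` is an assumption made for illustration; a real implementation would likely use a tensor library.

```python
def mse(recon, frame):
    """Mean square error between a reconstructed image and the original frame.

    Both arguments are nested lists indexed as [i][j][k]:
    row i, column j, channel k, with identical dimensions H x W x C.
    """
    height = len(frame)
    width = len(frame[0])
    channels = len(frame[0][0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            for k in range(channels):
                diff = recon[i][j][k] - frame[i][j][k]
                total += diff * diff
    return total / (height * width * channels)


# A 1 x 2 x 2 example: squared errors 1, 1, 1, 9 average to 3.0.
original = [[[0, 0], [0, 0]]]
reconstructed = [[[1, 1], [1, 3]]]
print(mse(reconstructed, original))  # 3.0
```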
  • the gradient update formula is shown in formula (2):
  • $W' = W - \eta \cdot \nabla W \quad (2)$
  • W represents the weight parameter of the network
  • W' represents the weight parameter after the update
  • $\eta$ is the preset learning rate
  • $\nabla W$ is the calculated gradient.
  • an existing adaptive gradient optimizer can be used for the calculation.
  • for example, the Adam optimizer can be used.
  • the above-mentioned MSE calculation result, the weight parameters of the network, and the preset learning rate are input into the Adam optimizer to obtain the updated weight parameters.
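  • The update described above can be sketched as follows. The plain step subtracts the learning rate times the gradient from each weight; the Adam step follows the standard published Adam update with its usual default hyperparameters, which the application does not specify. Function names, the flat-list weight layout, and the default values are assumptions for illustration.

```python
import math


def gradient_step(weights, grads, lr):
    """Plain gradient descent: each weight moves against its gradient."""
    return [w - lr * g for w, g in zip(weights, grads)]


def adam_step(weights, grads, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for flat lists of weights and gradients.

    m and v are the running first and second moment estimates,
    t is the 1-based step counter. Returns (weights, m, v).
    """
    new_w, new_m, new_v = [], [], []
    for w, g, m_i, v_i in zip(weights, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first moment
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second moment
        m_hat = m_i / (1 - beta1 ** t)             # bias correction
        v_hat = v_i / (1 - beta2 ** t)
        new_w.append(w - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_w, new_m, new_v
```

  • On the first Adam step the bias-corrected moments reduce to the gradient and its square, so each weight moves by roughly the learning rate in the direction opposite to the sign of its gradient.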
  • S203: Repeat S201 to S202 until the compression network corresponding to the m-th frame meets the preset condition.
  • repeating S201 to S202 until the compression network corresponding to the m-th frame meets a preset condition includes:
  • the number of repeated executions of S201 to S202 reaches a preset number, where the preset number is set manually in the video compression program or preset in the terminal device that loads the video compression program; or
  • S201 to S202 are repeatedly executed until the compression network corresponding to the m-th frame reaches a preset reconstruction quality.
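  • The two stopping conditions above can be sketched as one loop. This is an illustrative skeleton only: `train_until`, `step`, and `quality` are hypothetical names, and combining the iteration cap with an optional quality target in a single function is a design choice made here, not something the application prescribes.

```python
def train_until(step, quality, max_iters, target_quality=None):
    """Repeat one training pass (S201 to S202) until a preset condition holds.

    step:           callable that runs one forward pass and one weight update
    quality:        callable returning the current reconstruction quality
    max_iters:      preset number of repetitions
    target_quality: optional preset reconstruction quality; None disables it
    Returns the number of passes actually executed.
    """
    iterations = 0
    while iterations < max_iters:
        step()
        iterations += 1
        if target_quality is not None and quality() >= target_quality:
            break
    return iterations


# Stand-in training state: quality rises by 2.5 per pass, starting at 20.0.
state = {"quality": 20.0}


def fake_step():
    state["quality"] += 2.5


def fake_quality():
    return state["quality"]


print(train_until(fake_step, fake_quality, max_iters=100, target_quality=30.0))
```

  • In the stand-in example the quality target of 30.0 is reached after four passes, well before the cap of 100.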
  • the reconstruction quality of the compression network corresponding to the m-th frame can be represented by the peak signal-to-noise ratio (PSNR) and bits per pixel (BPP).
  • a test image set is fed into the compression network corresponding to the m-th frame to test its reconstruction quality, which can be represented by PSNR and BPP.
  • the test image set may include the 24 images of the Kodak standard test set, which is not limited here.
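  • The two quality measures named above can be computed as in the sketch below, using their standard definitions: PSNR is 10·log10(MAX² / MSE), and BPP is the number of coded bits divided by the number of pixels. The application does not give these formulas; the peak value of 255 assumes 8-bit samples, and the function names are hypothetical.

```python
import math


def psnr(mse_value, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean square error."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse_value)


def bits_per_pixel(num_bytes, height, width):
    """Coded size in bits divided by the number of pixels in the frame."""
    return 8.0 * num_bytes / (height * width)


print(round(psnr(1.0), 2))             # about 48.13 dB for 8-bit samples
print(bits_per_pixel(1000, 100, 80))   # 1.0
```

  • A higher PSNR at the same BPP, or a lower BPP at the same PSNR, corresponds to the performance improvement the application claims.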
  • FIG. 4 is a schematic diagram of the implementation process of generating a reconstructed frame provided by an embodiment of the present invention.
  • the weight parameters of the compression network corresponding to the m-th frame are extracted and used as the compression feature.
  • the coding of the compression feature may use an entropy coding scheme such as Shannon coding, Huffman coding, or arithmetic coding, which is not limited here.
  • the weight parameters in the compression network corresponding to the m-th frame are replaced with the reconstruction weight parameters generated in step S403, forming a new compression network corresponding to the m-th frame.
  • S405: Input the (m-1)-th frame into the compression network corresponding to the m-th frame after its weight parameters have been updated with the reconstruction weight parameters, to obtain the reconstructed frame of the m-th frame.
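  • The encode and decode round trip for the weight parameters can be sketched as below. Note the substitution: instead of a dedicated Shannon, Huffman, or arithmetic coder, this sketch uses Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) as a readily available stand-in. Storing weights as little-endian 32-bit floats and the function names are also assumptions.

```python
import struct
import zlib


def encode_weights(weights):
    """Serialize a flat list of weights as float32 and compress the bytes."""
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return zlib.compress(raw, 9)


def decode_weights(data):
    """Invert encode_weights, returning the reconstruction weight parameters."""
    raw = zlib.decompress(data)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# The decoded weights replace the weights held by the frame's network.
network = {"weights": [0.0, 0.0, 0.0]}
coded = encode_weights([0.5, -1.25, 2.0])
network["weights"] = decode_weights(coded)
print(network["weights"])
```

  • Values that are not exactly representable in 32 bits are rounded by the float32 packing, so the reconstructed frame should be generated from the decoded weights rather than from the trained ones, as the steps above describe.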
  • In step S104, the stored weight parameters of the compression network corresponding to each frame are formed into a compressed video in the order of the frame sequence.
  • In step S103, the weight parameters of the compression network corresponding to each frame are stored.
  • the collection of the weight parameters of the compression network corresponding to each frame is the compressed video.
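  • The overall flow of steps S101 to S104 can be illustrated with a deliberately tiny stand-in. Here each frame's "compression network" is a two-weight affine map (a, b) applied to every pixel of the previous reconstruction, trained by plain gradient descent on the MSE, and the initial input is an all-ones data matrix. This toy can only reproduce uniform frames and is not the convolutional network of the application; all function names and hyperparameters are hypothetical.

```python
def forward(weights, prev):
    """Apply the toy network: the same affine map to every pixel."""
    a, b = weights
    return [a * p + b for p in prev]


def train_frame(prev, frame, steps=200, lr=0.1):
    """Fit (a, b) so that forward((a, b), prev) approximates frame."""
    a, b = 1.0, 0.0
    n = len(frame)
    for _ in range(steps):
        recon = forward((a, b), prev)
        # Gradients of the MSE with respect to a and b.
        grad_a = sum(2 * (r - x) * p for r, x, p in zip(recon, frame, prev)) / n
        grad_b = sum(2 * (r - x) for r, x in zip(recon, frame)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b)


def compress_video(frames):
    """Return the per-frame weights, in frame order, as the compressed video."""
    prev = [1.0] * len(frames[0])      # pre-input data matrix (all ones)
    stored = []
    for frame in frames:
        weights = train_frame(prev, frame)
        stored.append(weights)
        prev = forward(weights, prev)  # reconstructed frame feeds the next one
    return stored


def decompress_video(stored, num_pixels):
    """Replay the stored weights from the same initial data matrix."""
    prev = [1.0] * num_pixels
    frames = []
    for weights in stored:
        prev = forward(weights, prev)
        frames.append(prev)
    return frames
```

  • The decoder needs only the ordered weight list and the agreed initial data matrix, which is why the collection of weight parameters can serve as the compressed video.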
  • a new video compression framework is constructed: first, a compression network corresponding to each frame is constructed, and then the compression network of the next frame is trained through the reconstruction frame of the previous frame and the next frame is reconstructed.
  • the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.
  • FIG. 5 shows a structural block diagram of a video compression device provided by an embodiment of the present invention. For ease of description, only parts related to the embodiment of the present invention are shown.
  • the video compression device 5 includes: a frame sequence module 51, a network construction module 52, a reconstruction module 53, and a compressed video construction module 54.
  • the aforementioned frame sequence module 51 is used to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;
  • the network construction module 52 is used to construct a corresponding compression network for each frame in the above frame sequence
  • the reconstruction module 53 is used to train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame from the trained compression network corresponding to the m-th frame, and store the weight parameters of the compression network corresponding to the m-th frame, where 1 ≤ m ≤ N and m is a positive integer;
  • the compressed video composing module 54 is used for composing the compressed video in the order of the aforementioned frame sequence by the weight parameters of the compression network corresponding to each of the aforementioned stored frames.
  • the above-mentioned reconstruction module 53 includes:
  • a reconstruction unit configured to use the m-1th frame reconstructed frame as the input of the compression network corresponding to the mth frame to obtain the mth frame reconstructed image
  • a parameter adjustment unit configured to calculate the loss function of the m-th frame reconstruction image and the m-th frame, perform gradient update according to the above-mentioned loss function, and adjust the weight parameters of the compression network corresponding to the m-th frame;
  • the repeated execution unit is configured to repeatedly execute the reconstruction unit to the parameter adjustment unit until the compression network corresponding to the mth frame meets the preset condition.
  • the repeated execution of the reconstruction unit through the parameter adjustment unit until the compression network corresponding to the m-th frame meets a preset condition includes:
  • the number of repeated executions of the reconstruction unit through the parameter adjustment unit reaches the preset number.
  • the foregoing reconstruction module 53 further includes:
  • a parameter extraction unit configured to use the weight parameter of the compression network corresponding to the mth frame as the compression feature
  • the coding unit is used to obtain coded data by coding the aforementioned compression features
  • a decoding unit configured to decode the aforementioned encoded data to generate reconstruction weight parameters
  • An initialization unit configured to initialize the compression network corresponding to the mth frame according to the reconstruction weight parameter
  • the reconstructed frame generating unit is configured to input the m-1th frame into the compression network corresponding to the mth frame after the reconstruction weight parameter is updated to obtain the mth frame reconstructed frame.
  • when m = 1, the (m-1)-th frame is a pre-input data matrix.
  • the network construction module 52 constructs the compression network corresponding to each frame, and then the reconstruction module 53 trains the compression network of the next frame according to the reconstructed frame of the previous frame and reconstructs the next frame.
  • the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.
  • Fig. 6 is a schematic diagram of a video compression terminal device provided by an embodiment of the present invention.
  • the video compression terminal device 6 of this embodiment includes: a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and running on the processor 60, such as a video compression program.
  • When the processor 60 executes the computer program 62, the steps in the foregoing embodiments of the video compression method, such as steps S101 to S104 shown in FIG. 1, are implemented.
  • Alternatively, when the processor 60 executes the computer program 62, the functions of the modules/units in the foregoing device embodiments, such as the functions of the modules 51 to 54 shown in FIG. 5, are realized.
  • the foregoing computer program 62 may be divided into one or more modules/units, and the foregoing one or more modules/units are stored in the foregoing memory 61 and executed by the foregoing processor 60 to complete the present invention.
  • the foregoing one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the foregoing computer program 62 in the foregoing video compression terminal device 6.
  • the aforementioned computer program 62 can be divided into a frame sequence module, a network building module, a reconstruction module, and a compressed video building module.
  • the specific functions of each module are as follows:
  • the frame sequence module is used to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;
  • the network construction module is used to construct a corresponding compression network for each frame in the above frame sequence
  • the reconstruction module is used to train the compression network corresponding to the m-th frame according to the reconstructed frame of the (m-1)-th frame, obtain the reconstructed frame of the m-th frame from the trained compression network corresponding to the m-th frame, and store the weight parameters of the compression network corresponding to the m-th frame, where 1 ≤ m ≤ N and m is a positive integer;
  • the compressed video composition module is used to compose the compressed video according to the order of the frame sequence of the weight parameters of the compression network corresponding to each of the above-mentioned stored frames.
  • the aforementioned video compression terminal device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the foregoing video compression terminal device may include, but is not limited to, a processor 60 and a memory 61.
  • FIG. 6 is only an example of the video compression terminal device 6 and does not constitute a limitation on the video compression terminal device 6. It may include more or fewer components than those shown in the figure, or combine certain components, or use different components; for example, the above-mentioned video compression terminal device may also include input and output devices, network access devices, buses, and so on.
  • the so-called processor 60 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the above-mentioned memory 61 may be an internal storage unit of the above-mentioned video compression terminal device 6, such as a hard disk or a memory of the video compression terminal device 6.
  • the aforementioned memory 61 may also be an external storage device of the aforementioned video compression terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the aforementioned video compression terminal device 6.
  • the aforementioned memory 61 may also include both an internal storage unit of the aforementioned video compression terminal device 6 and an external storage device.
  • the memory 61 is used to store the computer program and other programs and data required by the video compression terminal device.
  • the aforementioned memory 61 may also be used to temporarily store data that has been output or will be output.
  • this embodiment constructs a new video compression framework: first constructs a compression network corresponding to each frame, and then trains the compression network of the next frame through the reconstruction frame of the previous frame and reconstructs the next frame .
  • the video compression performance is improved: under the same reconstruction quality condition, the compression rate is higher; under the same compression rate, the reconstruction quality is better.
  • the disclosed device/terminal device and method may be implemented in other ways.
  • the device/terminal device embodiments described above are merely illustrative.
  • the division of the above-mentioned modules or units is only a logical function division.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • If the above integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware.
  • the above-mentioned computer program can be stored in a computer-readable storage medium. When executed by the processor, the steps of the foregoing method embodiments can be implemented.
  • the above-mentioned computer program includes computer program code, and the above-mentioned computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the above-mentioned computer-readable medium may include: any entity or device capable of carrying the above-mentioned computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc.
  • the content contained in the above-mentioned computer-readable media can be appropriately added or deleted in accordance with the requirements of the legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to the legislation and patent practice, the computer-readable media cannot include electrical carrier signals and telecommunication signals.

Abstract

Disclosed are a video compression method and apparatus, a terminal device, and a computer storage medium. The video compression method comprises: representing a video to be compressed as a frame sequence including N frames; constructing a corresponding compression network for each frame in the frame sequence; training the compression network corresponding to an mth frame according to a reconstructed frame of a (m-1)th frame, obtaining a reconstructed frame of the mth frame according to the trained compression network corresponding to the mth frame, and simultaneously storing a weighting parameter of the compression network corresponding to the mth frame; and forming, according to the order of the frame sequence, a compressed video by using the stored weighting parameter of the compression network corresponding to each frame. In the present invention, a new video compression framework that is more concise than existing video compression frameworks is constructed by first constructing a compression network corresponding to each frame, and then training the compression network corresponding to a next frame by means of a reconstructed frame of a previous frame and reconstructing the next frame. The video compression performance is improved by means of learning prior information of a video.

Description

Video compression method, apparatus, and terminal device
This application claims priority to Chinese patent application No. 201910727889.6, filed with the Chinese Patent Office on August 8, 2019 and entitled "Video compression method, apparatus, and terminal device", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the technical field of video compression, and in particular to a video compression method, apparatus, and terminal device.
Background
With the continuous development of multimedia information technology, video information has emerged in large quantities. As a comprehensive medium for expressing information, video data has become an important information carrier in everyday life. With the growing popularity of surveillance imaging equipment, massive amounts of video data are generated, and video storage is a very important part of video surveillance applications. Massive video data usually needs to be stored for a long time in order to support later retrieval and playback of recorded video.
The most common way to store video data is simply to store the raw video for the entire period. This approach retains a large amount of background information, that is, the stored information is highly redundant, which incurs a huge storage cost and makes it difficult to keep long-term video records. The raw video must therefore be compressed before storage to reduce the storage space and extend the retention period. With the development of video compression technology, many video compression algorithms and techniques have emerged. For example, the MPEG compression algorithm has the advantage that the decoded picture quality is relatively clear, but its compression ratio is low and it occupies a large bandwidth; the H.264 compression technique has a high compression ratio, but the decoded picture quality is relatively poor and the occupied bandwidth varies greatly with the complexity of motion in the picture.
Technical problem
One objective of the embodiments of this application is to provide a video compression method, apparatus, and terminal device, aiming to solve the problem of low video compression performance.
Technical solution
To solve the above technical problem, the embodiments of this application adopt the following technical solutions:
In a first aspect, a video compression method is provided, including:
representing the video to be compressed as a frame sequence containing N frames, where N is a positive integer;
constructing a corresponding compression network for each frame in the frame sequence;
training the compression network corresponding to the mth frame according to the reconstructed frame of the (m-1)th frame, obtaining the reconstructed frame of the mth frame according to the trained compression network corresponding to the mth frame, and storing the weight parameters of the compression network corresponding to the mth frame, where 1≤m≤N and m is a positive integer;
forming a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.
In a second aspect, a video compression apparatus is provided, including:
a frame sequence module, configured to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;
a network construction module, configured to construct a corresponding compression network for each frame in the frame sequence;
a reconstruction module, configured to train the compression network corresponding to the mth frame according to the reconstructed frame of the (m-1)th frame, obtain the reconstructed frame of the mth frame according to the trained compression network corresponding to the mth frame, and store the weight parameters of the compression network corresponding to the mth frame, where 1≤m≤N and m is a positive integer;
a compressed video forming module, configured to form a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.
In a third aspect, a video compression terminal device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the method provided in the first aspect.
The beneficial effect of the video compression method provided by the embodiments of this application is as follows: this embodiment constructs a new video compression framework, in which a compression network corresponding to each frame is first constructed, the compression network of the next frame is then trained using the reconstructed frame of the previous frame, and the next frame is reconstructed. By learning prior information of the video, video compression performance is improved: at the same reconstruction quality, the compression ratio is higher; at the same compression ratio, the reconstruction quality is better.
The beneficial effect of the video compression apparatus provided by the embodiments of this application is as follows: in this embodiment, the network construction module constructs a compression network corresponding to each frame, and the reconstruction module then trains the compression network of the next frame according to the reconstructed frame of the previous frame and reconstructs the next frame. By learning prior information of the video, video compression performance is improved: at the same reconstruction quality, the compression ratio is higher; at the same compression ratio, the reconstruction quality is better.
The beneficial effect of the video compression terminal device provided by the embodiments of this application is as follows: this embodiment constructs a new video compression framework, in which a compression network corresponding to each frame is first constructed, the compression network of the next frame is then trained using the reconstructed frame of the previous frame, and the next frame is reconstructed. By learning prior information of the video, video compression performance is improved: at the same reconstruction quality, the compression ratio is higher; at the same compression ratio, the reconstruction quality is better.
Description of the drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments or the exemplary techniques are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a video compression method provided by an embodiment of this application;
Fig. 2 is a schematic flowchart of training a compression network provided by another embodiment of this application;
Fig. 3 is a schematic diagram of a compression network structure provided by another embodiment of this application;
Fig. 4 is a schematic flowchart of generating a reconstructed frame provided by another embodiment of this application;
Fig. 5 is a schematic diagram of a video compression apparatus provided by another embodiment of this application;
Fig. 6 is a schematic diagram of a video compression terminal device provided by another embodiment of this application.
Embodiments of the invention
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
It should be noted that when a component is referred to as being "fixed to" or "disposed on" another component, it may be directly or indirectly on that other component. When a component is referred to as being "connected to" another component, it may be directly or indirectly connected to that other component. Orientations or positional relationships indicated by terms such as "upper", "lower", "left", and "right" are based on the orientations or positional relationships shown in the drawings and are used only for ease of description; they do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be construed as limiting this application. Those of ordinary skill in the art can understand the specific meanings of these terms according to the specific situation. The terms "first" and "second" are used only for ease of description and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features. "A plurality of" means two or more, unless otherwise expressly and specifically defined.
To illustrate the technical solutions described in this application, a detailed description is given below with reference to specific drawings and embodiments.
Embodiment 1
Fig. 1 shows the implementation flow of the video compression method provided in Embodiment 1 of the present invention. The method may be executed by a terminal device and is described in detail as follows:
Step S101: represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer.
Optionally, when consecutive images change at more than 24 frames per second, by the principle of persistence of vision the human eye cannot distinguish individual still images and instead perceives a smooth, continuous visual effect; such a continuous sequence of images is called a video. A video is therefore composed of a number of frames, that is, a video is a frame sequence containing N frames, where N is a positive integer.
Step S102: construct a corresponding compression network for each frame in the frame sequence.
Optionally, a compression network is constructed for each frame in the frame sequence, where the compression network may be a convolutional neural network. Optionally, the compression network may be designed manually, built by means of network search, or built by a combination of the two, which is not limited here.
Optionally, the training parameters of the network (such as the learning rate, batch-processing parameters, and weight decay) may be set using hyperparameter optimization (HO) frameworks such as random search, grid search, Bayesian optimization, reinforcement learning, or evolutionary algorithms. The parameters that define the network structure (such as the number of layers, the operator of each layer, and the filter size in a convolution) may be tuned by neural architecture search (NAS). It should be understood that these are only examples of parameter-tuning methods used when constructing a network and should not constitute any limitation on the process of constructing the network.
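As an illustration of one of the tuning options named above, the following is a minimal sketch of random search over training parameters. The search space, the function names, and the stand-in objective are assumptions made for this example and are not taken from the application; in practice the objective would train a compression network and return its reconstruction error.

```python
import random

def random_search(evaluate, space, trials=20, seed=0):
    """Sample configurations at random and return the best (score, config)."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        # Draw one candidate value per hyperparameter.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)  # lower is better, e.g. a reconstruction error
        if best is None or score < best[0]:
            best = (score, config)
    return best

# Hypothetical search space; the values are illustrative only.
space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "weight_decay": [0.0, 1e-5, 1e-4],
}

# Stand-in objective so that the sketch runs without training a network.
def toy_objective(config):
    return abs(config["learning_rate"] - 1e-3) + config["weight_decay"]

best_score, best_config = random_search(toy_objective, space)
```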
Step S103: train the compression network corresponding to the mth frame according to the reconstructed frame of the (m-1)th frame, obtain the reconstructed frame of the mth frame according to the trained compression network corresponding to the mth frame, and store the weight parameters of the compression network corresponding to the mth frame, where 1≤m≤N and m is a positive integer.
When m=1, the (m-1)th frame is a pre-input data matrix. That is, when m-1=0, because frame 0 does not exist in the frame sequence, a data matrix needs to be input in advance. Optionally, the data matrix may be a randomly generated noise matrix or a matrix whose values are all 1; the data matrix is not limited here.
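The two options for the pre-input data matrix can be sketched as follows. The function name, the nested-list layout [H][W][C], and the fixed random seed are assumptions of this example; the seed is used only so that the same noise matrix can be regenerated later.

```python
import random

def initial_reference(height, width, channels, mode="noise", seed=0):
    """Build the stand-in for the nonexistent frame 0 as nested lists [H][W][C]."""
    rng = random.Random(seed)
    if mode == "noise":
        fill = rng.random      # uniform noise in [0, 1)
    elif mode == "ones":
        fill = lambda: 1.0     # matrix whose values are all 1
    else:
        raise ValueError("mode must be 'noise' or 'ones'")
    return [[[fill() for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]

ones_matrix = initial_reference(2, 3, 3, mode="ones")
noise_matrix = initial_reference(2, 3, 3, mode="noise")
```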
Optionally, Fig. 2 shows a flowchart of training the compression network. Training the compression network corresponding to the mth frame according to the reconstructed frame of the (m-1)th frame includes:
S201: use the reconstructed frame of the (m-1)th frame as the input of the compression network corresponding to the mth frame to obtain the reconstructed image of the mth frame;
Optionally, the reconstructed frame of the (m-1)th frame is input into the compression network corresponding to the mth frame. The compression network may be a convolutional neural network, which may include at least one convolutional layer. Optionally, the convolutional layer may include a convolution kernel; an image input to the convolutional layer is convolved with the kernel, which removes redundant image information and outputs an image containing feature information. If the size of the convolution kernel is greater than 1×1, the convolutional layer can output multiple feature maps smaller than the input image; after processing by multiple convolutional layers, the image input to the convolutional neural network shrinks over several stages, yielding multiple feature maps smaller than the image input to the network. Optionally, in the embodiments of the present invention, generating the reconstructed image of the mth frame from the reconstructed frame of the (m-1)th frame input into the compression network may be a deconvolution operation, which is the reverse of the process described above in which redundant information is removed from an input image to generate feature images.
For example, as shown in Fig. 3, the compression network may be a convolutional neural network including four convolutional or deconvolutional layers, whose weight parameter matrices have sizes H_1*W_1*N_1*C_1, H_2*W_2*N_2*C_2, H_3*W_3*N_3*C_3, and H_4*W_4*N_4*C_4, with strides S_1, S_2, S_3, and S_4 respectively. Here, H is the height of a weight parameter matrix, W is its width, N is its number of output channels, and C is its number of input channels. The number of input channels of the weight parameters of each of the four layers is tied to the number of output channels of the weight parameters of the preceding layer. For example, in Fig. 3, C_1=C, C_2=N_1, C_3=N_2, and C_4=N_3. Assuming that the input reconstructed frame of the (m-1)th frame is a 1*H*W*C matrix, the convolutional neural network generates an output matrix of size 1*H'*W'*C', where H'=H*S_1*S_2*S_3*S_4, W'=W*S_1*S_2*S_3*S_4, and C'=C_4. This is only one example of a compression network and should not limit the structure of the compression network.
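The shape bookkeeping in the example above can be checked with a short sketch. It assumes, as the formulas H'=H*S_1*S_2*S_3*S_4 and W'=W*S_1*S_2*S_3*S_4 imply, that every deconvolutional layer scales the spatial size by exactly its stride; the function names and the concrete layer sizes are hypothetical.

```python
def upsampled_size(height, width, strides):
    """Spatial size after a stack of deconvolutional layers with the given strides."""
    for stride in strides:
        height *= stride
        width *= stride
    return height, width

def channels_chain_ok(input_channels, layers):
    """layers: list of (n_out, c_in) pairs. True if each layer's input channel
    count equals the previous layer's output channel count (C_1=C, C_2=N_1, ...)."""
    expected = input_channels
    for n_out, c_in in layers:
        if c_in != expected:
            return False
        expected = n_out
    return True

# Hypothetical four-layer configuration: (N_i, C_i) pairs and strides S_i.
layers = [(64, 3), (64, 64), (32, 64), (3, 32)]
strides = [2, 2, 1, 1]

size = upsampled_size(8, 8, strides)      # 8 * 2 * 2 * 1 * 1 = 32 per side
consistent = channels_chain_ok(3, layers)
```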
S202: compute the loss function between the reconstructed image of the mth frame and the mth frame, perform a gradient update according to the loss function, and adjust the weight parameters of the compression network corresponding to the mth frame;
Optionally, the loss function between the reconstructed image of the mth frame and the mth frame may be the MSE (mean squared error). Specifically, the MSE is given by formula (1):
$$\mathrm{MSE} = \frac{1}{H \cdot W \cdot C} \sum_{i=1}^{H} \sum_{j=1}^{W} \sum_{k=1}^{C} \left( X'_{i,j,k} - X_{i,j,k} \right)^{2} \qquad (1)$$
where H is the height of the reconstructed image of the mth frame, W is its width, C is its number of channels, X' denotes the reconstructed image of the mth frame, X denotes the mth frame, X'_{i,j,k} is the value at row i, column j of channel k in the reconstructed image of the mth frame, and X_{i,j,k} is the value at row i, column j of channel k in the mth frame.
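The mean squared error described above can be computed directly from its definition. The sketch below is a plain-Python illustration that assumes frames are stored as nested lists indexed [row][column][channel]; that layout and the function name are choices made for this example.

```python
def mse(reconstruction, target):
    """Mean squared error between two frames stored as nested lists [H][W][C]."""
    total = 0.0
    count = 0
    for row_r, row_t in zip(reconstruction, target):
        for px_r, px_t in zip(row_r, row_t):
            for v_r, v_t in zip(px_r, px_t):
                total += (v_r - v_t) ** 2   # squared difference of one value
                count += 1                  # counts H * W * C values overall
    return total / count

# 2 rows, 1 column, 2 channels: only one of the four values differs (by 2).
x = [[[0.0, 0.0]], [[1.0, 1.0]]]
x_rec = [[[0.0, 2.0]], [[1.0, 1.0]]]
loss = mse(x_rec, x)  # (0 + 4 + 0 + 0) / 4 = 1.0
```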
Optionally, the gradient update is given by formula (2):
W′ = W − αΔW        (2)
where W denotes the weight parameters of the network, W′ denotes the updated weight parameters, α is the preset learning rate, and ΔW is the computed gradient.
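A single application of this update rule can be sketched as follows, with the weights and the gradient flattened into plain lists. The function name and the numbers are illustrative only.

```python
def gradient_step(weights, grads, learning_rate):
    """Apply W' = W - alpha * dW element by element."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

updated = gradient_step([1.0, -2.0], [0.5, -1.0], learning_rate=0.5)
# updated == [0.75, -1.5]
```

An adaptive optimizer such as Adam follows the same pattern but rescales each step using running statistics of the gradient.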
Optionally, the gradient update may be computed with an existing adaptive gradient optimizer; specifically, the Adam optimizer may be used. Optionally, the MSE result, the weight parameters of the network, and the preset learning rate are input into the Adam optimizer to obtain the updated weight parameters.
Optionally, the updated weight parameters replace the original weight parameters in the compression network, yielding the new compression network corresponding to the mth frame.
S203: repeat S201 to S202 until the compression network corresponding to the mth frame satisfies a preset condition.
Optionally, repeating S201 to S202 until the compression network corresponding to the mth frame satisfies a preset condition includes:
repeating S201 to S202 until the compression network corresponding to the mth frame reaches a preset reconstruction quality
or
repeating S201 to S202 until the number of repetitions reaches a preset number.
Optionally, S201 to S202 are repeated until the number of repetitions reaches a preset number, where the preset number is set manually in advance in the video compression program or in the terminal device on which the video compression program is installed.
Optionally, S201 to S202 are repeated until the compression network corresponding to the mth frame reaches a preset reconstruction quality. The reconstruction quality of the compression network corresponding to the mth frame may be expressed using the peak signal-to-noise ratio (PSNR) and bits per pixel (BPP).
Specifically, a test image set is fed into the compression network corresponding to the mth frame to test its reconstruction quality, which may be expressed using PSNR and BPP. Optionally, at a fixed BPP, it is determined whether the PSNR reaches a preset threshold; a higher PSNR means that less information of the frame is lost in compression. Optionally, the test image set may include the 24 images of the Kodak standard test set, which is not limited here.
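The two stopping conditions can be combined in a small check. The sketch below uses the conventional definition PSNR = 10·log10(MAX²/MSE); the peak value of 255 (8-bit samples), the threshold values, and the function names are assumptions of this example rather than values given in the application.

```python
import math

def psnr(mse_value, max_value=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse_value)

def should_stop(iteration, mse_value, max_iterations, psnr_threshold):
    """Stop when the iteration budget is used up or the quality target is met."""
    return iteration >= max_iterations or psnr(mse_value) >= psnr_threshold

# An MSE of 650.25 with a peak of 255 gives 10 * log10(100), i.e. about 20 dB.
quality_db = psnr(650.25)
```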
Fig. 4 is a schematic flowchart of generating a reconstructed frame provided by an embodiment of the present invention.
S401: use the weight parameters of the compression network corresponding to the mth frame as compression features;
Optionally, the weight parameters of the compression network corresponding to the mth frame are extracted and used as the compression features.
S402: encode the compression features to obtain encoded data;
The encoding may be an entropy coding scheme such as Shannon coding, Huffman coding, or arithmetic coding, which is not limited here.
S403: decode the encoded data to generate reconstructed weight parameters;
S404: initialize the compression network corresponding to the mth frame according to the reconstructed weight parameters;
Optionally, the reconstructed weight parameters generated in step S403 replace the weight parameters in the compression network corresponding to the mth frame, forming a new compression network corresponding to the mth frame.
S405: input the (m-1)th frame into the compression network corresponding to the mth frame whose weights have been updated with the reconstructed weight parameters, to obtain the reconstructed frame of the mth frame.
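Steps S402 and S403 can be illustrated with Huffman coding, one of the entropy coding schemes listed above. The application does not say how real-valued weights are turned into symbols, so the uniform quantization step used here is an assumption of this sketch, as are all function names; the reconstructed weights therefore equal the originals only up to the quantization error.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return {symbol: bitstring} for a sequence of hashable symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: partial code}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(weights, step):
    """S402: quantize to multiples of `step` (an assumption) and Huffman-code."""
    symbols = [round(w / step) for w in weights]
    code = build_huffman_code(symbols)
    return "".join(code[s] for s in symbols), code

def decode(bits, code, step):
    """S403: invert `encode`, returning the reconstructed weight parameters."""
    inverse = {c: s for s, c in code.items()}
    weights, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete codeword (the code is prefix-free)
            weights.append(inverse[current] * step)
            current = ""
    return weights

weights = [0.10, 0.12, -0.31, 0.10, 0.52, 0.10]
bits, code = encode(weights, step=0.1)
reconstructed = decode(bits, code, step=0.1)
```

The most frequent quantized value receives the shortest codeword, which is where the saving in encoded size comes from.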
Step S104: form a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.
Optionally, the weight parameters of the compression network corresponding to each frame are stored in step S103; the collection of these per-frame weight parameters, arranged in the order of the frame sequence, is the compressed video.
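The overall flow of steps S101 to S104 can be sketched end to end with a deliberately tiny stand-in for the compression network. In this sketch the "network" for each frame has only two weights (a gain and a bias applied to every value of the previous reconstructed frame), frames are flat lists, and training uses plain gradient descent on the MSE with a fixed iteration count. All of these are simplifying assumptions for illustration; the application describes convolutional networks and also passes the weights through entropy coding, which is omitted here. The decoder in this sketch feeds each network with the previous reconstructed frame.

```python
def forward(weights, reference):
    """Toy compression network: output = gain * reference + bias."""
    gain, bias = weights
    return [gain * r + bias for r in reference]

def train_frame(reference, target, steps=500, lr=0.5):
    """Fit the toy network for one frame (S201 to S203, fixed iteration count)."""
    gain, bias = 1.0, 0.0
    n = len(target)
    for _ in range(steps):
        output = forward((gain, bias), reference)
        errors = [o - t for o, t in zip(output, target)]
        # Gradients of the MSE with respect to gain and bias.
        grad_gain = sum(2.0 * e * r for e, r in zip(errors, reference)) / n
        grad_bias = sum(2.0 * e for e in errors) / n
        gain -= lr * grad_gain
        bias -= lr * grad_bias
    return (gain, bias)

def compress(frames, initial):
    """Return the per-frame weights in frame order; this list is the compressed video."""
    reference, stored = initial, []
    for frame in frames:
        weights = train_frame(reference, frame)
        stored.append(weights)
        reference = forward(weights, reference)  # reconstructed frame m
    return stored

def decompress(stored, initial):
    """Rebuild the frames by applying each stored network to the previous output."""
    reference, frames = initial, []
    for weights in stored:
        reference = forward(weights, reference)
        frames.append(reference)
    return frames

initial = [0.0, 0.5, 1.0]                       # pre-input data matrix (frame 0)
video = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7]]      # two toy frames
compressed = compress(video, initial)
decoded = decompress(compressed, initial)
```

In this toy case each frame is an exact affine function of the previous one, so the decoded frames match the originals closely; real frames would leave a residual reconstruction error.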
This embodiment constructs a new video compression framework: a compression network corresponding to each frame is first constructed, the compression network of the next frame is then trained using the reconstructed frame of the previous frame, and the next frame is reconstructed. By learning prior information of the video, video compression performance is improved: at the same reconstruction quality, the compression ratio is higher; at the same compression ratio, the reconstruction quality is better.
It should be understood that the sequence numbers of the steps in the above embodiment do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Embodiment 2
Fig. 5 shows a structural block diagram of the video compression apparatus provided by an embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown. The video compression apparatus 5 includes: a frame sequence module 51, a network construction module 52, a reconstruction module 53, and a compressed video forming module 54.
The frame sequence module 51 is configured to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;
the network construction module 52 is configured to construct a corresponding compression network for each frame in the frame sequence;
the reconstruction module 53 is configured to train the compression network corresponding to the mth frame according to the reconstructed frame of the (m-1)th frame, obtain the reconstructed frame of the mth frame according to the trained compression network corresponding to the mth frame, and store the weight parameters of the compression network corresponding to the mth frame, where 1≤m≤N and m is a positive integer;
the compressed video forming module 54 is configured to form a compressed video from the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence.
Optionally, the reconstruction module 53 includes:
a reconstruction unit, configured to use the reconstructed frame of the (m-1)th frame as the input of the compression network corresponding to the mth frame to obtain the reconstructed image of the mth frame;
a parameter adjustment unit, configured to compute the loss function between the reconstructed image of the mth frame and the mth frame, perform a gradient update according to the loss function, and adjust the weight parameters of the compression network corresponding to the mth frame;
a repeated execution unit, configured to repeatedly execute the reconstruction unit through the parameter adjustment unit until the compression network corresponding to the mth frame satisfies a preset condition.
Optionally, repeatedly executing the reconstruction unit through the parameter adjustment unit until the compression network corresponding to the mth frame satisfies a preset condition includes:
repeatedly executing the reconstruction unit through the parameter adjustment unit until the compression network corresponding to the mth frame reaches a preset reconstruction quality
or
repeatedly executing the reconstruction unit through the parameter adjustment unit until the number of executions reaches a preset number.
Optionally, the reconstruction module 53 further includes:
a parameter extraction unit, configured to use the weight parameters of the compression network corresponding to the mth frame as compression features;
an encoding unit, configured to encode the compression features to obtain encoded data;
a decoding unit, configured to decode the encoded data to generate reconstructed weight parameters;
an initialization unit, configured to initialize the compression network corresponding to the mth frame according to the reconstructed weight parameters;
a reconstructed frame generation unit, configured to input the (m-1)th frame into the compression network corresponding to the mth frame whose weights have been updated with the reconstructed weight parameters, to obtain the reconstructed frame of the mth frame.
Optionally, when m=1, the (m-1)th frame is a pre-input data matrix.
In this embodiment, the network construction module 52 constructs the compression network corresponding to each frame, and the reconstruction module 53 then trains the compression network of the next frame according to the reconstructed frame of the previous frame and reconstructs the next frame. By learning prior information of the video, video compression performance is improved: at the same reconstruction quality, the compression ratio is higher; at the same compression ratio, the reconstruction quality is better.
实施例三Example three
FIG. 6 is a schematic diagram of a video compression terminal device provided by an embodiment of the present invention. As shown in FIG. 6, the video compression terminal device 6 of this embodiment includes a processor 60, a memory 61, and a computer program 62, such as a video compression program, that is stored in the memory 61 and executable on the processor 60. When executing the computer program 62, the processor 60 implements the steps of the video compression method embodiments described above, for example steps 101 to 104 shown in FIG. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of the modules/units in the device embodiments described above, for example the functions of modules 51 to 54 shown in FIG. 5.
By way of example, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program 62 in the video compression terminal device 6. For example, the computer program 62 may be divided into a frame sequence module, a network construction module, a reconstruction module, and a compressed video composition module, whose specific functions are as follows:
the frame sequence module is configured to represent the video to be compressed as a frame sequence containing N frames, where N is a positive integer;
the network construction module is configured to construct a corresponding compression network for each frame in the frame sequence;
the reconstruction module is configured to train the compression network corresponding to the m-th frame according to the (m-1)-th reconstructed frame, obtain the m-th reconstructed frame according to the trained compression network corresponding to the m-th frame, and store the weight parameters of the compression network corresponding to the m-th frame, where 1≤m≤N and m is a positive integer;
the compressed video composition module is configured to assemble the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence, into the compressed video.
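How the four modules chain together can be sketched as follows. This is an illustrative toy under stated assumptions, not the application's implementation: each per-frame "compression network" is a two-parameter affine map, and a closed-form least-squares fit (`fit_affine`) stands in for the gradient-based training described elsewhere in the application. All function names and the choice of initial data matrix are hypothetical.

```python
def fit_affine(prev, target):
    """Stand-in for training: least-squares fit of target = w * prev + b."""
    n = len(prev)
    mean_x = sum(prev) / n
    mean_y = sum(target) / n
    var = sum((x - mean_x) ** 2 for x in prev)
    if var == 0.0:
        # A constant input carries no structure; fall back to the bias only.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(prev, target))
    w = cov / var
    return w, mean_y - w * mean_x


def apply_affine(prev, weights):
    """Run the previous reconstructed frame through the frame's network."""
    w, b = weights
    return [w * x + b for x in prev]


def compress_video(frames, initial):
    """Return the per-frame weight parameters, in frame order.

    Each network is trained on the previous *reconstructed* frame, so the
    decoder can replay the same chain from the weights alone.
    """
    stored, prev_recon = [], initial
    for frame in frames:
        weights = fit_affine(prev_recon, frame)          # train network m
        prev_recon = apply_affine(prev_recon, weights)   # m-th reconstructed frame
        stored.append(weights)                           # store weight parameters
    return stored


def decompress_video(stored, initial):
    """Rebuild every frame from the stored weights and the initial matrix."""
    recon, prev = [], initial
    for weights in stored:
        prev = apply_affine(prev, weights)
        recon.append(prev)
    return recon
```

In this sketch the compressed video is just the ordered list of weight pairs; the decoder needs only that list and the same pre-input data matrix used by the encoder for m=1.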
The video compression terminal device 6 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The video compression terminal device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that FIG. 6 is merely an example of the video compression terminal device 6 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the video compression terminal device may further include input/output devices, network access devices, buses, and the like.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 61 may be an internal storage unit of the video compression terminal device 6, such as a hard disk or internal memory of the video compression terminal device 6. The memory 61 may also be an external storage device of the video compression terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the video compression terminal device 6. Optionally, the memory 61 may include both an internal storage unit and an external storage device of the video compression terminal device 6. The memory 61 is used to store the computer program and other programs and data required by the video compression terminal device. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
As can be seen from the above, this embodiment constructs a new video compression framework: a compression network is first constructed for each frame, and the compression network of the next frame is then trained on the reconstructed frame of the previous frame and used to reconstruct the next frame. Learning the prior information of the video improves compression performance: at the same reconstruction quality the compression ratio is higher, and at the same compression ratio the reconstruction quality is better.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, the functions may be assigned to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit; the integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in one embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated modules/units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, all or part of the processes of the method embodiments described above may also be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be expanded or reduced as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above are merely optional embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may be subject to various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims (9)

  1. A video compression method, characterized by comprising:
    representing a video to be compressed as a frame sequence containing N frames, where N is a positive integer;
    constructing a corresponding compression network for each frame in the frame sequence;
    training the compression network corresponding to the m-th frame according to the (m-1)-th reconstructed frame, obtaining the m-th reconstructed frame according to the trained compression network corresponding to the m-th frame, and storing the weight parameters of the compression network corresponding to the m-th frame, where 1≤m≤N and m is a positive integer;
    assembling the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence, into a compressed video.
  2. The video compression method according to claim 1, wherein training the compression network corresponding to the m-th frame according to the (m-1)-th reconstructed frame comprises:
    S1: using the (m-1)-th reconstructed frame as the input of the compression network corresponding to the m-th frame to obtain an m-th reconstructed image;
    S2: calculating a loss function between the m-th reconstructed image and the m-th frame, performing a gradient update according to the loss function, and adjusting the weight parameters of the compression network corresponding to the m-th frame;
    S3: repeating S1 to S2 until the compression network corresponding to the m-th frame satisfies a preset condition.
  3. The video compression method according to claim 2, wherein repeating S1 to S2 until the compression network corresponding to the m-th frame satisfies the preset condition comprises:
    repeating S1 to S2 until the compression network corresponding to the m-th frame reaches a preset reconstruction quality;
    or
    repeating S1 to S2 until the number of repetitions reaches a preset number.
  4. The video compression method according to claim 1, wherein obtaining the m-th reconstructed frame according to the trained compression network corresponding to the m-th frame comprises:
    using the weight parameters of the compression network corresponding to the m-th frame as compression features;
    encoding the compression features to obtain encoded data;
    decoding the encoded data to generate reconstructed weight parameters;
    initializing the compression network corresponding to the m-th frame according to the reconstructed weight parameters;
    inputting the (m-1)-th frame into the compression network corresponding to the m-th frame, as updated with the reconstructed weight parameters, to obtain the m-th reconstructed frame.
  5. The video compression method according to any one of claims 1 to 3, wherein training the compression network corresponding to the m-th frame according to the (m-1)-th reconstructed frame and obtaining the m-th reconstructed frame according to the trained compression network corresponding to the m-th frame comprises:
    when m=1, the (m-1)-th frame is a pre-input data matrix.
  6. A video compression device, characterized by comprising:
    a frame sequence module, configured to represent a video to be compressed as a frame sequence containing N frames, where N is a positive integer;
    a network construction module, configured to construct a corresponding compression network for each frame in the frame sequence;
    a reconstruction module, configured to train the compression network corresponding to the m-th frame according to the (m-1)-th reconstructed frame, obtain the m-th reconstructed frame according to the trained compression network corresponding to the m-th frame, and store the weight parameters of the compression network corresponding to the m-th frame, where 1≤m≤N and m is a positive integer;
    a compressed video composition module, configured to assemble the stored weight parameters of the compression network corresponding to each frame, in the order of the frame sequence, into a compressed video.
  7. The video compression device according to claim 6, wherein the reconstruction module comprises:
    a reconstruction unit, configured to use the (m-1)-th reconstructed frame as the input of the compression network corresponding to the m-th frame to obtain an m-th reconstructed image;
    a parameter adjustment unit, configured to calculate a loss function between the m-th reconstructed image and the m-th frame, perform a gradient update according to the loss function, and adjust the weight parameters of the compression network corresponding to the m-th frame;
    a repeated execution unit, configured to repeatedly execute the reconstruction unit through the parameter adjustment unit until the compression network corresponding to the m-th frame satisfies a preset condition.
  8. The video compression device according to claim 7, wherein the reconstruction module further comprises:
    a parameter extraction unit, configured to use the weight parameters of the compression network corresponding to the m-th frame as compression features;
    an encoding unit, configured to encode the compression features to obtain encoded data;
    a decoding unit, configured to decode the encoded data to generate reconstructed weight parameters;
    an initialization unit, configured to initialize the compression network corresponding to the m-th frame according to the reconstructed weight parameters;
    a reconstructed frame generation unit, configured to input the (m-1)-th frame into the compression network corresponding to the m-th frame, as updated with the reconstructed weight parameters, to obtain the m-th reconstructed frame.
  9. A video compression terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
PCT/CN2019/114953 2019-08-08 2019-11-01 Video compression method and apparatus, and terminal device WO2021022686A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910727889.6 2019-08-08
CN201910727889.6A CN110650339A (en) 2019-08-08 2019-08-08 Video compression method and device and terminal equipment

Publications (1)

Publication Number Publication Date
WO2021022686A1 true WO2021022686A1 (en) 2021-02-11

Family

ID=68990104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114953 WO2021022686A1 (en) 2019-08-08 2019-11-01 Video compression method and apparatus, and terminal device

Country Status (2)

Country Link
CN (1) CN110650339A (en)
WO (1) WO2021022686A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11770510B2 (en) 2018-09-21 2023-09-26 Andrew Sviridenko Video information compression using sketch-video
CN117750021A (en) * 2024-02-19 2024-03-22 北京铁力山科技股份有限公司 Video compression method, device, computer equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111314640B (en) * 2020-02-23 2022-06-07 苏州浪潮智能科技有限公司 Video compression method, device and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062780A (en) * 2017-12-29 2018-05-22 百度在线网络技术(北京)有限公司 Method for compressing image and device
CN108171762A (en) * 2017-12-27 2018-06-15 河海大学常州校区 System and method for is reconfigured quickly in a kind of similar image of the compressed sensing of deep learning
US20180176570A1 (en) * 2016-12-15 2018-06-21 WaveOne Inc. Deep learning based on image encoding and decoding
CN108282225A (en) * 2017-12-27 2018-07-13 吉林大学 Visible light communication method based on no lens imaging device
US20180350110A1 (en) * 2017-05-31 2018-12-06 Samsung Electronics Co., Ltd. Method and device for processing multi-channel feature map images
EP3451293A1 (en) * 2017-08-28 2019-03-06 Thomson Licensing Method and apparatus for filtering with multi-branch deep learning
US20190172230A1 (en) * 2017-12-06 2019-06-06 Siemens Healthcare Gmbh Magnetic resonance image reconstruction with deep reinforcement learning
CN109919864A (en) * 2019-02-20 2019-06-21 重庆邮电大学 A kind of compression of images cognitive method based on sparse denoising autoencoder network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016132150A1 (en) * 2015-02-19 2016-08-25 Magic Pony Technology Limited Enhancing visual data using and augmenting model libraries
CN109862370A (en) * 2017-11-30 2019-06-07 北京大学 Video super-resolution processing method and processing device
CN108184128A (en) * 2018-01-11 2018-06-19 安徽优思天成智能科技有限公司 Video sequence lost frames prediction restoration methods based on deep neural network
CN108322685B (en) * 2018-01-12 2020-09-25 广州华多网络科技有限公司 Video frame insertion method, storage medium and terminal
CN109451308B (en) * 2018-11-29 2021-03-09 北京市商汤科技开发有限公司 Video compression processing method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11770510B2 (en) 2018-09-21 2023-09-26 Andrew Sviridenko Video information compression using sketch-video
CN117750021A (en) * 2024-02-19 2024-03-22 北京铁力山科技股份有限公司 Video compression method, device, computer equipment and storage medium
CN117750021B (en) * 2024-02-19 2024-04-30 北京铁力山科技股份有限公司 Video compression method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110650339A (en) 2020-01-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19940304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19940304

Country of ref document: EP

Kind code of ref document: A1