CN113034626A - Optimization method for alignment of target object in feature domain in structured image coding - Google Patents

Publication number
CN113034626A
Authority
CN
China
Prior art keywords
target object
position information
compression
feature
structured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110235413.8A
Other languages
Chinese (zh)
Other versions
CN113034626B (en)
Inventor
陈志波
孙思萌
冯润森
金鑫
冯若愚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses an optimization method for aligning a target object in the feature domain in structured image coding. The method aligns the target object in the feature domain and thereby solves the position offset between object information in the original image and in the compressed features that arises in existing neural-network-based structured image coding frameworks. It ensures the integrity of the target object information in the object code stream of the structured code stream, improving both the quality of partial decoding and the accuracy of partial analysis tasks.

Description

Optimization method for alignment of target object in feature domain in structured image coding
Technical Field
The invention relates to the technical field of image coding, in particular to an optimization method for alignment of a target object in a feature domain in structured image coding.
Background
Existing video/image compression standards mainly target human viewing. As machine learning algorithms mature, machine intelligent analysis tasks are increasingly applied in many areas of social life and production, such as smart factories, smart cities, and intelligent transportation. To ensure the interpretability and robustness of intelligent analysis results in many open scenarios, new paradigms such as human-machine intelligent interaction and cooperation and hybrid-augmented intelligence often need to be introduced.
To support human-machine hybrid intelligence applications more efficiently, existing methods propose the concept of a semantically structured code stream, for example: first, a task-driven code-stream-structured image coding method; second, a general video compression coding method that supports machine intelligence. Taking the first method as an example, a region decision network and an alignment module for target detection are introduced, bounding boxes of regions where objects may exist are extracted based on the compressed features, and the features are segmented at the spatial level. The segmented features are entropy-coded separately and placed into the code stream in order, forming a structured code stream.
However, when the code-stream-structured coding method is directly combined with various deep-learning-based compression coding methods, incompatibilities often arise. Specifically, existing neural-network-based coding frameworks typically consist of an encoder and a decoder. The encoder maps the input image to compressed features (i.e., latent features used for compression, whose size is usually smaller than that of the original image). The compressed features are quantized and entropy-coded into a code stream for storage and transmission; the code stream is then entropy-decoded to recover the compressed features, from which the decoder reconstructs the original image. When such a framework is combined with a semantically structured code stream coding framework, object detection results, namely category labels and bounding boxes, are first obtained from the original image. After the compressed features are quantized, the region information related to each object is extracted according to its bounding box, and the regions are entropy-coded in order to form a structured code stream. However, the bounding box of an object in the compressed features cannot be obtained simply by downsampling the bounding box obtained from the image: doing so fails to retain all object-related information in the compressed features, which may degrade the quality of reconstruction from the structured code stream at the terminal or the results of intelligent analysis tasks.
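The naive baseline criticized above can be sketched as follows. This is only an illustration: the function name and the downsampling factor of 16 are assumptions, since the text does not state the encoder's stride.

```python
import math

def naive_feature_bbox(x, y, w, h, stride=16):
    """Scale a pixel-level box (top-left x, y, width, height) to the feature grid.

    The stride of 16 is a hypothetical encoder downsampling factor.
    """
    x0 = x // stride                      # first feature column touched by the box
    y0 = y // stride                      # first feature row touched by the box
    x1 = math.ceil((x + w) / stride)      # one past the last feature column touched
    y1 = math.ceil((y + h) / stride)      # one past the last feature row touched
    return x0, y0, x1 - x0, y1 - y0

print(naive_feature_bbox(100, 60, 50, 40))
```

This mapping selects only the feature cells that geometrically overlap the box. Because each feature element is computed from a neighbourhood of pixels, object information can also sit in cells outside this rectangle, which is the gap the learned mapping is meant to close.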
Disclosure of Invention
The invention aims to provide an optimization method for aligning a target object in the feature domain in structured image coding, which aligns the target object in the feature domain so as to improve the quality of partial decoding and the accuracy of partial analysis tasks.
The purpose of the invention is achieved by the following technical solution:
A method for optimizing the alignment of a target object in the feature domain in structured image coding comprises the following steps:
An optimization module is set in a structured image coding framework to realize adaptive mapping between the position of a specified target object in the original image and its position in the compressed features. The input of the optimization module is the pixel-level position information of the specified target object in the original image, and the optimization module maps the pixel-level position information to compressed-feature-level position information.
Then, based on the compressed-feature-level position information, the quantized compressed features output by the encoder in the structured image coding framework are transformed to obtain compressed features containing only the specified target object.
The technical solution provided by the invention solves the problem that the positions of object information in the original image and in the compressed features are misaligned in existing neural-network-based structured image coding frameworks, and ensures the integrity of the target object information in the object code stream of the structured code stream, thereby improving the quality of partial decoding and the accuracy of partial analysis tasks.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an optimization method for aligning a target object in a feature domain in structured image coding according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides an optimization method for aligning a target object in the feature domain in structured image coding, which solves the position misalignment between object information in the original image and in the compressed features in existing neural-network-based structured image coding frameworks. The codec part can be any existing neural-network-based structured image coding framework (model); during training its parameters are fixed and only the internal parameters of the optimization module are trained. In actual use, given an arbitrary input image and the position information of a target object, the compressed-feature-level position information can be obtained and used as the basis for generating the semantically structured code stream.
The method has three main advantages:
1) An optimization module is designed that realizes adaptive mapping between the object-related region in the image and the object-related region in the compressed features, which effectively solves the position offset between object information in the original image and in the compressed features in neural-network-based structured image coding frameworks.
2) The optimization module is trained as an independent module, so the performance of the original coding framework is not affected and the coding performance of semantically structured image coding is preserved.
3) The optimization module is simple and easy to implement, and can be combined with various neural-network-based image coding frameworks to build a semantically structured image coding framework. This greatly improves the flexibility of the framework, i.e., the encoder and decoder modules are replaceable.
For ease of understanding, the methods provided by the present invention are described in detail below.
As shown in Fig. 1, the input of the optimization module is the pixel-level position information of a specified target object in the original image, and the optimization module maps this pixel-level position information to compressed-feature-level position information. Then, based on the compressed-feature-level position information, the quantized compressed features output by the encoder in the structured image coding framework are transformed to obtain compressed features containing only the specified target object.
In the embodiment of the invention, the optimization module is implemented with several (for example, 2-3) two-dimensional convolution layers, and the nonlinearity of the mapping function is realized with the Sigmoid function. Specifically, the Sigmoid function serves two purposes: 1) It is used together with the convolution layers to make the mapping nonlinear. A two-dimensional convolution layer alone realizes a linear mapping between layers; combined with the Sigmoid function, it realizes a nonlinear mapping. The parameters of the optimization network formed by combining the 2-3 convolution layers are updated through the gradient backpropagation algorithm, so as to fit an optimal nonlinear mapping function from the input pixel-level position information to the compressed-feature-level position information. 2) At the end of the optimization module, it drives all element values of the output compressed-feature-level position information to be as close to binary as possible (i.e., either 0 or 1).
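The forward pass of such a module can be sketched in plain Python as below. The text fixes only the ingredients (2-3 two-dimensional convolution layers plus Sigmoid); the single-channel layout, 3x3 kernels, zero padding, stride of 2, and all function names here are assumptions made for illustration, and training is omitted.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def conv2d(x, kernel, bias=0.0, stride=1):
    """Single-channel 2D convolution (cross-correlation) with zero padding."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, w, stride):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = i + di - ph, j + dj - pw
                    if 0 <= ii < h and 0 <= jj < w:   # outside the input counts as zero
                        acc += kernel[di][dj] * x[ii][jj]
            row.append(acc)
        out.append(row)
    return out

def optimization_module(mask, layers):
    """Apply (kernel, bias, stride) layers in order; a Sigmoid follows each convolution.

    The Sigmoid between layers makes the mapping nonlinear, and the final
    Sigmoid keeps every output element in (0, 1), close to a binary mask.
    """
    x = mask
    for kernel, bias, stride in layers:
        x = [[sigmoid(v) for v in row] for row in conv2d(x, kernel, bias, stride)]
    return x

# Hypothetical weights: two layers, each halving the spatial resolution.
kernel = [[0.1] * 3 for _ in range(3)]
pixel_mask = [[1.0 if 2 <= r < 6 and 2 <= c < 6 else 0.0 for c in range(8)] for r in range(8)]
feature_mask = optimization_module(pixel_mask, [(kernel, -0.2, 2), (kernel, 0.0, 2)])
```

In a real system the output would need as many channels and the same spatial size as the quantized compressed features; this sketch produces a single 2x2 channel from an 8x8 mask.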
In the embodiment of the invention, the pixel-level position information may be obtained in a conventional manner. Specifically, the pixel-level position information of the specified target object may be a bounding box (x_i, y_i, w_i, h_i) containing the specified target object, where (x_i, y_i) is the position of the top-left vertex of the bounding box, and w_i and h_i are the width and height of the bounding box, respectively.
In the embodiment of the invention, the pixel-level position information may be converted into a binary region mask, that is, pixels inside the bounding box are set to 1 and pixels outside the bounding box are set to 0. The binary region mask is then mapped by the optimization module to obtain compressed-feature-level position information with the same dimensions as the quantized compressed features.
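The conversion from a bounding box to a binary region mask can be sketched as follows. The function name and the row-major list-of-lists layout are illustrative choices, not part of the original text.

```python
def bbox_to_mask(x, y, w, h, height, width):
    """Return a height x width mask: 1 inside the box (top-left x, y; size w x h), 0 outside."""
    return [
        [1 if (x <= col < x + w and y <= row < y + h) else 0 for col in range(width)]
        for row in range(height)
    ]

# A 3-wide, 2-high box with its top-left corner at column 1, row 2 of a 5x6 image.
mask = bbox_to_mask(1, 2, 3, 2, height=5, width=6)
for row in mask:
    print(row)
```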
In the embodiment of the invention, the compressed-feature-level position information may be in binary form. It is used as an optimized region mask and multiplied element-wise with the quantized compressed features to obtain compressed features containing only the specified object; a decoder then performs decoding and reconstruction to obtain a reconstructed image containing only that object.
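The element-wise masking step can be sketched as below, using nested lists in channel x height x width order. The 0.5 threshold used to binarize a soft mask and the function names are assumptions; the text only says the position information may be in binary form.

```python
def binarize(soft_mask, threshold=0.5):
    """Turn a soft mask with values in [0, 1] into a 0/1 mask (threshold is an assumption)."""
    return [[[1 if v >= threshold else 0 for v in row] for row in ch] for ch in soft_mask]

def mask_features(features, mask):
    """Element-wise product of quantized features and a mask of identical shape."""
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same number of channels")
    return [
        [[f * m for f, m in zip(f_row, m_row)] for f_row, m_row in zip(f_ch, m_ch)]
        for f_ch, m_ch in zip(features, mask)
    ]

quantized = [[[3, -2], [0, 5]]]             # one channel, 2x2, integer-quantized values
soft = [[[0.9, 0.1], [0.6, 0.4]]]           # hypothetical output of the optimization module
object_only = mask_features(quantized, binarize(soft))
```

Elements where the mask is 0 are zeroed out, so only the features attributed to the specified object are passed on to entropy coding or to the decoder.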
During training, the error between the reconstructed image containing only the specified target object and the image containing only the specified target object in the original image (i.e., the reconstruction distortion D of the object-related information shown at the bottom of Fig. 1) is used as the loss, and the parameters of the optimization module are updated through the gradient backpropagation algorithm. Training iterates until convergence (i.e., the loss is essentially unchanged and the model parameters are essentially no longer updated), after which the optimization module realizes adaptive mapping between object-related regions in the image and object-related regions in the compressed features. The backpropagation algorithm used in training may follow a conventional scheme.
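One way to write this training objective down is given below. Only D comes from the text; every other symbol is introduced here for illustration: x is the original image, m the binary pixel-level region mask, f_θ the optimization module with trainable parameters θ, ŷ the quantized compressed features from the frozen encoder, g_s the frozen decoder, ⊙ element-wise multiplication, and d a distortion measure that the text does not specify (mean squared error would be one possible choice). Taking m ⊙ x as "the image containing only the specified target object" is likewise one reading of the text.

```latex
\begin{aligned}
\hat{M} &= f_\theta(m), \\
\hat{x}_{\mathrm{obj}} &= g_s\bigl(\hat{M} \odot \hat{y}\bigr), \\
D &= d\bigl(m \odot x,\; \hat{x}_{\mathrm{obj}}\bigr), \\
\theta^{*} &= \arg\min_{\theta} D .
\end{aligned}
```

Because the encoder and decoder parameters are fixed, gradients of D flow through g_s only to reach θ, so the rate-distortion behaviour of the original codec is unchanged.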
When code-stream-structured coding is implemented, the pixel-level position information and the compressed-feature-level position information are transmitted as part of the header information of the structured code stream. The pixel-level position information is used for machine intelligent analysis tasks, and the compressed-feature-level position information is used to recover the complete compressed features when a complete image is reconstructed. The syntax structure of the header information is shown in Table 1, where bbox_enabled_flag is the switch flag for the pixel-level position information (bounding box coordinates), refined_bbox_enabled_flag is the switch flag for the compressed-feature-level position information, and bbox_length_minus1 and refined_bbox_length_minus1 are the code stream lengths of the two types of position information, respectively.
Table 1. Syntax structure of the header information (the table appears only as an image, BDA0002959800810000051, in the original publication).
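Since Table 1 is not reproduced in the text, the following is only a hypothetical layout built from the four named syntax elements. The one-byte flag field, the 16-bit big-endian length fields, the field order, and the function names are all assumptions and should not be read as the actual syntax of Table 1.

```python
import struct

def pack_header(bbox_payload=None, refined_payload=None):
    """Serialize the position information into a header (hypothetical layout)."""
    bbox_enabled_flag = 1 if bbox_payload is not None else 0
    refined_bbox_enabled_flag = 1 if refined_payload is not None else 0
    out = bytearray([(bbox_enabled_flag << 1) | refined_bbox_enabled_flag])
    for payload in (bbox_payload, refined_payload):
        if payload is not None:
            if not 1 <= len(payload) <= 65536:
                raise ValueError("payload length must fit in a 16-bit length_minus1 field")
            out += struct.pack(">H", len(payload) - 1)   # *_length_minus1
            out += payload
    return bytes(out)

def unpack_header(data):
    """Parse a header produced by pack_header; disabled fields come back as None."""
    flags, pos, fields = data[0], 1, {}
    for name, bit in (("bbox", 0b10), ("refined_bbox", 0b01)):
        fields[name] = None
        if flags & bit:
            (length_minus1,) = struct.unpack_from(">H", data, pos)
            pos += 2
            fields[name] = data[pos:pos + length_minus1 + 1]
            pos += length_minus1 + 1
    return fields

header = pack_header(bbox_payload=b"\x01\x02", refined_payload=None)
```

Storing length minus one lets a field of n bits describe payloads of 1 to 2^n bytes, which is the usual reason for a `_minus1` suffix in bitstream syntax.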
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments can be implemented by software, or by software plus a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions that enable a computer device (such as a personal computer, a server, or a network device) to execute the methods of the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method for optimizing the alignment of a target object in the feature domain in structured image coding, characterized by comprising the following steps:
setting an optimization module in a structured image coding framework to realize adaptive mapping between the position of a specified target object in an original image and its position in the compressed features, wherein the input of the optimization module is pixel-level position information of the specified target object in the original image, and the optimization module maps the pixel-level position information to obtain compressed-feature-level position information;
and then, based on the compressed-feature-level position information, transforming the quantized compressed features output by the encoder in the structured image coding framework to obtain compressed features containing only the specified target object.
2. The method of claim 1, wherein the step of mapping the pixel-level position information by the optimization module to obtain the compressed-feature-level position information comprises:
the pixel-level position information of the specified target object is a bounding box (x_i, y_i, w_i, h_i) containing the specified target object, where (x_i, y_i) is the position of the top-left vertex of the bounding box, and w_i and h_i are the width and height of the bounding box, respectively; the bounding box containing the specified target object is converted into a binary region mask, that is, pixels inside the bounding box are set to 1 and pixels outside the bounding box are set to 0;
the optimization module maps the binary region mask to obtain compressed-feature-level position information with the same dimensions as the quantized compressed features.
3. The method of claim 1, wherein the optimization module is implemented with multiple two-dimensional convolution layers and uses the Sigmoid function to realize the nonlinearity of the mapping function.
4. The method of claim 1, wherein transforming the quantized compressed features output by the encoder in the structured image coding framework based on the compressed-feature-level position information comprises:
multiplying the compressed-feature-level position information element-wise with the quantized compressed features, wherein the compressed-feature-level position information and the quantized compressed features have the same dimensions.
5. The method for optimizing the alignment of a target object in the feature domain in structured image coding according to any one of claims 1 to 4, wherein after the compressed features containing only the specified target object are obtained, decoding and reconstruction are performed by the decoder in the structured image coding framework to obtain a reconstructed image containing only the specified target object;
during training, the error between the reconstructed image containing only the specified target object and the image containing only the specified target object in the original image is used as the loss, and the parameters of the optimization module are updated through the gradient backpropagation algorithm.
6. The method according to any one of claims 1 to 4, wherein the pixel-level position information and the compressed-feature-level position information are transmitted as part of the header information of the structured code stream, the pixel-level position information is used for machine intelligent analysis tasks, and the compressed-feature-level position information is used to recover the complete compressed features when a complete image is reconstructed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110235413.8A CN113034626B (en) 2021-03-03 2021-03-03 Optimization method for alignment of target object in feature domain in structured image coding

Publications (2)

Publication Number Publication Date
CN113034626A (en) 2021-06-25
CN113034626B (en) 2024-04-02

Family

ID=76466484

Country Status (1)

Country Link
CN (1) CN113034626B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090067491A1 (en) * 2007-09-07 2009-03-12 Microsoft Corporation Learning-Based Image Compression
CN110457503A (en) * 2019-07-31 2019-11-15 北京大学 A kind of rapid Optimum depth hashing image coding method and target image search method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李志军;杨楚皙;刘丹;孙大洋;: "基于深度卷积神经网络的信息流增强图像压缩方法", 吉林大学学报(工学版), no. 05 *
马惠珠;宋朝晖;季飞;侯嘉;熊小芸;: "项目计算机辅助受理的研究方向与关键词――2012年度受理情况与2013年度注意事项", 电子与信息学报, no. 01 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant