CN115604490A - Semantic structured image coding and decoding method and system based on block mask


Info

Publication number
CN115604490A
Authority
CN
China
Prior art keywords
image
code stream
group
quantized
block mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211213966.4A
Other languages
Chinese (zh)
Inventor
陈志波
冯若愚
金鑫
孙思萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202211213966.4A priority Critical patent/CN115604490A/en
Publication of CN115604490A publication Critical patent/CN115604490A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding

Abstract

The invention discloses a semantic structured image coding and decoding method and system based on a block mask.

Description

Semantic structured image coding and decoding method and system based on block mask
Technical Field
The invention relates to the technical field of image compression coding, and in particular to a semantic structured image coding and decoding method and system based on a block mask.
Background
Existing image compression technology mainly targets human vision. With the rapid development of deep learning, however, machine intelligent analysis is playing an increasingly important role in many areas of production and daily life. Traditional compression methods oriented to the human eye must compress and transmit all of the information in an image, and the intelligent task analysis end must decode the complete image before passing it to the subsequent intelligent task analysis model. To support hybrid human-machine intelligent applications more efficiently, existing methods have proposed the concept of a semantically structured image code stream. Scheme I, for example, is Chinese invention patent CN110225341B, "A task-driven code stream structured image coding method", which introduces a region decision network for target detection and an alignment module, extracts the bounding boxes of the regions where objects exist based on the compressed features, and segments the features spatially according to the bounding boxes. The segmented features are then entropy coded in sequence to form a structured code stream.
However, in many practical application scenarios, such as autonomous driving and smart cities, the images to be processed often contain overlapping or even densely packed objects. For such images, Scheme I typically segments the image or the compressed features spatially, directly according to the detection results, so overlapping regions are coded repeatedly; when the overlapping area is large or the objects are dense, coding efficiency suffers severely.
Scheme II: Chinese patent application CN112929662A, "A coding method for solving the object overlap problem in code stream structured image coding", takes the circumscribed rectangle of the overlapping objects, codes the overlapping objects together, and places the result in the structured code stream as a single code stream. The drawback of this scheme is that the circumscribed rectangle of the different objects may also contain a large amount of background information that lies outside the individual target rectangles, which reduces coding efficiency for specific intelligent tasks.
Disclosure of Invention
The invention aims to provide a semantic structured image coding and decoding method and system based on a block mask.
The purpose of the invention is achieved by the following technical solution:
A semantic structured image coding and decoding method based on a block mask comprises:
an encoding part: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the groups to which target objects belong by means of the block mask; obtaining image features of the input image through the transform operation of a deep image encoder, performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, and performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution; entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask separately, and concatenating the results to obtain code stream header information; quantizing the image features to obtain quantized image features, and grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called group features; entropy coding every group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;
a decoding part: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask; performing the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution; taking the code stream corresponding to each specified group out of the texture part and entropy decoding it using the overall probability distribution to obtain the group feature of each specified group, and recombining the group features of all specified groups into recombined quantized image features according to the block mask; and combining the size information of the input image with the recombined quantized image features to obtain a reconstructed image through the inverse transform operation of a deep image decoder.
A semantic structured image coding and decoding system based on block masks comprises:
an encoding unit for performing an encoding part, the encoding part comprising: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the groups to which target objects belong by means of the block mask; obtaining image features of the input image through the transform operation of a deep image encoder, performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, and performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution; entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask separately, and concatenating the results to obtain code stream header information; quantizing the image features to obtain quantized image features, and grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called group features; entropy coding every group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;
a decoding unit for performing a decoding part, the decoding part comprising: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask; performing the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution; taking the code stream corresponding to each specified group out of the texture part and entropy decoding it using the overall probability distribution to obtain the group feature of each specified group, and recombining the group features of all specified groups into recombined quantized image features according to the block mask; and combining the size information of the input image with the recombined quantized image features to obtain a reconstructed image through the inverse transform operation of a deep image decoder.
A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.
A readable storage medium, storing a computer program which, when executed by a processor, implements the aforementioned method.
According to the technical solution provided by the invention, the image is partitioned by means of a block mask, which is more flexible and offers stronger controllability and extensibility. Compared with the original semantic structured coding methods (Scheme I and Scheme II), it maintains coding efficiency and improves flexibility when semantic structured image coding is applied to images in which objects overlap or are even dense.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a framework diagram of a block mask-based semantic structured image coding and decoding method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of intelligently analyzing an input image and generating the corresponding block mask according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an input image and its target detection and instance segmentation results according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of splicing overlapped targets into one group according to the target detection result according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of splicing overlapped targets into one group according to the instance segmentation result according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of dividing overlapped targets into different groups according to the instance segmentation results according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a block mask-based semantic structured image coding and decoding system according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a processing device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The terms that may be used herein are first described as follows:
The term "and/or" means that either or both can be achieved; for example, "X and/or Y" covers the cases "X", "Y", and "X and Y".
The terms "comprising," "including," "containing," "having," and similar expressions are to be construed as non-exclusive inclusions. For example, including a feature (e.g., material, component, ingredient, carrier, formulation, dimension, part, mechanism, device, process, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product, or article of manufacture) is to be construed as including not only the feature explicitly listed but also other features known in the art that are not explicitly listed.
The term "consisting of … …" excludes any technical feature or element not explicitly listed. If used in a claim, the term renders the claim closed, so that it contains no technical features other than those explicitly listed, except for the conventional impurities associated with them. If the term occurs in only one clause of a claim, it limits only the elements explicitly recited in that clause, and elements recited in other clauses are not excluded from the claim as a whole.
The following describes the block mask-based semantic structured image encoding and decoding method and system in detail. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art. Those not specifically mentioned in the examples of the present invention were carried out according to the conventional conditions in the art or conditions suggested by the manufacturer. The reagents or instruments used in the examples of the present invention are not specified by manufacturers, and are all conventional products available by commercial purchase.
Example one
The embodiment of the invention provides a semantic structured image coding and decoding method based on a block mask. Its overall framework is shown in FIG. 1 and mainly comprises an encoding part and a decoding part.
1. Encoding part.
1. Perform target detection on the input image to obtain a detection result, generate a block mask by combining predefined block size information, and distinguish the groups to which target objects belong by means of the block mask.
In the embodiment of the present invention, the size of an input image is recorded as H × W × C, where H and W represent the height and width of the input image, respectively, and C is the number of channels (all channels are coded and decoded simultaneously); the predefined block size information is denoted B and represents the side length of the image block, which has a size B × B.
In the embodiment of the invention, the detection result is obtained by applying intelligent image analysis algorithms such as target detection to the input image. The detection result comprises a target detection result and an instance segmentation result. The target detection result comprises the number of target objects and the position and category of each target object; the position of each target object comprises the horizontal coordinate of its upper-left corner, the vertical coordinate of its upper-left corner, its height and its width. The instance segmentation result is the edge contour of each target object.
In the embodiment of the invention, the detection result (the target detection result and the instance segmentation result) is combined with the predefined block size information to generate the block mask m. The size of the block mask m is given by a formula that survives only as an image in the source [formula image BDA0003876058300000051]. The value of each pixel in the block mask is an integer from 0 to 255 and indicates the group to which the corresponding image block belongs. FIG. 2 shows an example of intelligently analyzing an input image and generating the corresponding block mask: the left side of FIG. 2 shows the target detection result and the instance segmentation result, and the right side shows the generated block mask. Each rectangular box on the left is a target detection result, and the text in each box gives the category and the confidence, of which the embodiment of the invention mainly uses the category; the edge contour of each target object on the left is the instance segmentation result.
In the embodiment of the present invention, when the block mask is generated from the detection result (the target detection result and the instance segmentation result) and the predefined block size information, the overlapped targets can be spliced into one group according to the target detection result and treated as the same target object; they can also be spliced into one group according to the instance segmentation results and treated as the same target object; or they can be divided into different groups according to their respective instance segmentation results and treated as different target objects. Of course, the invention does not limit the block mask generation method; a suitable generation method can be chosen according to the actual situation to combine the detection result with the predefined block size information and generate the block mask.
In the embodiment of the invention, the instance segmentation result is used mainly to generate the block mask and is not coded. The target detection result is needed both to generate the block mask and by downstream tasks, so it must also be coded.
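The three generation strategies described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `box_pixels` and `block_mask`, the representation of each object as a set of pixel coordinates, the mask size of ceil(H/B) × ceil(W/B), and the rule that a later object overwrites an earlier one when both touch the same block. The patent itself does not prescribe any of these details.

```python
from math import ceil

def box_pixels(x, y, h, w):
    """Pixels (row, col) inside a box given by its top-left corner (x, y), height h, width w."""
    return {(r, c) for r in range(y, y + h) for c in range(x, x + w)}

def block_mask(height, width, block, objects, merge):
    """Build a block mask with one group id per B x B image block.

    objects: one set of (row, col) pixels per target object (hypothetical format).
    merge=True puts every object into group 1; merge=False gives object i group i + 1.
    Group 0 is the background. Ties within a block go to the later object
    (an arbitrary choice for this sketch).
    """
    rows, cols = ceil(height / block), ceil(width / block)  # assumed mask size
    mask = [[0] * cols for _ in range(rows)]
    for index, pixels in enumerate(objects):
        group = 1 if merge else index + 1
        if group > 255:
            raise ValueError("mask values are limited to 0..255")
        for r, c in pixels:
            mask[r // block][c // block] = group
    return mask

# Toy 8 x 8 image with B = 4, so the mask is 2 x 2.
instances = [{(0, 0), (1, 1)}, {(5, 5)}]                   # hypothetical instance pixels
boxes = [box_pixels(0, 0, 2, 6), box_pixels(5, 5, 2, 2)]   # hypothetical bounding boxes

mask_by_boxes = block_mask(8, 8, 4, boxes, merge=True)          # one group, from boxes
mask_by_instances = block_mask(8, 8, 4, instances, merge=True)  # one group, from contours
mask_separate = block_mask(8, 8, 4, instances, merge=False)     # one group per object
```

In this toy case the box-based mask marks one more block as foreground than the instance-based mask, which mirrors the background redundancy that the instance-based strategies are meant to avoid.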
2. Obtain the image features of the input image through the transform operation of a deep image encoder.
In the embodiment of the invention, the image features are denoted y, with dimensions given by a formula that survives only as an image in the source [formula image BDA0003876058300000061], where C_y is the number of channels. The transform operation of the deep image encoder can be implemented with conventional techniques and is not described further here.
3. Perform a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, perform a hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution, entropy code the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask separately, and concatenate the results to obtain the code stream header information.
As shown on the right and at the upper left of FIG. 1, the entropy coding is divided into two parts. The first part entropy codes the size information of the input image, the quantized hyperprior features and the target detection result separately; entropy coding the target detection result means entropy coding the number of target objects and the position and category of each target object. The second part entropy codes the block size information and the block mask separately. The results are concatenated to obtain the code stream header information, whose syntax structure is defined in Table 1.
Table 1: Syntax structure of the code stream header information
[Table image BDA0003876058300000062]
Wherein: image_height_minus1 represents the height H of the image; image_width_minus1 represents the width W of the image; side_information_length represents the code stream length corresponding to the quantized hyperprior features; group_mask_block_size represents the block size information; group_mask_length_minus1 represents the code stream length corresponding to the block mask m; bounding_boxes_numbers represents the number of target objects in the image; bounding_box_x, bounding_box_y, bounding_box_h, bounding_box_w and bounding_box_category represent, in order, the horizontal coordinate of the upper-left corner, the vertical coordinate of the upper-left corner, the height, the width and the category of the current target object. Only a single target object is shown as an example; for multiple target objects, these five items form one group per object, and the groups are arranged in sequence. u denotes an unsigned data type; for example, u(32) means that the corresponding code stream segment is 32 bits long.
In the embodiment of the present invention, the hyperprior transform can be implemented with conventional techniques and is not described here.
In the embodiment of the invention, the code stream lengths are used in subsequent decoding. The principle is as follows: in actual entropy coding, the length of the coded data is not known in advance, so the length is written into the code stream, and the decoder reads a segment of the corresponding length before decoding it.
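To make the role of the stored lengths concrete, the following sketch serializes and parses a header with length-prefixed payloads. It is only an illustration under stated assumptions: Table 1 is not recoverable from the source, so the field order, the uniform 32-bit big-endian width, the byte-based lengths, the "minus1" storage convention and the helper names `pack_header` and `unpack_header` are all hypothetical.

```python
import struct

def pack_header(height, width, block_size, hyper_bits, mask_bits, boxes):
    """Serialize header fields; every field is an assumed u(32), lengths in bytes."""
    out = struct.pack(">II", height - 1, width - 1)           # image_height_minus1, image_width_minus1
    out += struct.pack(">I", len(hyper_bits)) + hyper_bits    # side_information_length + payload
    out += struct.pack(">I", block_size)                      # group_mask_block_size
    out += struct.pack(">I", len(mask_bits) - 1) + mask_bits  # group_mask_length_minus1 + payload
    out += struct.pack(">I", len(boxes))                      # bounding_boxes_numbers
    for x, y, h, w, category in boxes:                        # five values per target object
        out += struct.pack(">5I", x, y, h, w, category)
    return out

def unpack_header(data):
    """Parse the layout written by pack_header, using stored lengths to find payload ends."""
    pos = 0

    def read_u32():
        nonlocal pos
        (value,) = struct.unpack_from(">I", data, pos)
        pos += 4
        return value

    def read_bytes(count):
        nonlocal pos
        chunk = data[pos:pos + count]
        pos += count
        return chunk

    height = read_u32() + 1
    width = read_u32() + 1
    hyper_bits = read_bytes(read_u32())      # length first, then that many bytes
    block_size = read_u32()
    mask_bits = read_bytes(read_u32() + 1)   # stored value is length minus one
    boxes = []
    for _ in range(read_u32()):
        boxes.append(tuple(read_u32() for _ in range(5)))
    return height, width, block_size, hyper_bits, mask_bits, boxes

header = pack_header(480, 640, 16, b"\x01\x02", b"\x07", [(10, 20, 30, 40, 3)])
parsed = unpack_header(header)
```

Without the two length fields the parser could not tell where the entropy-coded hyperprior payload ends and the block size field begins, which is the point made in the paragraph above.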
In addition, the quantized hyperprior features must undergo the hyperprior inverse transform to obtain the overall probability distribution, which is then used for entropy coding the group features.
4. Quantize the image features to obtain quantized image features, and group the quantized image features in the spatial dimension according to the block mask; the features of each group are called group features. Entropy code every group feature separately using the overall probability distribution, select all specified groups according to the task setting, and combine the entropy-coded streams of all specified groups to form the texture part of the code stream.
In the embodiment of the invention, the overall probability distribution is the probability distribution of the quantized image features as a whole, from which the probability distribution of each group can be obtained. Specifically, for the k-th group, the probability distribution corresponding to that group is first obtained from the overall probability distribution according to the grouping and in combination with an autoregressive model; the corresponding group feature is then entropy coded using that probability distribution as the entropy model. Performing this operation on all groups yields the entropy-coded code streams of all targets.
In the embodiment of the invention, the grouping is based on the following: the positions of the quantized image features that correspond, in the spatial dimension, to the elements of the block mask m with the same value form one group, and their features are called a group feature. The required groups are determined by the requirements of the downstream tasks. There may be one or more downstream tasks, different downstream tasks may require the same or different groups, and the number of required groups may be smaller than or equal to the total number of groups. All specified groups are selected according to the downstream task setting, and their entropy-coded streams are combined to form the texture part of the code stream, in which the entropy-coded streams are arranged in ascending order of the group index k.
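The grouping rule above can be sketched in a few lines. This is a simplified illustration, not the patent's implementation: the feature map is modeled as a 2D grid of values, and the parameter `cells_per_block` (the number of feature positions covered by one side of a block, for example B divided by the encoder's downsampling factor) is an assumption, as are the function names.

```python
def group_features(features, mask, cells_per_block):
    """Split a feature grid into per-group lists, scanning in raster order.

    features: 2D list; features[i][j] is the quantized feature at position (i, j).
    mask: 2D list of group ids, one per B x B image block.
    cells_per_block: feature positions per block side (assumed parameter).
    """
    groups = {}
    for i, row in enumerate(features):
        for j, value in enumerate(row):
            k = mask[i // cells_per_block][j // cells_per_block]
            groups.setdefault(k, []).append(value)
    return groups

def select_groups(groups, wanted):
    """Keep only the groups a downstream task asks for, in ascending order of index k."""
    return [(k, groups[k]) for k in sorted(wanted) if k in groups]

features = [[1, 2, 3, 4],
            [5, 6, 7, 8]]
mask = [[0, 1]]  # left block is background (0), right block belongs to group 1

groups = group_features(features, mask, cells_per_block=2)
foreground_only = select_groups(groups, wanted={1})
```

A task that needs only the foreground would request group 1 and receive the right half of the grid, while a task that needs the full image would request every group index.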
The syntax structure of the texture part is defined in Table 2.
Table 2: Syntax structure of the texture part
[Table image BDA0003876058300000081]
Wherein object_texture_length_minus1 represents the code stream length of the texture part corresponding to the currently specified group. Note that the syntax structure above shows only a single specified group as an example.
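As a rough illustration of how a texture part with per-group length fields might be assembled and split again, consider the sketch below. Table 2 is not recoverable from the source, so the 32-bit big-endian width of object_texture_length_minus1, the byte-based lengths and the function names `pack_texture` and `split_texture` are assumptions.

```python
import struct

def pack_texture(streams):
    """Concatenate per-group entropy-coded streams in ascending order of group index k.

    streams: dict mapping group index -> non-empty bytes (stand-ins for coded data).
    Each segment is preceded by an assumed u(32) object_texture_length_minus1 field.
    """
    out = b""
    for k in sorted(streams):
        payload = streams[k]
        out += struct.pack(">I", len(payload) - 1) + payload
    return out

def split_texture(data):
    """Cut the texture part back into segments using only the stored lengths."""
    segments, pos = [], 0
    while pos < len(data):
        (length_minus1,) = struct.unpack_from(">I", data, pos)
        pos += 4
        segments.append(data[pos:pos + length_minus1 + 1])
        pos += length_minus1 + 1
    return segments

texture = pack_texture({2: b"bb", 0: b"a"})
segments = split_texture(texture)
```

Because every segment carries its own length, a decoder can step over the segments it does not need without entropy decoding them.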
5. The code stream header information and the texture part of the code stream form the semantic structured code stream.
In the embodiment of the present invention, the syntax structure of the semantic structured code stream comprises: the height of the input image, the width of the input image, the code stream length corresponding to the quantized hyperprior features, the block size information, the code stream length corresponding to the block mask, the number of target objects, the position and category of each target object, and the code stream lengths of the texture parts corresponding to all specified groups.
2. Decoding part.
1. Decode the header information. Decode the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask.
In the embodiment of the present invention, the size information of the input image mainly refers to the height H and the width W of the image, and the block size information ensures that the block mask is decoded correctly. The target detection result can be used by subsequent downstream tasks.
2. Hyperprior inverse transform. Perform the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution, which is used for the subsequent decoding of the group features.
In the embodiment of the present invention, the hyperprior inverse transform can be implemented with conventional techniques and is not described here.
3. Take the code stream corresponding to each specified group out of the texture part and entropy decode it using the overall probability distribution to obtain the group feature of each specified group. Specifically, the probability distribution corresponding to each specified group is obtained by combining the decoded block mask m, the overall probability distribution and the autoregressive model, and entropy decoding then yields the group feature of each specified group.
Because the overall probability distribution obtained in the decoding part is the same as that in the encoding part, and both parts derive the probability distribution of each group from the overall probability distribution in the same way, the related flow in FIG. 1 is drawn in condensed form: the part consisting of quantization → hyperprior inverse transform → probability estimation is omitted. This is a common way of representing the flow in the field.
4. Feature recombination.
According to the positions provided by the block mask m, the group features of all specified groups are recombined into the recombined quantized image features. Because the specified groups may be only a subset of all groups, the number of groups in the recombined quantized image features may differ from that in the quantized image features, so the two are given different symbols and names. Of course, if the specified groups include all groups, the recombined quantized image features are equivalent to the quantized image features.
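The recombination step can be sketched as the inverse of raster-order grouping. The sketch makes several assumptions that the source does not state: positions belonging to groups that were not decoded are filled with zero, the feature grid is exactly the mask size times `cells_per_block` on each side, and the function name `recombine` is hypothetical.

```python
def recombine(group_values, mask, cells_per_block, fill=0):
    """Place decoded group features back at the positions given by the block mask.

    group_values: dict mapping group index -> values in raster order for that group.
    mask: 2D list of group ids, one per image block.
    cells_per_block: feature positions per block side (assumed parameter).
    fill: value used where a group was not decoded (zero is an assumption).
    """
    rows = len(mask) * cells_per_block
    cols = len(mask[0]) * cells_per_block
    readers = {k: iter(values) for k, values in group_values.items()}
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            k = mask[i // cells_per_block][j // cells_per_block]
            row.append(next(readers[k]) if k in readers else fill)
        grid.append(row)
    return grid

mask = [[0, 1]]
partial = recombine({1: [3, 4, 7, 8]}, mask, cells_per_block=2)
full = recombine({0: [1, 2, 5, 6], 1: [3, 4, 7, 8]}, mask, cells_per_block=2)
```

When every group is supplied, the result matches the original grid, which corresponds to the statement that the recombined features are equivalent to the quantized image features if the specified groups include all groups.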
5. Combine the size information of the input image with the recombined quantized image features and obtain the reconstructed image through the inverse transform operation of the deep image decoder. The inverse transform operation can follow conventional techniques and is not described here.
The scheme provided by the embodiment of the invention mainly has the following advantages:
(1) Based on the semantic structured image coding framework, a method of semantically structuring an image with a block mask is provided, so that semantic structured image coding remains efficient and flexible when the objects in the image overlap or are even dense.
(2) Compared with Scheme I, which divides the image into regions with rectangular boxes, the method provided by the invention divides the image with a block mask, which is more flexible and offers stronger controllability and extensibility.
Different block mask generation methods are described below to demonstrate intuitively how flexibly the code stream can be structured when target objects overlap.
Fig. 3 shows the target detection result and the instance segmentation result: the left part of fig. 3 is the input image, and the right part shows the target detection result (rectangular boxes) and the instance segmentation result (edge contours).
FIGS. 4-6 provide three block mask partitioning schemes, where each rectangular block represents a B × B image block as described above. Specifically:
As shown in fig. 4, overlapping targets are merged into one group according to the target detection result. The dark part is the foreground containing the people and the umbrella (both are target objects), and the light part is the background (the dark and light colors are used in the figure only for ease of presentation; in actual operation the pixel value is 0 in the light part and 1 in the dark part).
As shown in fig. 5, overlapping targets are merged into one group according to the instance segmentation result; on the basis of fig. 2, this further removes the redundant background inside the target detection rectangles.
As shown in fig. 6, overlapping targets are assigned to different groups (the dark part and the hatched part) according to their respective instance segmentation results and placed in the structured code stream. This further guarantees the independence of each target in the structured code stream and better serves application scenarios in which a downstream task needs only some of the categories.
In figs. 4 to 6, the left side shows the input image with the block mask superimposed, and the right side shows the block mask alone. It should be noted that block mask generation is not limited to the above three methods; users may also adapt the block mask generation process to their own needs, which illustrates the flexibility and efficiency of the present solution.
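The three partitioning schemes of figs. 4-6 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented procedure: the function names are hypothetical, the block mask is assumed to have ⌈H/B⌉ × ⌈W/B⌉ entries, a block is assumed to join a group as soon as it contains at least one pixel of that group, boxes are assumed to be given as (x, y, height, width) of the top-left corner, and in the fig. 6 style a block touched by several instances is assumed to go to the last one.

```python
import numpy as np


def _ceil_div(a, b):
    return -(-a // b)


def block_mask_from_boxes(height, width, boxes, block_size):
    """Fig. 4 style: every block touched by any detection box is foreground (1)."""
    mask = np.zeros((_ceil_div(height, block_size), _ceil_div(width, block_size)),
                    dtype=np.uint8)
    for x, y, h, w in boxes:  # top-left corner (x, y), then height and width
        r0, r1 = y // block_size, (y + h - 1) // block_size
        c0, c1 = x // block_size, (x + w - 1) // block_size
        mask[r0:r1 + 1, c0:c1 + 1] = 1
    return mask


def block_mask_from_instances(height, width, instance_masks, block_size,
                              separate_groups=False):
    """Fig. 5 style (one foreground group) or fig. 6 style (one group per instance)."""
    hb, wb = _ceil_div(height, block_size), _ceil_div(width, block_size)
    mask = np.zeros((hb, wb), dtype=np.uint8)
    for idx, inst in enumerate(instance_masks, start=1):
        # Pad to a multiple of the block size, then test each B x B block.
        padded = np.zeros((hb * block_size, wb * block_size), dtype=bool)
        padded[:height, :width] = inst
        touched = padded.reshape(hb, block_size, wb, block_size).any(axis=(1, 3))
        mask[touched] = idx if separate_groups else 1  # later instances overwrite
    return mask
```

Because the mask stores one small integer per block, adding a further scheme only requires a different rule for assigning group values to blocks.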
Example two
The present invention also provides a block mask based semantic structured image coding and decoding system, implemented mainly on the basis of the method provided by the foregoing embodiment. As shown in fig. 7, the system mainly includes:
an encoding unit for performing an encoding part, the encoding part including: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the group to which each target object belongs through the block mask; obtaining image features of the input image through the transform operation of a deep image encoder, performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution, entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask separately, and concatenating the results to obtain code stream header information; quantizing the image features to obtain quantized image features, grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called a group feature, entropy coding every group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;
a decoding unit for performing a decoding part, the decoding part including: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask; performing the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution; extracting the code stream of each specified group from the texture part and entropy decoding it using the overall probability distribution to obtain the group feature of each specified group, and recombining the group features of all specified groups into recombined quantized image features according to the block mask; and combining the size information of the input image with the recombined quantized image features to obtain a reconstructed image through the inverse transform operation of a deep image decoder.
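How the texture part is assembled from the specified groups and read back can be sketched as below. This is only an illustration under loud assumptions: `zlib` stands in for the entropy coder driven by the overall probability distribution (which is not reproduced here), quantized values are assumed to fit in int8, groups are written in ascending index order, and the per-group stream lengths are kept in a plain Python list instead of the real header syntax. All function names are hypothetical.

```python
import zlib

import numpy as np


def encode_groups(group_feats):
    """Code every group feature separately (zlib is a stand-in entropy coder)."""
    return {g: zlib.compress(f.astype(np.int8).tobytes())
            for g, f in group_feats.items()}


def build_texture_part(group_streams, specified):
    """Concatenate the streams of the specified groups in ascending index order."""
    order = sorted(specified)
    lengths = [(g, len(group_streams[g])) for g in order]
    payload = b"".join(group_streams[g] for g in order)
    return lengths, payload


def parse_texture_part(lengths, payload, num_channels):
    """Cut the payload back into per-group streams and decode each one."""
    feats, pos = {}, 0
    for g, n in lengths:
        raw = zlib.decompress(payload[pos:pos + n])
        feats[g] = np.frombuffer(raw, dtype=np.int8).reshape(num_channels, -1)
        pos += n
    return feats
```

Since every group is coded independently, a decoder that needs only some groups can skip the remaining streams using the recorded lengths, without decoding them.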
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to perform all or part of the above described functions.
Example three
The present invention also provides a processing apparatus, as shown in fig. 8, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, a processor, a memory, an input device and an output device are connected through a bus.
In the embodiment of the present invention, the specific types of the memory, the input device, and the output device are not limited; for example:
the input device can be a touch screen, an image acquisition device, a physical button or a mouse and the like;
the output device may be a display terminal;
the Memory may be a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as a disk Memory.
Example four
The present invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium in the embodiment of the present invention may be provided in the foregoing processing device as a computer readable storage medium, for example, as the memory in the processing device. The readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A block mask based semantic structured image coding and decoding method, characterized by comprising:
an encoding part: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the group to which each target object belongs through the block mask; obtaining image features of the input image through the transform operation of a deep image encoder, performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution, entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask separately, and concatenating the results to obtain code stream header information; quantizing the image features to obtain quantized image features, grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called a group feature, entropy coding every group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;
a decoding part: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask; performing the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution; extracting the code stream of each specified group from the texture part and entropy decoding it using the overall probability distribution to obtain the group feature of each specified group, and recombining the group features of all specified groups into recombined quantized image features according to the block mask; and combining the size information of the input image with the recombined quantized image features to obtain a reconstructed image through the inverse transform operation of a deep image decoder.
2. The method of claim 1, wherein the performing target detection on the input image, obtaining a target detection result and an instance segmentation result, and generating a block mask according to predefined block size information comprises:
denoting the size of the input image as H × W × C, wherein H and W respectively represent the height and the width of the input image, and C is the number of channels; denoting the predefined block size information as B, which represents the side length of an image block, the size of an image block being B × B;
combining the target detection result, the instance segmentation result and the predefined block size information to generate a block mask m, wherein the block mask m is a grayscale image of size
Figure FDA0003876058290000011
the value of each pixel of the grayscale image being an integer from 0 to 255, and the value of each pixel in the block mask indicating the group to which the corresponding image block belongs.
3. The method according to claim 1 or 2, wherein the target detection result comprises: the number of target objects, and the position and the category of each target object; the position of each target object comprising: the horizontal coordinate and the vertical coordinate of the upper left corner of the target object, and the height and the width of the target object.
4. The method according to claim 3, wherein:
entropy coding the target detection result means entropy coding the number of target objects and the position and the category of each target object separately;
after each group feature is entropy coded, all required specified groups are selected according to the task setting, arranged in the code stream in ascending order of the index values of the specified groups, and combined to form the texture part of the code stream.
5. The method according to claim 2, wherein the syntax structure of the semantic structured code stream comprises: the height of the input image, the width of the input image, the code stream length corresponding to the quantized hyperprior features, the block size information, the code stream length corresponding to the block mask, the number of target objects, the position and the category of each target object, and the code stream length of the texture part corresponding to all the specified groups.
6. A block mask based semantic structured image coding and decoding system, implemented on the basis of the method of any one of claims 1 to 5, comprising:
an encoding unit for performing an encoding part, the encoding part including: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the group to which each target object belongs through the block mask; obtaining image features of the input image through the transform operation of a deep image encoder, performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution, entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask separately, and concatenating the results to obtain code stream header information; quantizing the image features to obtain quantized image features, grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called a group feature, entropy coding every group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;
a decoding unit for performing a decoding part, the decoding part including: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask; performing the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution; extracting the code stream of each specified group from the texture part and entropy decoding it using the overall probability distribution to obtain the group feature of each specified group, and recombining the group features of all specified groups into recombined quantized image features according to the block mask; and combining the size information of the input image with the recombined quantized image features to obtain a reconstructed image through the inverse transform operation of a deep image decoder.
7. A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
8. A readable storage medium, storing a computer program, characterized in that the computer program, when being executed by a processor, carries out the method according to any one of claims 1 to 5.
CN202211213966.4A 2022-09-30 2022-09-30 Semantic structured image coding and decoding method and system based on block mask Pending CN115604490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211213966.4A CN115604490A (en) 2022-09-30 2022-09-30 Semantic structured image coding and decoding method and system based on block mask

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211213966.4A CN115604490A (en) 2022-09-30 2022-09-30 Semantic structured image coding and decoding method and system based on block mask

Publications (1)

Publication Number Publication Date
CN115604490A true CN115604490A (en) 2023-01-13

Family

ID=84845902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211213966.4A Pending CN115604490A (en) 2022-09-30 2022-09-30 Semantic structured image coding and decoding method and system based on block mask

Country Status (1)

Country Link
CN (1) CN115604490A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination