CN115604490A - Semantic structured image coding and decoding method and system based on block mask

Info

Publication number: CN115604490A
Application number: CN202211213966.4A
Authority: CN
Inventors: 陈志波; 冯若愚; 金鑫; 孙思萌
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2023-01-13

Abstract

The invention discloses a semantic structured image coding and decoding method and system based on a block mask.

Description

Semantic structured image coding and decoding method and system based on block mask

Technical Field

The invention relates to the technical field of image compression coding, and in particular to a semantic structured image coding and decoding method and system based on a block mask.

Background

Existing image compression techniques mainly target compression for human vision. With the rapid development of deep learning, intelligent machine analysis is playing an increasingly important role in many areas of production and daily life. Traditional compression methods oriented to the human eye must compress and transmit all of the information in an image, and the intelligent task analysis end must decode the whole code stream to obtain the complete image before sending it to the subsequent intelligent task analysis model. To support human-machine hybrid intelligent applications more efficiently, existing methods have proposed the concept of a semantic structured image code stream. For example, scheme one: Chinese invention patent CN110225341B, a task-driven code stream structured image coding method, introduces a region decision network and an alignment module for target detection, extracts the bounding boxes of the regions where objects exist based on the compressed features, and segments the features at the spatial level according to the bounding boxes. The segmented features are then entropy coded in sequence to form a structured code stream.

However, in many practical application scenarios, such as autonomous driving and smart cities, the images to be processed often contain overlapping or even densely packed objects. When processing such images, scheme one directly segments the image or the compressed features spatially based on the detection result, which causes overlapping regions to be coded repeatedly; when the overlapping area is large or the objects are dense, coding efficiency is seriously degraded.

Scheme two: Chinese patent application CN112929662A, a coding method for solving the object overlap problem in code stream structured image coding, takes the circumscribed rectangle of the overlapping objects, encodes the overlapping objects together, and places the result in the structured code stream as a single code stream. The problem with this scheme is that the circumscribed rectangles of different objects may also contain a large amount of non-target background information, which reduces coding efficiency for specific intelligent tasks.

Disclosure of Invention

The invention aims to provide a semantic structured image coding and decoding method and system based on a block mask.

The purpose of the invention is realized by the following technical scheme:

A semantic structured image coding and decoding method based on a block mask comprises:

an encoding part: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the group to which each target object belongs through the block mask; obtaining image features of the input image through the transform operation of a deep image encoder; performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, and performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution; entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask respectively, and then concatenating them to obtain code stream header information; quantizing the image features to obtain quantized image features, and grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called group features; entropy coding each group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;

a decoding part: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask; performing the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution; extracting the code stream of each specified group from the texture part and entropy decoding it using the overall probability distribution to obtain the group features of each specified group, and reassembling the group features of all specified groups into reassembled quantized image features according to the block mask; and obtaining a reconstructed image from the size information of the input image and the reassembled quantized image features through the inverse transform operation of a deep image decoder.

A semantic structured image coding and decoding system based on block masks comprises:

an encoding unit for performing an encoding part, the encoding part including: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the group to which each target object belongs through the block mask; obtaining image features of the input image through the transform operation of a deep image encoder; performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, and performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution; entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask respectively, and then concatenating them to obtain code stream header information; quantizing the image features to obtain quantized image features, and grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called group features; entropy coding each group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;

a decoding unit for performing a decoding part, the decoding part including: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask; performing the hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution; extracting the code stream of each specified group from the texture part and entropy decoding it using the overall probability distribution to obtain the group features of each specified group, and reassembling the group features of all specified groups into reassembled quantized image features according to the block mask; and obtaining a reconstructed image from the size information of the input image and the reassembled quantized image features through the inverse transform operation of a deep image decoder.

A processing device, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.

A readable storage medium, storing a computer program which, when executed by a processor, implements the aforementioned method.

According to the technical scheme provided by the invention, dividing the image by means of a block mask is more flexible and offers stronger controllability and extensibility. Compared with the original semantic structured coding methods (namely scheme one and scheme two), when semantic structured image coding is applied to images of scenes with overlapping or even dense objects, the scheme maintains coding efficiency while improving flexibility.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a framework diagram of a block mask based semantic structured image coding and decoding method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of intelligently analyzing an input image and generating the corresponding block mask according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of an input image and its target detection and instance segmentation results according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of merging overlapping targets into one group according to the target detection result according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of merging overlapping targets into one group according to the instance segmentation results according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of splitting overlapping targets into different groups according to the instance segmentation results according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a block mask based semantic structured image coding and decoding system according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of a processing device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The terms that may be used herein are first described as follows:

The term "and/or" means that either or both may apply; for example, "X and/or Y" covers the cases "X", "Y", and "X and Y".

The terms "comprising," "including," "containing," "having," and other terms of similar meaning are to be construed as non-exclusive inclusions. For example, "including a feature" (e.g., a material, component, ingredient, carrier, formulation, dimension, part, mechanism, device, process, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product, or article of manufacture) is to be construed as including not only the particular feature explicitly listed but also other features known in the art that are not explicitly listed.

The term "consisting of ..." is meant to exclude any technical feature or element not specifically listed. If used in a claim, the term closes the claim to anything other than the expressly listed technical features, apart from conventional impurities associated with them. If the term occurs in only one clause of a claim, it limits only the elements explicitly recited in that clause, and elements recited in other clauses are not excluded from the claim as a whole.

The following describes the block mask-based semantic structured image encoding and decoding method and system in detail. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art. Those not specifically mentioned in the examples of the present invention were carried out according to the conventional conditions in the art or conditions suggested by the manufacturer. The reagents or instruments used in the examples of the present invention are not specified by manufacturers, and are all conventional products available by commercial purchase.

Example one

The embodiment of the invention provides a semantic structured image coding and decoding method based on a block mask. The overall framework of the method is shown in Fig. 1 and mainly comprises an encoding part and a decoding part.

I. Encoding part.

1. Perform target detection on the input image to obtain a detection result, generate a block mask by combining predefined block size information, and use the block mask to distinguish the group to which each target object belongs.

In the embodiment of the present invention, the size of an input image is recorded as H × W × C, where H and W represent the height and width of the input image, respectively, and C is the number of channels (all channels are coded and decoded simultaneously); the predefined block size information is denoted B and represents the side length of the image block, which has a size B × B.

In the embodiment of the invention, the detection result is obtained by applying intelligent image analysis algorithms, such as target detection, to the input image. The detection result comprises a target detection result and an instance segmentation result. The target detection result comprises the number of target objects and the position and category of each target object. The position of each target object comprises the horizontal coordinate of its upper-left corner, the vertical coordinate of its upper-left corner, its height and its width. The instance segmentation result is the edge contour of each target object.

In the embodiment of the invention, the detection result (the target detection result and the instance segmentation result) is combined with the predefined block size information to generate the block mask m, whose size is (H/B) × (W/B). The value of each pixel is an integer from 0 to 255 and represents the group to which the corresponding image block belongs. Fig. 2 shows an example of intelligently analyzing an input image and generating the corresponding block mask: the left side of Fig. 2 shows the target detection result and the instance segmentation result, and the right side shows the generated block mask. Each rectangular box on the left is a target detection result, and the text in each box gives the category and the confidence, of which the embodiment of the invention mainly considers the category; the edge contour of each target object on the left is the instance segmentation result.

In the embodiment of the present invention, when the block mask is generated from the detection result (the target detection result and the instance segmentation result) and the predefined block size information, several strategies are possible: overlapping targets can be merged into one group according to the target detection result and treated as the same target object; overlapping targets can be merged into one group according to the instance segmentation results and treated as the same target object; or overlapping targets can be split into different groups according to their respective instance segmentation results and treated as different target objects. The invention does not limit the block mask generation method; the user may select a generation method according to the actual situation and combine the detection result with the predefined block size information to generate the block mask.
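As a hedged illustration of the first strategy (merging overlapping targets into one group according to the target detection result), the following Python sketch builds a block mask from bounding boxes. The function names, the (x, y, h, w) box layout, the use of union-find to merge overlapping boxes, and the rounding up of partial blocks at the image border are assumptions made for this example; the embodiment does not specify them.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, h, w): upper-left corner, height, width


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two boxes share at least one pixel."""
    ax, ay, ah, aw = a
    bx, by, bh, bw = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def block_mask_from_boxes(height: int, width: int, block: int,
                          boxes: List[Box]) -> List[List[int]]:
    """Build a block mask in which overlapping boxes share one group index."""
    # Union-find over boxes: overlapping boxes end up with the same root.
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_overlap(boxes[i], boxes[j]):
                parent[find(j)] = find(i)

    # Number merged groups 1, 2, ... in order of first appearance; 0 is background.
    group_of_root: Dict[int, int] = {}
    for i in range(len(boxes)):
        group_of_root.setdefault(find(i), len(group_of_root) + 1)

    rows = -(-height // block)  # ceiling division (assumption for partial blocks)
    cols = -(-width // block)
    mask = [[0] * cols for _ in range(rows)]
    for i, (x, y, h, w) in enumerate(boxes):
        group = group_of_root[find(i)]
        # Mark every block that the box touches.
        for r in range(y // block, min(rows, -(-(y + h) // block))):
            for c in range(x // block, min(cols, -(-(x + w) // block))):
                mask[r][c] = group
    return mask


example_mask = block_mask_from_boxes(
    64, 64, 16, [(0, 0, 20, 20), (10, 10, 20, 20), (48, 48, 16, 16)])
```

For a 64 × 64 image with B = 16, the two overlapping boxes fall into group 1 and the isolated box into group 2, while background blocks stay at 0. A real implementation would also need to keep group indices within the 0 to 255 range stated above.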

In the embodiment of the invention, the instance segmentation result is mainly used when generating the block mask and is not coded. The target detection result is needed both when generating the block mask and by downstream tasks, so it must also be coded.

2. Obtain the image features of the input image using the transform operation of a deep image encoder.

In the embodiment of the invention, the image features are denoted y, and C_y denotes their number of channels. The transform operation of the deep image encoder may be implemented with conventional techniques and is not described in detail here.

3. Perform a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, perform a hyperprior inverse transform on the quantized hyperprior features to obtain the overall probability distribution, entropy code the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask respectively, and concatenate the results to obtain the code stream header information.

As shown in the right and upper-left parts of Fig. 1, the entropy coding is divided into two parts. The first part entropy codes the size information of the input image, the quantized hyperprior features, and the target detection result separately; entropy coding the target detection result means separately entropy coding the number of target objects and the position and category of each target object. The second part entropy codes the block size information and the block mask separately. The results are concatenated to obtain the code stream header information. The syntax structure of the code stream header information is defined in Table 1.

Table 1: code stream header information syntax structure

Wherein: image_height_minus1 represents the height H of the image; image_width_minus1 represents the width W of the image; side_information_length represents the code stream length corresponding to the quantized hyperprior features; group_mask_block_size represents the block size information; group_mask_length_minus1 represents the code stream length corresponding to the block mask m; bounding_boxes_numbers represents the number of target objects in the image; bounding_box_x, bounding_box_y, bounding_box_h, bounding_box_w and bounding_box_category represent, in order, the abscissa of the upper-left corner, the ordinate of the upper-left corner, the height, the width and the category of the current target object. Only a single target object is shown as an example; when there are multiple target objects, these five fields form one group per target object and the groups are arranged in sequence. u denotes an unsigned data type; for example, u(32) indicates that the corresponding code stream segment is 32 bits long.

In the embodiment of the present invention, the hyperprior transform can be implemented with conventional techniques and is not described here.

In the embodiment of the invention, the code stream lengths are recorded for subsequent decoding. The principle is as follows: in practical entropy coding, the length of the encoded data is not known in advance, so the decoder must first read the length field and then read a code stream segment of that length in order to decode the data.
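The role of the length fields can be sketched with a small length-prefixed header writer and reader in Python. This is only an illustration under stated assumptions: every field is written as a 32-bit big-endian unsigned integer, each entropy-coded payload is placed directly after its length field, and the "_minus1" fields store the actual value minus one. The field widths and exact ordering of Table 1 are not reproduced here, and pack_header and unpack_header are hypothetical names.

```python
import struct
from typing import Dict, List, Tuple

BoxFields = Tuple[int, int, int, int, int]  # x, y, h, w, category


def pack_header(height: int, width: int, side_info: bytes, block_size: int,
                mask_payload: bytes, boxes: List[BoxFields]) -> bytes:
    """Serialize header fields; payloads follow their length fields (assumption)."""
    out = struct.pack(">II", height - 1, width - 1)           # image_*_minus1
    out += struct.pack(">I", len(side_info)) + side_info       # side_information_length
    out += struct.pack(">I", block_size)                       # group_mask_block_size
    out += struct.pack(">I", len(mask_payload) - 1) + mask_payload  # group_mask_length_minus1
    out += struct.pack(">I", len(boxes))                       # bounding_boxes_numbers
    for box in boxes:
        out += struct.pack(">5I", *box)                        # five fields per target object
    return out


def unpack_header(stream: bytes) -> Tuple[Dict[str, object], int]:
    """Parse the header; return the fields and the offset where the texture part starts."""
    pos = 0

    def read_u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        return value

    def read_bytes(count: int) -> bytes:
        nonlocal pos
        chunk = stream[pos:pos + count]
        pos += count
        return chunk

    fields: Dict[str, object] = {}
    fields["height"] = read_u32() + 1
    fields["width"] = read_u32() + 1
    # The length must be read first; only then is the payload extent known.
    fields["side_info"] = read_bytes(read_u32())
    fields["block_size"] = read_u32()
    fields["mask_payload"] = read_bytes(read_u32() + 1)
    box_count = read_u32()
    fields["boxes"] = [tuple(read_u32() for _ in range(5)) for _ in range(box_count)]
    return fields, pos
```

Without the length fields, the reader could not tell where the side information or the block mask payload ends, because entropy-coded data carries no intrinsic terminator in this sketch.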

In addition, the quantized hyperprior features undergo the hyperprior inverse transform to obtain the overall probability distribution, which is provided for use in the entropy coding of the group features.

4. Quantize the image features to obtain quantized image features, and group the quantized image features in the spatial dimension according to the block mask, the features of each group being called group features; entropy code each group feature separately using the overall probability distribution, select all specified groups according to the task setting, and combine the entropy-coded streams of all specified groups to form the texture part of the code stream.

In the embodiment of the invention, the overall probability distribution is the probability distribution of the quantized image features as a whole, from which the probability distribution of each group can be obtained. Specifically, for the k-th group, the corresponding probability distribution is first obtained from the overall probability distribution according to the grouping and in combination with an autoregressive model; the corresponding group feature is then entropy coded using this entropy model. Performing this operation on all groups yields the entropy-coded code streams of all groups.
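To make the per-group use of the probability model concrete, the following Python sketch estimates the ideal code length of each group as the sum of -log2(p) over the positions belonging to that group. It is a simplification under stated assumptions: the block mask and the feature map have the same spatial resolution, there is a single channel, symbol_prob holds the model probability of the symbol actually present at each position, and the autoregressive refinement described above is ignored. The function name and data layout are hypothetical.

```python
import math
from typing import Dict, List


def ideal_bits_per_group(mask: List[List[int]],
                         symbol_prob: List[List[float]]) -> Dict[int, float]:
    """Sum -log2(p) over the positions of each group (ideal entropy-coding cost)."""
    bits: Dict[int, float] = {}
    for mask_row, prob_row in zip(mask, symbol_prob):
        for group, p in zip(mask_row, prob_row):
            bits[group] = bits.get(group, 0.0) - math.log2(p)
    return bits


# Toy 2 x 2 example: group 0 has one position, group 1 has three.
toy_bits = ideal_bits_per_group([[0, 1], [1, 1]],
                                [[0.5, 0.25], [0.5, 0.125]])
```

Because each group is coded with its own slice of the probability model, the cost of a group does not depend on whether the other groups are kept or dropped in this simplified view.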

In the embodiment of the invention, the grouping rule is as follows: the positions of the quantized image features that correspond, in the spatial dimension, to parts of the block mask m with the same value form one group, called a group feature. The required groups can be determined by the needs of the downstream tasks. There can be one or more downstream tasks, the groups required by different downstream tasks can be the same or different, and the number of required groups can be less than or equal to the total number of groups. All specified groups are selected according to the downstream task setting, and their entropy-coded streams are combined to form the texture part of the code stream, in which the entropy-coded streams are arranged in ascending order of the group index k.
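The grouping rule and the selection of specified groups can be sketched as follows in Python. The sketch assumes that the feature map has exactly one spatial position per block mask pixel and that each position holds a list of channel values; the actual correspondence between mask pixels and feature positions depends on the encoder and is not specified here. The names group_features and select_groups are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Vector = List[int]  # quantized channel values at one spatial position


def group_features(mask: List[List[int]],
                   features: List[List[Vector]]) -> Dict[int, List[Vector]]:
    """Collect, in raster order, the feature vectors whose mask value is k."""
    groups: Dict[int, List[Vector]] = {}
    for mask_row, feat_row in zip(mask, features):
        for k, vec in zip(mask_row, feat_row):
            groups.setdefault(k, []).append(vec)
    return groups


def select_groups(groups: Dict[int, List[Vector]],
                  wanted: Iterable[int]) -> List[Tuple[int, List[Vector]]]:
    """Keep only the specified groups, ordered by ascending group index k."""
    return [(k, groups[k]) for k in sorted(set(wanted)) if k in groups]


toy_mask = [[0, 1], [2, 1]]
toy_features = [[[1], [2]], [[3], [4]]]
toy_groups = group_features(toy_mask, toy_features)
toy_selected = select_groups(toy_groups, [2, 1])  # a task that needs groups 1 and 2
```

Here the background group 0 is simply not selected, so its entropy-coded stream would not be placed in the texture part for this task.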

The syntax structure of the texture part is defined in Table 2.

Table 2: syntactic structure of texture part

Wherein, object_texture_length_minus1 represents the code stream length of the texture part corresponding to the currently specified group. Note that the above syntax structure shows the example of a single specified group only.
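A minimal Python sketch of how the texture part might be assembled and split again is given below. It assumes a 32-bit big-endian length field per specified group, that the field stores the payload length minus one (following the "_minus1" naming), that each payload directly follows its length field, and that the reader simply consumes segments until the stream ends. None of these layout details are confirmed by Table 2; pack_texture and unpack_texture are hypothetical names.

```python
import struct
from typing import List


def pack_texture(group_streams: List[bytes]) -> bytes:
    """Concatenate per-group entropy-coded streams, each prefixed by its length.

    group_streams must already be ordered by ascending group index k and
    every stream must be non-empty (the length is stored minus one).
    """
    out = b""
    for payload in group_streams:
        out += struct.pack(">I", len(payload) - 1) + payload
    return out


def unpack_texture(texture: bytes) -> List[bytes]:
    """Split the texture part back into per-group entropy-coded streams."""
    streams: List[bytes] = []
    pos = 0
    while pos < len(texture):
        (length_minus1,) = struct.unpack_from(">I", texture, pos)
        pos += 4
        length = length_minus1 + 1
        streams.append(texture[pos:pos + length])
        pos += length
    return streams


toy_texture = pack_texture([b"abc", b"z"])
```

Because every segment carries its own length, a decoder can skip a group it does not need by advancing the read position without entropy decoding that segment.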

5. The code stream header information and the texture part of the code stream form the semantic structured code stream.

In the embodiment of the present invention, the syntax structure of the semantic structured code stream includes: the height of the input image, the width of the input image, the code stream length corresponding to the quantized hyperprior features, the block size information, the code stream length corresponding to the block mask, the number of target objects, the position and category of each target object, and the code stream lengths of the texture parts corresponding to all specified groups.

II. Decoding part.

1. Decode the header information. The code stream header information in the semantic structured code stream is decoded to obtain the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask.

In the embodiment of the present invention, the size information of the input image mainly refers to the height H and the width W of the image, and the block size information is used to ensure the correct decoding of the block mask. The target detection result can be used for subsequent downstream tasks.

2. Hyperprior inverse transform. The hyperprior inverse transform is performed on the quantized hyperprior features to obtain the overall probability distribution, which is used for the subsequent decoding of the group features.

In the embodiment of the present invention, the hyperprior inverse transform can be implemented with conventional techniques and is not described here.

3. Extract the code stream of each specified group from the texture part and entropy decode it using the overall probability distribution to obtain the group features of each specified group. Specifically, the decoded block mask m, the overall probability distribution and the autoregressive model are combined to obtain the probability distribution of each specified group, and entropy decoding is performed to obtain the group features of each specified group.

Because the overall probability distribution obtained by the decoding part is the same as that of the encoding part, and both parts derive the probability distribution of each group from the overall probability distribution in the same way, the related flow is shown in summarized form in Fig. 1; that is, the part consisting of quantization → hyperprior inverse transform → probability estimation is omitted, which is a common way of representing such flows in this field.

4. Feature reassembly.

According to the positions provided by the block mask m, the group features of all specified groups are reassembled into the reassembled quantized image features. Because the specified groups may be only a subset of all groups, the number of groups in the reassembled quantized image features may differ from that in the quantized image features, so the two are distinguished by symbol and by name. If the specified groups include all groups, the reassembled quantized image features are identical to the quantized image features.
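The reassembly step can be sketched in Python as the inverse of spatial grouping. The sketch assumes one feature position per block mask pixel, group features stored in raster order, and a placeholder vector (for example zeros) at positions whose group was not specified; the embodiment does not state how such positions are filled, so the fill value is an assumption, and reassemble is a hypothetical name.

```python
from typing import Dict, Iterator, List

Vector = List[int]  # quantized channel values at one spatial position


def reassemble(mask: List[List[int]],
               decoded_groups: Dict[int, List[Vector]],
               fill: Vector) -> List[List[Vector]]:
    """Put each decoded group's vectors back at the positions given by the mask."""
    cursors: Dict[int, Iterator[Vector]] = {
        k: iter(vectors) for k, vectors in decoded_groups.items()}
    features: List[List[Vector]] = []
    for mask_row in mask:
        row: List[Vector] = []
        for k in mask_row:
            if k in cursors:
                row.append(next(cursors[k]))   # next vector of group k, raster order
            else:
                row.append(list(fill))         # group not decoded: placeholder (assumption)
        features.append(row)
    return features


toy_mask = [[0, 1], [2, 1]]
partial = reassemble(toy_mask, {1: [[2], [4]]}, [0])
full = reassemble(toy_mask, {0: [[1]], 1: [[2], [4]], 2: [[3]]}, [0])
```

When every group is supplied, the result equals the original quantized features, which matches the statement that the two are identical when the specified groups include all groups.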

5. Combine the size information of the input image and the reassembled quantized image features, and obtain the reconstructed image through the inverse transform operation of the deep image decoder.

The inverse transform operation in this part can follow conventional techniques and is not described here.

The scheme provided by the embodiment of the invention mainly has the following advantages:

(1) Based on the semantic structured image coding framework, a method of semantically structuring an image with a block mask is provided, so that semantic structured image coding remains efficient and flexible when objects in the image overlap or are densely packed.

(2) Compared with scheme one, which divides the image into regions with rectangular boxes, the method provided by the invention divides the image with a block mask, which is more flexible and offers stronger controllability and extensibility.

Different block mask generation methods are described below to demonstrate intuitively the flexibility of code stream structuring when target objects overlap.

Fig. 3 shows the target detection result and the instance segmentation result: the left part of Fig. 3 is the input image, and the right part shows the target detection result (rectangular boxes) and the instance segmentation result (edge contours).

Figs. 4 to 6 show three block mask partitioning schemes, in which each rectangular block represents a B × B image block as described above. Specifically:

As shown in Fig. 4, the overlapping targets are merged into one group according to the target detection result. The dark part is the foreground, containing the person and the umbrella (both are target objects), and the light part is the background. (The dark and light shading is used only for presentation; in actual operation the pixel value of the light part is 0 and that of the dark part is 1.)

As shown in Fig. 5, the overlapping targets are merged into one group according to the instance segmentation results; on the basis of Fig. 2, this method further removes the redundant background inside the target detection bounding boxes.

As shown in Fig. 6, the overlapping targets are split into different groups (the dark part and the diagonally hatched part) according to their respective instance segmentation results and placed in the structured code stream. This method additionally ensures the independence of each target in the structured code stream and better serves application scenarios in which a downstream task needs only some of the categories.

In Figs. 4 to 6, the left side shows the block mask superimposed on the input image, and the right side shows the block mask alone. Note that block mask generation is not limited to the above three methods; users may also optimize the block mask generation process according to their own needs, which reflects the flexibility and efficiency of the scheme.
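The two instance-based schemes (merging overlapping targets into one group, as in Fig. 5, or splitting them into separate groups, as in Fig. 6) can be sketched with one Python function and a flag. The sketch makes several assumptions that the embodiment does not specify: instance masks are given as binary pixel arrays, all instances passed in are treated as mutually overlapping when merge is True, a block belongs to a target if any of its pixels does, and a block touched by several instances is assigned to the earliest one when merge is False. The function name is hypothetical.

```python
from typing import List

PixelMask = List[List[int]]  # H x W binary mask of one target object


def block_mask_from_instances(instance_masks: List[PixelMask], block: int,
                              merge: bool) -> List[List[int]]:
    """Build a block mask from instance masks, merged (True) or split (False)."""
    height = len(instance_masks[0])
    width = len(instance_masks[0][0])
    rows = -(-height // block)  # ceiling division (assumption for partial blocks)
    cols = -(-width // block)
    mask = [[0] * cols for _ in range(rows)]
    for index, instance in enumerate(instance_masks, start=1):
        group = 1 if merge else index
        for y in range(height):
            for x in range(width):
                # Claim the block only if no earlier instance has claimed it.
                if instance[y][x] and mask[y // block][x // block] == 0:
                    mask[y // block][x // block] = group
    return mask


person = [[1, 1, 0, 0],
          [1, 1, 1, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
umbrella = [[0, 0, 0, 1],
            [0, 0, 0, 0],
            [0, 0, 1, 1],
            [0, 0, 0, 0]]
merged = block_mask_from_instances([person, umbrella], 2, merge=True)
split = block_mask_from_instances([person, umbrella], 2, merge=False)
```

In this toy 4 × 4 example with B = 2, the merged mask labels every foreground block 1, whereas the split mask gives the umbrella its own group 2 wherever the person has not already claimed the block, so a downstream task could request only one of the two targets.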

Example two

The present invention also provides a block mask based semantic structured image coding and decoding system, implemented mainly on the basis of the method provided by the foregoing embodiment. As shown in Fig. 7, the system mainly includes:

an encoding unit for performing an encoding part, the encoding part including: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask by combining predefined block size information, and distinguishing the group to which each target object belongs through the block mask; obtaining image features of the input image through the transform operation of a deep image encoder; performing a hyperprior transform and quantization on the image features to obtain quantized hyperprior features, and performing a hyperprior inverse transform on the quantized hyperprior features to obtain an overall probability distribution; entropy coding the size information of the input image, the quantized hyperprior features, the target detection result, the block size information and the block mask respectively, and then concatenating them to obtain code stream header information; quantizing the image features to obtain quantized image features, and grouping the quantized image features in the spatial dimension according to the block mask, the features of each group being called group features; entropy coding each group feature separately using the overall probability distribution, selecting all specified groups according to the task setting, and combining the entropy-coded streams of all specified groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together forming a semantic structured code stream;

a decoding unit for performing a decoding section, the decoding section comprising: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyper-prior features, the target detection result, the block size information and the block mask; performing the inverse hyper-prior transform on the quantized hyper-prior features to obtain the overall probability distribution; extracting from the texture part the code stream corresponding to each designated group, entropy decoding it using the overall probability distribution to obtain the group feature of each designated group, and recombining the group features of all designated groups into recombined quantized image features according to the block mask; and obtaining a reconstructed image from the size information of the input image and the recombined quantized image features through the inverse transform operation of the deep image decoder.
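As a non-limiting illustration of the grouping performed by the encoding unit and the recombination performed by the decoding unit, the following minimal sketch splits a quantized feature map into group features according to a block mask and rebuilds it from the designated groups only. Entropy coding is omitted. The function names, the single-channel feature map with one feature position per mask entry, and the zero fill for groups that were not transmitted are assumptions made for this example.

```python
def group_features(features, mask):
    """Split feature values into per-group lists, scanning in raster order."""
    groups = {}
    for feature_row, mask_row in zip(features, mask):
        for value, group in zip(feature_row, mask_row):
            groups.setdefault(group, []).append(value)
    return groups

def regroup_features(groups, mask, fill=0):
    """Rebuild the feature map from the given groups.

    Positions whose group is absent receive `fill` (an assumption of this sketch).
    """
    readers = {group: iter(values) for group, values in groups.items()}
    return [
        [next(readers[group]) if group in readers else fill for group in mask_row]
        for mask_row in mask
    ]

# Hypothetical 2 x 3 block mask and quantized feature map.
mask = [[1, 1, 0],
        [0, 2, 2]]
features = [[5, -3, 0],
            [1, 7, -2]]

groups = group_features(features, mask)

# Task setting: keep only groups 1 and 2, ordered by ascending group index.
designated = {g: groups[g] for g in sorted(groups) if g in (1, 2)}
restored = regroup_features(designated, mask)
print(restored)  # [[5, -3, 0], [0, 7, -2]]
```

Because both sides scan the mask in the same raster order, the decoder needs only the block mask from the header to put each decoded value back in its place.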

It will be clear to those skilled in the art that, for convenience and brevity of description, the foregoing division into functional modules is merely an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the system may be divided into different functional modules to perform all or part of the functions described above.

Example three

The present invention also provides a processing apparatus, as shown in fig. 8, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.

Further, the processing device further comprises at least one input device and at least one output device; in the processing device, a processor, a memory, an input device and an output device are connected through a bus.

In the embodiment of the present invention, the specific types of the memory, the input device, and the output device are not limited; for example:

the input device can be a touch screen, an image acquisition device, a physical button or a mouse and the like;

the output device may be a display terminal;

the memory may be a random access memory (RAM) or a non-volatile memory, such as a disk memory.

Example four

The present invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.

The readable storage medium in the embodiment of the present invention may be provided in the foregoing processing device as a computer-readable storage medium, for example, as the memory in the processing device. The readable storage medium may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A block-mask-based semantic structured image coding and decoding method, characterized by comprising:

and a decoding part: decoding the code stream header information in the semantic structured code stream to obtain the size information of the input image, the quantized hyper-prior features, the target detection result, the block size information and the block mask; performing an inverse hyper-prior transform on the quantized hyper-prior features to obtain an overall probability distribution; extracting from the texture part the code stream corresponding to each designated group, entropy decoding it using the overall probability distribution to obtain the group feature of each designated group, and recombining the group features of all designated groups into recombined quantized image features according to the block mask; and obtaining a reconstructed image from the size information of the input image and the recombined quantized image features through the inverse transform operation of the deep image decoder.

2. The method of claim 1, wherein the performing target detection on the input image, obtaining a target detection result and an instance segmentation result, and generating a block mask according to predefined block size information comprises:

denoting the size of the input image as H × W × C, wherein H and W respectively represent the height and the width of the input image, and C is the number of channels; the predefined block size information is denoted as B, which represents the side length of an image block, the size of an image block being B × B;

combining the target detection result, the instance segmentation result and the predefined block size information to generate a block mask m, wherein the size of the block mask m is

The value of each pixel of the gray scale image is an integer of 0-255, and the value of each pixel in the block mask represents the group to which the corresponding image block belongs.

3. The method according to claim 1 or 2, wherein the target detection result comprises: the number of target objects, and the position and category of each target object; the position of each target object includes: the horizontal-axis position and the vertical-axis position of the upper-left corner of the target object, and the height and width of the target object.

4. The method according to claim 3, wherein:

separately entropy coding the target detection result means separately entropy coding the number of target objects and the position and category of each target object;

after every group feature has been entropy coded, all required designated groups are selected according to the task setting, their entropy-coded streams are arranged in the code stream in ascending order of the index of each designated group, and together they form the texture part of the code stream.

5. The method according to claim 2, wherein the syntax structure of the semantic structured code stream comprises: the height of the input image, the width of the input image, the code stream length corresponding to the quantized hyper-prior features, the block size information, the code stream length corresponding to the block mask, the number of target objects, the position and category of each target object, and the code stream length of the texture part corresponding to all the designated groups.
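By way of non-limiting illustration only, the field order of this syntax structure can be sketched with fixed-width binary fields. The field widths, the big-endian byte order, the single total length for the texture part, and the names `pack_header` and `unpack_header` are assumptions made for this example. Fixed-width packing stands in here for the entropy coding of the header elements and is not the coding used by the method.

```python
import struct

# Hypothetical field layouts (big-endian, no padding).
HEAD = ">HHIBIH"  # height, width, hyper-prior length, block size, mask length, object count
OBJ = ">HHHHB"    # x, y, object height, object width, category
TAIL = ">I"       # total length of the texture part

def pack_header(height, width, hyper_len, block, mask_len, objects, texture_len):
    """Serialize the syntax elements in the order listed in the claim."""
    out = struct.pack(HEAD, height, width, hyper_len, block, mask_len, len(objects))
    for obj in objects:
        out += struct.pack(OBJ, *obj)
    return out + struct.pack(TAIL, texture_len)

def unpack_header(buf):
    """Parse the syntax elements back in the same order."""
    height, width, hyper_len, block, mask_len, count = struct.unpack_from(HEAD, buf, 0)
    pos = struct.calcsize(HEAD)
    objects = []
    for _ in range(count):
        objects.append(struct.unpack_from(OBJ, buf, pos))
        pos += struct.calcsize(OBJ)
    (texture_len,) = struct.unpack_from(TAIL, buf, pos)
    return height, width, hyper_len, block, mask_len, objects, texture_len

# Two hypothetical target objects: (x, y, height, width, category).
objects = [(10, 20, 100, 50, 3), (200, 40, 80, 60, 1)]
buf = pack_header(480, 640, 1200, 16, 64, objects, 90000)
parsed = unpack_header(buf)
```

The length fields let a decoder locate the hyper-prior stream, the block mask stream and the texture part without decoding any of them first.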

6. A semantic structured image coding and decoding system based on block mask, which is realized based on the method of any one of claims 1 to 5, and comprises:

an encoding unit for performing an encoding section, the encoding section including: performing target detection on an input image to obtain a target detection result and an instance segmentation result, generating a block mask in combination with predefined block size information, and distinguishing the group to which each target object belongs by means of the block mask; obtaining image features of the input image through the transform operation of a deep image encoder, performing a hyper-prior transform and quantization on the image features to obtain quantized hyper-prior features, performing an inverse hyper-prior transform on the quantized hyper-prior features to obtain an overall probability distribution, separately entropy coding the size information of the input image, the quantized hyper-prior features, the target detection result, the block size information and the block mask, and then concatenating the results to obtain code stream header information; quantizing the image features to obtain quantized image features, grouping the quantized image features in the spatial dimension according to the block mask, each group being referred to as a group feature, entropy coding every group feature separately using the overall probability distribution, selecting all designated groups according to the task setting, and combining the entropy-coded streams of all designated groups to form the texture part of the code stream; the code stream header information and the texture part of the code stream together form the semantic structured code stream;

7. A processing device, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.

8. A readable storage medium, storing a computer program, characterized in that the computer program, when being executed by a processor, carries out the method according to any one of claims 1 to 5.