CN111491167A - Image encoding method, transcoding method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111491167A
CN111491167A
Authority
CN
China
Prior art keywords
image
roi
level
historical
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010221975.2A
Other languages
Chinese (zh)
Other versions
CN111491167B (en)
Inventor
岳泊暄
沈建强
龚骏辉
曾雁星
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN111491167A
Application granted
Publication of CN111491167B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/129 Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

An image encoding method. The method divides an image into an ROI and a non-ROI, keeps the encoding quality of the ROI unchanged, and actively reduces the encoding quality of the non-ROI, so that different regions correspond to different encoding qualities and the data volume of the image is reduced while the image quality of the ROI is preserved.

Description

Image encoding method, transcoding method, device, equipment and storage medium
This application claims priority from Chinese patent application No. 201911032040.3, entitled "image processing method", filed on 28 October 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image encoding method, a transcoding method, an apparatus, a device, and a storage medium.
Background
An intelligent security system can provide monitored images to users and may include acquisition devices, a storage device, and a processing device. For any image collected by an acquisition device, the acquisition device can perform Joint Photographic Experts Group (JPEG) encoding on the image to obtain an encoded image, and the storage device stores that encoded image; in this way, the storage device holds the encoded images captured by every acquisition device in the intelligent security system.
When a user needs to view an image acquired by an acquisition device, the processing device may obtain the encoded image to be viewed from the storage device, decode it, and display the decoded image to the user.
Because JPEG encoding achieves only a limited compression ratio, an encoded image carries a large amount of redundant information and therefore has a large data volume. When the acquisition devices capture a large number of images, a storage device with a large storage space is required to hold the many encoded images, which increases the cost of the storage device.
Disclosure of Invention
The embodiments of the present application provide an image encoding method, a transcoding method, an apparatus, a device, and a storage medium, which can reduce the cost of a storage device. The technical solution is as follows:
in a first aspect, an image encoding method is provided, which includes:
first, an original image is obtained; a region of interest (ROI) and a non-ROI are then determined in the original image; the original image is then encoded to obtain an encoded image in which the code rate of the ROI is higher than that of the non-ROI;
during encoding of the original image, high-frequency alternating-current (AC) components within a first amplitude suppression range are suppressed in the non-ROI, where the original image is obtained by image acquisition.
The method keeps the encoding quality of the ROI unchanged and actively reduces the encoding quality of the non-ROI, so that different regions correspond to different encoding qualities and the data volume of the image is reduced while the image quality of the ROI is preserved. Specifically, by suppressing the high-frequency AC components within the first amplitude suppression range in the non-ROI of the original image, the method reduces the data volume of the non-ROI part of the encoded image. Since the encoded image of the original image consists of the non-ROI encoded part and the ROI encoded part, reducing the non-ROI data volume reduces the total data volume, allowing the storage device to store more encoded images and thereby lowering storage cost. Meanwhile, because the code rate of the ROI is higher than that of the non-ROI, the encoding quality of the ROI is guaranteed while the data volume of the encoded image of the original image is reduced.
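The non-ROI suppression described above can be sketched as zeroing high-frequency DCT coefficients in 8×8 blocks that fall outside an ROI mask. This is a minimal illustration only, not the patented encoder: the `keep` parameter standing in for the "first amplitude suppression range", and the block-wise NumPy DCT, are assumptions for the sketch.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal type-II DCT basis matrix (JPEG-style 8x8 transform)
    k = np.arange(n)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

D = dct_matrix(8)

def suppress_non_roi_high_freq(image, roi_mask, keep=3):
    """Zero AC coefficients with frequency index row+col >= keep in every
    8x8 block lying outside the ROI mask; ROI blocks are left untouched."""
    out = image.astype(np.float64).copy()
    h, w = out.shape
    low = np.add.outer(np.arange(8), np.arange(8)) < keep  # low-frequency triangle
    for y in range(0, h - 7, 8):
        for x in range(0, w - 7, 8):
            if roi_mask[y:y + 8, x:x + 8].any():
                continue  # keep full quality inside the ROI
            block = out[y:y + 8, x:x + 8]
            coeffs = D @ block @ D.T               # forward 2-D DCT
            coeffs *= low                          # suppress high-frequency AC components
            out[y:y + 8, x:x + 8] = D.T @ coeffs @ D  # inverse 2-D DCT
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

Because the suppressed blocks contain fewer non-zero coefficients, the subsequent entropy coding stage emits fewer bits for the non-ROI, which is the data-volume reduction argued above.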
In one possible implementation, the original image includes a plurality of image regions, the image regions belonging to the non-ROI each correspond to a first level, the image regions belonging to the ROI have a plurality of levels, wherein each image region belonging to the ROI corresponds to one of the plurality of levels, the different levels correspond to different amplitude suppression ranges of the high-frequency alternating-current component, and the amplitude suppression range of the high-frequency alternating-current component corresponding to each of the plurality of levels is higher than the first amplitude suppression range;
the method further comprises the following steps:
and in the process of coding the original image, suppressing high-frequency alternating current components of a second amplitude suppression range in any image region belonging to the ROI, wherein the level corresponding to any image region corresponds to the second amplitude suppression range.
With this possible implementation, the image regions belonging to the ROI are graded, and during encoding, AC components of different amplitude suppression ranges are suppressed in each ROI image region according to its level. This guarantees the encoding quality of the ROI in the original image while reducing the data volume of the encoded image, which further reduces the storage resources occupied.
In one possible implementation, the image areas of non-key parts of the target object correspond to a second level, and the image areas of key parts of the target object correspond to a third level;
the amplitude suppression range of the high-frequency AC component corresponding to the third level is higher than that corresponding to the second level, and the ROI in the original image is the image area of a target object in the original image.
In one possible implementation, the image area expanded based on the image area corresponding to the third level corresponds to a fourth level, and the amplitude suppression range of the high-frequency alternating-current component corresponding to the fourth level is higher than the amplitude suppression range of the high-frequency alternating-current component corresponding to the third level.
With this possible implementation, the expanded image region derived from a third-level image region is assigned a higher level, so that during encoding its amplitude suppression range exceeds that of the third-level region and its encoding quality is higher than that of the third-level region. Because the expanded region surrounds the third-level region, its encoding in turn influences the encoding of the third-level region, so the encoding quality of the third-level region can be increased.
In one possible implementation, the method further includes: and when any image area in the original image corresponds to the third grade, performing expansion operation on the image area to determine an expanded image area of the image area, wherein the expanded image area corresponds to the fourth grade.
In a possible implementation, the expanding any image region includes:
performing a dilation operation on the image area to obtain a first dilated image area; performing a dilation operation on the first dilated image area to obtain a second dilated image area; and determining the second dilated image area as the dilated image area of the image area.
With this possible implementation, the dilation operation avoids the situation in which the third-level image area fails to completely cover the key part of the target object, so the encoding quality of the third-level image area can be improved.
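The two-step dilation above can be sketched with a simple binary dilation; the 4-connected (cross) structuring element is an assumption, since the text does not specify one. Note that in the text the original region keeps its third level, so in practice only the grown border would receive the fourth level.

```python
import numpy as np

def dilate(mask):
    """One step of binary dilation with a 4-connected (cross) structuring element."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def expand_region(mask):
    """Two-step dilation as described: the first dilated area is dilated again,
    and the second result is taken as the expanded image area."""
    first = dilate(mask)    # first dilated image area
    second = dilate(first)  # second dilated image area
    return second
```

The fourth-level border around a third-level mask `m` would then be `expand_region(m) & ~m`.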
In one possible implementation, before encoding the original image, the method further includes:
and determining the corresponding grade of each image area of the original image.
In one possible implementation, the image area includes at least one image block;
the determining the corresponding grade of each image area of the original image comprises:
inputting the original image into a second model, and outputting a second label of each image block in the original image based on the original image by the second model; for any image area in the plurality of image areas, determining a grade corresponding to the image area according to a second label of the image block in the image area;
the second label of an image block is used for indicating the corresponding grade of the image block.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
when the image area includes a plurality of image blocks, determining the number of image blocks of each level in the image area, and determining the level to which the largest number of image blocks corresponds as the level of the image area.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
when the image area includes a single image block, determining the level indicated by the second label of that image block as the level corresponding to the image area.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
and according to the second label of each image block in the original image, forming a target image area by a plurality of adjacent image blocks corresponding to the same grade, wherein the grade corresponding to the target image area is the grade corresponding to the image block in the target image area.
In one possible implementation, the training process of the second model includes:
firstly, inputting a plurality of historical images into an image semantic segmentation model, and outputting a plurality of first pixel labels by the image semantic segmentation model based on the plurality of historical images; inputting the plurality of historical images into an image recognition model, and outputting a plurality of second pixel labels by the image recognition model based on the plurality of historical images; then, training an initial model based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image to obtain a second model;
the image area of the key part of the target object is composed of a plurality of second pixel points.
In a possible implementation manner, the training the initial model based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image includes:
generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image; inputting the training set into the initial model, and training the initial model based on the training set to obtain the second model;
wherein the training set comprises training labels for each historical image and for respective image blocks of each historical image.
In a possible implementation manner, the generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image includes:
for any historical image in the plurality of historical images, carrying out blocking processing on any historical image to obtain a plurality of historical image blocks; for any historical image block in the plurality of historical image blocks, determining whether the any historical image block is an ROI or not according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in the any historical image block; determining the grade corresponding to any historical image block according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in any historical image block; determining a training label of any historical image block according to whether the any historical image block is an ROI and the grade corresponding to the any historical image block; forming the training set by the plurality of historical images and the training labels of the image blocks of each historical image;
the third pixel point is a first pixel point except a second pixel point in the historical image, the third pixel point is used for forming a non-key part of the target object, and the fourth pixel point is used for forming a non-ROI.
In a possible implementation manner, the training label of each historical image block carries a level corresponding to one historical image block, and the training label of each historical image block also carries a target identifier, where the target identifier indicates whether the image block is an ROI.
In a possible implementation manner, the determining whether any historical image block is an ROI according to the number of second pixel points, the number of third pixel points, and the number of fourth pixel points in any historical image block includes:
and when the total number of the second pixel points and the third pixel points in any historical image block is greater than the number of the fourth pixel points in any historical image block, determining any historical image block as an ROI, otherwise, determining any historical image block as a non-ROI.
In a possible implementation manner, the determining, according to the number of the second pixel points, the number of the third pixel points, and the number of the fourth pixel points in any historical image block, a level corresponding to the any historical image block includes:
when the total number of second pixel points and third pixel points in any historical image block is less than or equal to the number of fourth pixel points in any historical image block, determining the grade corresponding to any historical image block as the first grade;
and when the total number of second pixel points and third pixel points in any historical image block is greater than the number of fourth pixel points in any historical image block, and the number of third pixel points in any historical image block is greater than the number of second pixel points in any historical image block, determining the grade corresponding to any historical image block as the second grade, otherwise, determining the grade of any historical image block as the third grade.
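The labeling rules above (ROI decision plus level assignment from the three pixel counts) can be written directly; the function and parameter names are illustrative, not from the text.

```python
def block_roi_and_level(n_key, n_nonkey, n_background):
    """Classify a historical image block from its pixel counts.
    n_key        -- second pixel points (key part of the target object)
    n_nonkey     -- third pixel points (non-key part of the target object)
    n_background -- fourth pixel points (non-ROI)
    Returns (is_roi, level) following the rules in the text."""
    object_pixels = n_key + n_nonkey
    if object_pixels <= n_background:
        return False, 1   # first level: background pixels dominate
    if n_nonkey > n_key:
        return True, 2    # second level: non-key part dominates
    return True, 3        # third level: key part dominates (or ties)
```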
In one possible implementation, the original image includes a plurality of image areas, each image area including at least one image block;
the determining of the region of interest, ROI, and the non-ROI in the original image comprises:
inputting the original image into a first model, and outputting a first label of each image block in the original image based on the original image by the first model; for any image area in a plurality of image areas of the original image, determining whether the any image area is an ROI according to a first label of an image block in the any image area;
wherein a first label of an image block is used to indicate whether the image block is an ROI image block.
In a possible implementation manner, the determining, according to the first label of the image block in any image region, whether any image region is an ROI includes:
when any image area comprises a plurality of image blocks, determining non-ROI image blocks and ROI image blocks in any image area according to the first labels of the image blocks in any image area;
and when the total size of the non-ROI image blocks in the image area is larger than that of the ROI image blocks, determining the image area as a non-ROI; otherwise, determining it as an ROI.
In a possible implementation manner, the determining, according to the first label of the image block in any image region, whether any image region is an ROI includes:
when the image area includes a single image block: if the first label of that image block indicates that it is a non-ROI image block, the image area is determined to be a non-ROI; otherwise, it is determined to be an ROI.
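The area-level ROI decision above can be sketched as comparing the total pixel areas of the two kinds of blocks; the `(is_roi_block, pixel_area)` pair representation is an assumption for the sketch.

```python
def region_is_roi(blocks):
    """blocks: iterable of (is_roi_block, pixel_area) pairs for one image area.
    The area is a non-ROI only when its non-ROI blocks cover more pixels than
    its ROI blocks; a single-block area takes its block's label directly."""
    roi_area = sum(area for is_roi, area in blocks if is_roi)
    non_roi_area = sum(area for is_roi, area in blocks if not is_roi)
    return non_roi_area <= roi_area
```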
In one possible implementation, the determining the ROI and the non-ROI in the original image includes:
determining an image area formed by a plurality of adjacent ROI image blocks as an ROI according to the first label of each image block in the original image; and determining an image area formed by a plurality of adjacent non-ROI image blocks as a non-ROI according to the first label of each image block in the original image.
In a second aspect, there is provided an image encoding method, the method comprising:
first, an original image is acquired; the level corresponding to each image area of the original image is determined; then the original image is encoded to obtain an encoded image in which the code rate of the ROI is higher than that of the non-ROI,
where, during encoding, the original image is encoded according to the level corresponding to each image area and the quantization step corresponding to that level; the original image is obtained by image acquisition and includes a plurality of image areas.
By encoding the original image according to the level of each image area and the quantization step corresponding to that level, the method lowers the compression ratio (the ratio of encoded size to original size), so the encoded image has a small data volume, the storage device can store more encoded images, and the cost of the storage device can be reduced. If HEIF encoding is used, the compression ratio achieved by HEIF encoding the original image is lower than that achieved by JPEG encoding it, so the encoded image obtained by HEIF encoding has a relatively small data volume; the storage device can therefore store a large number of encoded images in the HEIF format, reducing its cost. Moreover, because different levels correspond to different quantization coefficients, performing HEIF encoding based on the levels of the image regions further lowers the compression ratio, further reducing the data volume of the HEIF-format encoded image and the cost of the storage device. In addition, because the code rate of the ROI is higher than that of the non-ROI, the encoding quality of the ROI is guaranteed while the data volume of the encoded image of the original image is reduced.
In one possible implementation, image regions other than the image region of the target object correspond to a first level, image regions of non-key parts of the target object correspond to a second level, and image regions of key parts of the target object correspond to a third level;
the quantization step size corresponding to the first level is larger than the quantization step size corresponding to the second level, and the quantization step size corresponding to the second level is larger than the quantization step size corresponding to the third level.
In a possible implementation manner, the image area expanded based on the image area corresponding to the third level corresponds to a fourth level, and the quantization step size corresponding to the third level is larger than the quantization step size corresponding to the fourth level.
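The quantization-step ordering above (first level coarsest, fourth level finest) can be sketched with uniform quantization; the concrete step values here are hypothetical, chosen only to honour Q(level 1) > Q(level 2) > Q(level 3) > Q(level 4).

```python
import numpy as np

# Hypothetical quantization steps honouring the ordering required by the text.
QUANT_STEP = {1: 32, 2: 16, 3: 8, 4: 4}

def quantize_block(coeffs, level):
    """Uniform quantization of transform coefficients: a larger step discards
    more detail, so background (level 1) compresses hardest while the expanded
    key-part area (level 4) is preserved best."""
    step = QUANT_STEP[level]
    return (np.round(coeffs / step) * step).astype(coeffs.dtype)
```

A coarser step maps more small coefficients to zero, which is what shrinks the encoded size of low-level regions after entropy coding.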
In one possible implementation, the method further includes:
and when any image area in the plurality of image areas corresponds to the third grade, performing expansion operation on the image area to determine an expanded image area of the image area, wherein the expanded image area corresponds to the fourth grade.
In a possible implementation, the expanding any image region includes:
performing a dilation operation on the image area to obtain a first dilated image area; performing a dilation operation on the first dilated image area to obtain a second dilated image area; and determining the second dilated image area as the dilated image area of the image area.
In a possible implementation manner, the determining the level corresponding to each image region of the original image includes:
inputting the original image into a second model, and outputting a second label of each image block in the original image based on the original image by the second model; for any image area in the plurality of image areas, determining a grade corresponding to the image area according to a second label of the image block in the image area;
the second label of an image block is used for indicating the corresponding grade of the image block.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
when the image area includes a plurality of image blocks, determining the number of image blocks of each level in the image area, and determining the level to which the largest number of image blocks corresponds as the level of the image area.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
when the image area includes a single image block, determining the level indicated by the second label of that image block as the level corresponding to the image area.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
and according to the second label of each image block in the original image, forming a target image area by a plurality of adjacent image blocks corresponding to the same grade, wherein the grade corresponding to the target image area is the grade corresponding to the image block in the target image area.
In one possible implementation, the training process of the second model includes:
inputting a plurality of historical images into an image semantic segmentation model, and outputting a plurality of first pixel labels by the image semantic segmentation model based on the plurality of historical images; inputting the plurality of historical images into an image recognition model, and outputting a plurality of second pixel labels by the image recognition model based on the plurality of historical images; training an initial model based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image to obtain a second model;
the image area of the key part of the target object is composed of a plurality of second pixel points.
In a possible implementation manner, the training the initial model based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image includes:
generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image; inputting the training set into the initial model, and training the initial model based on the training set to obtain the second model;
wherein the training set comprises training labels for each historical image and for respective image blocks of each historical image.
In a possible implementation manner, the generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image includes:
for any historical image in the plurality of historical images, carrying out blocking processing on any historical image to obtain a plurality of historical image blocks; for any historical image block in the plurality of historical image blocks, determining whether the any historical image block is an ROI or not according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in the any historical image block; determining the grade corresponding to any historical image block according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in any historical image block; determining a training label of any historical image block according to whether the any historical image block is an ROI and the grade corresponding to the any historical image block; forming the training set by the plurality of historical images and the training labels of the image blocks of each historical image;
the third pixel point is a first pixel point except a second pixel point in the historical image, the third pixel point is used for forming a non-key part of the target object, and the fourth pixel point is used for forming a non-ROI.
In a possible implementation manner, the training label of each historical image block carries the level corresponding to that historical image block and also carries a target identifier, where the target identifier indicates whether the image block is an ROI.
In a possible implementation manner, the determining whether any historical image block is an ROI according to the number of second pixel points, the number of third pixel points, and the number of fourth pixel points in any historical image block includes:
when the total number of second pixel points and third pixel points in the historical image block is greater than the number of fourth pixel points in the historical image block, determining the historical image block to be an ROI; otherwise, determining it to be a non-ROI.
In a possible implementation manner, the determining, according to the number of the second pixel points, the number of the third pixel points, and the number of the fourth pixel points in any historical image block, a level corresponding to the any historical image block includes:
when the total number of second pixel points and third pixel points in the historical image block is less than or equal to the number of fourth pixel points in the block, determining the level corresponding to the block to be the first level; when that total is greater than the number of fourth pixel points in the block, and the number of third pixel points in the block is greater than the number of second pixel points, determining the level to be the second level; otherwise, determining the level to be the third level.
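The ROI test and the level rules above can be codified as a small helper. This is a minimal sketch: the counts `n2`, `n3`, `n4` (numbers of second, third, and fourth pixel points in one historical image block) and the function name are illustrative, not names from the patent.

```python
def label_block(n2: int, n3: int, n4: int) -> tuple[bool, int]:
    """Return (is_roi, level) for one image block, levels numbered 1..3."""
    n_target = n2 + n3          # pixels that belong to the target object
    is_roi = n_target > n4      # majority of target-object pixels -> ROI
    if not is_roi:
        level = 1               # first level: non-ROI block
    elif n3 > n2:
        level = 2               # second level: non-key-part pixels dominate
    else:
        level = 3               # third level: key-part pixels dominate
    return is_roi, level
```

For example, a block with 50 key-part, 100 non-key-part, and 20 background pixels is an ROI at the second level, while one with 100 key-part pixels against 50 non-key-part pixels lands at the third level.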
In one possible implementation, before encoding the original image, the method further includes:
for any image region in the original image, determining the quantization step corresponding to the level of the image region according to the basic quantization step of the original image, the distortion strength corresponding to at least one level, and the quality enhancement coefficient corresponding to at least one level.
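This excerpt does not give the exact formula combining the basic quantization step, the per-level distortion strength, and the per-level quality enhancement coefficient. One plausible combination, purely as an assumption, scales the base step by the distortion strength and divides by the enhancement coefficient, so a larger enhancement coefficient yields a smaller step and therefore higher quality:

```python
def level_qstep(base_qstep: float, distortion: float, enhancement: float) -> float:
    """Illustrative sketch only: the patent's actual combination of these
    three quantities is not specified in this excerpt."""
    return base_qstep * distortion / enhancement
```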
In a third aspect, an image transcoding method is provided, where the method includes:
first, decoding a coded image to obtain an original image; determining the level corresponding to each image region of the original image; then encoding the original image to obtain a coded image in which the ROI bit rate is higher than the non-ROI bit rate; in response to an image viewing request, decoding the coded image to obtain a decoded image; performing Joint Photographic Experts Group (JPEG) encoding on the decoded image to obtain a target coded image; and outputting the target coded image;
wherein, in the process of encoding the original image, the original image is encoded according to the level corresponding to each image region and the quantization step corresponding to that level, and the original image comprises a plurality of image regions.
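The transcoding flow above can be sketched as a plain pipeline. Every helper passed in below (`decode`, `classify_levels`, `roi_encode`, `jpeg_encode`) is a hypothetical stand-in for the corresponding component, not an API defined by the patent.

```python
def transcode(coded_image, decode, classify_levels, roi_encode, jpeg_encode):
    original = decode(coded_image)            # 1. decode to the original image
    levels = classify_levels(original)        # 2. level per image region
    roi_coded = roi_encode(original, levels)  # 3. ROI encoded at a higher bit rate
    # 4-5. on an image-viewing request: decode, then JPEG-encode for output
    decoded = decode(roi_coded)
    return jpeg_encode(decoded)
```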
In one possible implementation, image regions other than an image region of a target object correspond to a first level, image regions of non-critical portions of the target object correspond to a second level, and image regions of critical portions of the target object correspond to a third level;
the quantization step size corresponding to the first level is larger than the quantization step size corresponding to the second level, and the quantization step size corresponding to the second level is larger than the quantization step size corresponding to the third level.
In a possible implementation manner, an image region obtained by expanding the image region corresponding to the third level corresponds to a fourth level, and the quantization step size corresponding to the third level is larger than the quantization step size corresponding to the fourth level.
In one possible implementation, the method further includes:
when any image region in the plurality of image regions corresponds to the third level, performing a dilation operation on the image region to determine an expanded image region, where the expanded image region corresponds to the fourth level.
In a possible implementation, the expanding any image region includes:
performing a dilation operation on the image region to obtain a first dilated image region; performing a dilation operation on the first dilated image region to obtain a second dilated image region; and determining the second dilated image region as the expanded image region of the image region.
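The double dilation can be illustrated on a boolean mask of third-level blocks. The structuring element (here SciPy's default cross-shaped one) is an assumption; the excerpt does not fix one.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def expand_region(mask: np.ndarray) -> np.ndarray:
    """Dilate a third-level block mask twice; the ring of newly covered
    blocks is the fourth-level expanded image region."""
    first = binary_dilation(mask)    # first dilation pass
    second = binary_dilation(first)  # second dilation pass
    return second & ~mask            # expansion minus the original region
```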
In one possible implementation, each image area comprises at least one image block; the determining the corresponding grade of each image area of the original image comprises:
inputting the original image into a second model, which outputs a second label for each image block in the original image; and, for any image region in the plurality of image regions, determining the level corresponding to the image region according to the second labels of the image blocks in the image region;
wherein the second label of an image block indicates the level corresponding to the image block.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
when the image region comprises a plurality of image blocks, determining the number of image blocks of each level in the image region, and determining the level with the largest number of image blocks as the level corresponding to the image region.
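The majority vote over block levels is a one-liner. Tie-breaking is not specified in this excerpt, so the behavior of `Counter.most_common` (the first-encountered level wins a tie) is just one possible choice.

```python
from collections import Counter

def region_level(block_levels: list[int]) -> int:
    """Level of an image region = most common level among its blocks."""
    return Counter(block_levels).most_common(1)[0][0]
```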
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
when the image region comprises a single image block, determining the level indicated by the second label of that image block as the level corresponding to the image region.
In a possible implementation manner, the determining, according to the second label of the image block in any image area, a level corresponding to any image area includes:
according to the second label of each image block in the original image, forming a target image region from a plurality of adjacent image blocks corresponding to the same level, where the level corresponding to the target image region is the level corresponding to the image blocks within it.
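Grouping adjacent same-level blocks into target image regions amounts to per-level connected-component labeling. The 4-connectivity used below is an assumption; the excerpt does not specify adjacency.

```python
import numpy as np
from scipy.ndimage import label

def group_blocks(level_grid: np.ndarray):
    """Given a grid of per-block levels, return (level, block_coords)
    pairs, one per connected region of same-level blocks."""
    regions = []
    for lvl in np.unique(level_grid):
        labeled, n = label(level_grid == lvl)  # 4-connected components
        for i in range(1, n + 1):
            blocks = list(zip(*np.nonzero(labeled == i)))
            regions.append((int(lvl), blocks))
    return regions
```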
In one possible implementation, the training process of the second model includes:
inputting a plurality of historical images into an image semantic segmentation model, and outputting a plurality of first pixel labels by the image semantic segmentation model based on the plurality of historical images; inputting the plurality of historical images into an image recognition model, and outputting a plurality of second pixel labels by the image recognition model based on the plurality of historical images; training an initial model based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image to obtain a second model;
the image area of the key part of the target object is composed of a plurality of second pixel points.
In a possible implementation manner, training the initial model based on the first pixel points indicated by the first pixel labels of each historical image and the second pixel points indicated by the second pixel labels of each historical image includes:
generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image; inputting the training set into the initial model, and training the initial model based on the training set to obtain the second model;
wherein the training set comprises training labels for each historical image and for respective image blocks of each historical image.
In a possible implementation manner, the generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image includes:
for any historical image in the plurality of historical images, partitioning the historical image to obtain a plurality of historical image blocks; for any historical image block in the plurality of historical image blocks, determining whether the historical image block is an ROI according to the number of second pixel points, the number of third pixel points, and the number of fourth pixel points in the historical image block; determining the level corresponding to the historical image block according to those same three numbers; determining a training label of the historical image block according to whether the historical image block is an ROI and the level corresponding to the historical image block; and forming the training set from the plurality of historical images and the training labels of the image blocks of each historical image;
wherein a third pixel point is a first pixel point in the historical image that is not a second pixel point; third pixel points form the non-key part of the target object, and fourth pixel points form the non-ROI.
In a possible implementation manner, the training label of each historical image block carries the level corresponding to that historical image block and also carries a target identifier, where the target identifier indicates whether the image block is an ROI.
In a possible implementation manner, the determining whether any historical image block is an ROI according to the number of second pixel points, the number of third pixel points, and the number of fourth pixel points in any historical image block includes:
when the total number of second pixel points and third pixel points in the historical image block is greater than the number of fourth pixel points in the historical image block, determining the historical image block to be an ROI; otherwise, determining it to be a non-ROI.
In a possible implementation manner, the determining, according to the number of the second pixel points, the number of the third pixel points, and the number of the fourth pixel points in any historical image block, a level corresponding to the any historical image block includes:
when the total number of second pixel points and third pixel points in the historical image block is less than or equal to the number of fourth pixel points in the block, determining the level corresponding to the block to be the first level;
when that total is greater than the number of fourth pixel points in the block, and the number of third pixel points in the block is greater than the number of second pixel points, determining the level to be the second level; otherwise, determining the level to be the third level.
In one possible implementation, before encoding the original image, the method further includes:
for any image region in the original image, determining the quantization step corresponding to the level of the image region according to the basic quantization step of the original image, the distortion strength corresponding to at least one level, and the quality enhancement coefficient corresponding to at least one level.
In a fourth aspect, there is provided an image encoding apparatus for performing the above-described image encoding method. Specifically, the image encoding apparatus includes a functional module configured to execute the image encoding method provided in the first aspect or any one of the optional aspects of the first aspect.
In a fifth aspect, an image encoding apparatus is provided for performing the above-described image encoding method. Specifically, the image encoding apparatus includes a functional module for executing the image encoding method provided by the second aspect or any one of the alternatives of the second aspect.
In a sixth aspect, an image transcoding device is provided, which is configured to execute the above-described image transcoding method. Specifically, the image transcoding device includes a functional module for executing the image transcoding method provided in the third aspect or any optional manner of the third aspect.
In a seventh aspect, a capture device is provided, which includes a processor and a memory, where the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the operations performed by the image coding method.
In an eighth aspect, a storage device is provided, which includes a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the operations performed by the image transcoding method as described above.
In a ninth aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, the instruction being loaded and executed by a processor to implement the operations performed by the methods as described above.
In a tenth aspect, a chip is provided, which includes a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the operations performed by the methods as described above.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an original image provided in an embodiment of the present application;
FIG. 2 is a flow chart of model training provided by an embodiment of the present application;
FIG. 3 is a flow chart of another model training provided by embodiments of the present application;
fig. 4 is a schematic diagram of an image transcoding system provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an acquisition device provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of the present application;
fig. 7 is a flowchart of an image encoding method according to an embodiment of the present application;
fig. 8 is a flowchart of an image encoding method according to an embodiment of the present application;
FIG. 9 is a flow chart of JPEG encoding provided by an embodiment of the present application;
FIG. 10 is a flowchart of another image encoding method provided in the embodiments of the present application;
fig. 11 is a flowchart of a transcoding method provided in an embodiment of the present application;
fig. 12 is a flowchart of another image transcoding method provided in an embodiment of the present application;
fig. 13 is a schematic structural diagram of a chip provided in an embodiment of the present application;
FIG. 14 is a schematic diagram of a computing power expansion card provided by an embodiment of the present application;
fig. 15 is a schematic structural diagram of an image encoding apparatus according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of an image encoding apparatus according to an embodiment of the present application;
fig. 17 is a schematic structural diagram of an image transcoding device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following description is made for the original image:
the original image may include a region of interest (ROI) and a non-ROI. The ROI is the image region of a target object in the original image, where the target object is an object of interest to the user, such as a person or a vehicle. The non-ROI is the image region of the original image other than the target object, i.e. the background region, to which the user pays little attention. For example, fig. 1 is a schematic diagram of an original image that includes a person, a vehicle, and the surrounding environment. In a security system, a user generally pays more attention to the features of the person and the vehicle than to the surrounding environment, so the encoding device may take the person and the vehicle in the original image as target objects: the image regions of the person and the vehicle are ROIs, and the other image regions are non-ROIs. The original image in the embodiment of the present application refers to an image that has not yet been encoded. It may be an image directly generated by a sensor, or an image obtained by processing such an image with an image signal processor (ISP). The format of the original image is, for example, RAW format, luminance-chrominance (YUV) format, or red-green-blue (RGB) format.
Specifically, the image directly generated by the sensor is in RAW format, and the layout of the RAW image differs with the design of the sensor, for example Bayer RGGB, RYYB, RCCC, RCCB, RGBW, or CMYW.
The original image has not undergone image encoding. In the embodiment of the present application, the image format generated after encoding is, for example, bmp, jpg, png, tif, gif, pcx, tga, exif, fpx, svg, psd, cdr, pcd, dxf, ufo, eps, ai, raw, WMF, or webp. Encoding is also sometimes referred to as compression, since it reduces the amount of data in an image.
The original image may include a plurality of image regions. Each image region may be an image block of a certain size or a region composed of a group of image blocks; a region may be irregular in shape, and different regions may be of the same or different sizes.
In some application scenarios, the non-key parts of a target object may be of greater interest to the user than the background, while the key parts of the target object are of the greatest interest. Still taking fig. 1 as an example, in a security system, if the target object is a person, the person's face is the key part and the person's body is the non-key part; if the target object is a vehicle, the license plate is the key part and the vehicle body is the non-key part. Because the user's attention differs across the parts of the target object, the encoding device may assign a level to each image region in the original image, so that the regions can be encoded at different quality levels according to their levels. In a possible implementation, the image regions belonging to the non-ROI all correspond to a first level, while the image regions belonging to the ROI are divided among a plurality of levels, each ROI region corresponding to one of them. In one possible implementation, the image regions of non-key parts of the target object correspond to the second level and the image regions of key parts correspond to the third level. Because encoding an image region can affect the regions around it, the encoding device may also assign a higher level to the area surrounding a key part of the target object, so that encoding that surrounding area according to its level preserves the encoding quality of the key part.
In one possible implementation, the image region expanded based on the image region corresponding to the third level corresponds to a fourth level. Still taking fig. 1 as an example, the background region in fig. 1 corresponds to a first level, the image region of the human body and the image region of the vehicle body both correspond to a second level, and the image region of the human face and the image region of the license plate both correspond to a third level.
The identification models that may be used in the present application are introduced as follows:
this application identifies the ROI in an original image using an artificial intelligence (AI) model, which may be a deep neural network model. The application provides two AI models capable of identifying the ROI in an original image, namely a first model and a second model. The first model can identify a single-level ROI in an image, i.e. it distinguishes ROI from non-ROI. The second model can identify multiple levels of ROI, i.e. it distinguishes image regions of different levels.
The training process of the first model can be implemented by the following steps 101-103.
101. The training equipment carries out blocking processing on the plurality of historical images to obtain a plurality of historical image blocks of each historical image.
The plurality of historical images may be images captured in a particular application scenario, for example, images captured in a security system. The training device may slice each historical image into a plurality of image patches of the same size, e.g., the training device slices each historical image into a plurality of image patches of 16 × 16 in size.
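Splitting an image into 16 × 16 blocks is essentially a reshape. This sketch assumes the image dimensions are multiples of the block size; real images may need padding or cropping first.

```python
import numpy as np

def to_blocks(image: np.ndarray, size: int = 16) -> np.ndarray:
    """Split an H x W image into a (H/size, W/size, size, size) block grid."""
    h, w = image.shape
    assert h % size == 0 and w % size == 0, "pad the image first"
    return image.reshape(h // size, size, w // size, size).swapaxes(1, 2)
```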
102. The training equipment labels each image block to obtain a target training label of each image block, wherein the target training label of one image block is used for indicating whether the image block is an ROI or not.
The training device may label each image block based on the user requirement, so that each image block corresponds to one target training label. The target training label of an image block may carry a target identifier that indicates whether the image block is an ROI: when the target identifier is the ROI identifier, the image block is an ROI, and when it is the non-ROI identifier, the image block is a non-ROI.
The user requirement may include an ROI in the historical image, such that the training device may label image blocks in the historical image that are used to compose the ROI of the user requirement as the ROI, and image blocks in the historical image that are used to compose regions other than the ROI of the user requirement as the non-ROI.
Of course, instead of labeling according to the user requirement, the target training labels of the historical images may also be labeled manually; the embodiment of the present application does not specifically limit the process by which the training device obtains the target training labels.
103. The training equipment inputs a plurality of historical images and target training labels of all image blocks of each historical image into a first initial model, and the first initial model is trained on the basis of the input plurality of historical images and the target training labels of all image blocks of each historical image to obtain the first model.
In each training round, the first initial model performs image recognition on each image block of each historical image based on the current model parameters and outputs a label for each image block. The output labels are compared with the target training labels to determine the accuracy of the first initial model under the current model parameters. If the accuracy reaches the target accuracy, the training device takes the first initial model with the current model parameters as the first model; otherwise, the training device modifies the current model parameters and performs the next round of training.
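The train-until-target-accuracy loop of step 103 has this general shape. Here `evaluate` and `update` are stand-ins for the model's recognition pass and parameter modification, not the patent's actual components, and the round cap is an added safeguard.

```python
def train(model_params, evaluate, update, target_accuracy, max_rounds=100):
    """Iterate until accuracy reaches the target, then freeze the parameters."""
    for _ in range(max_rounds):
        accuracy = evaluate(model_params)    # recognize, compare with labels
        if accuracy >= target_accuracy:
            return model_params              # current parameters become the model
        model_params = update(model_params)  # modify parameters, train again
    return model_params                      # stop after max_rounds regardless
```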
The training process of the second model may be a flowchart of model training provided in the embodiment of the present application shown in fig. 2, and the model training process may include the following steps 201 and 203.
201. The training equipment inputs a plurality of historical images into an image semantic segmentation model, the image semantic segmentation model outputs a plurality of first pixel labels based on the plurality of historical images, one first pixel label is used for indicating one first pixel point, and an image area of a target object is composed of a plurality of first pixel points.
Through the first pixel label output by the image semantic segmentation model, the training device can determine which pixel points in each historical image are the first pixel points. Before the training device inputs the plurality of historical images into the image semantic segmentation model, the training device may train the initial image semantic segmentation model based on the images in the public image training set to obtain the image semantic segmentation model. The training process of the image semantic segmentation model can be implemented by the following process shown in step 2011-2012.
Step 2011: the training device generates a plurality of first training pixel labels for each of a plurality of images, where a first pixel point indicated by a first training pixel label is a pixel point in the image that the user pays attention to.
Wherein the plurality of images may be from a public training set of images, each image may include a plurality of pixel points. The training equipment can label the first pixel points in each image based on user requirements, so that each first pixel point corresponds to one first training pixel label. The user requirement may be a target object, so that the training device may label pixel points constituting the target object in the image as first pixel points, thereby enabling each first pixel point to correspond to one first training pixel label. Of course, the training device may also label the first training pixel label of each image manually instead of labeling the first training pixel label according to the user requirement, and the process of labeling the first training pixel label by the training device is not specifically limited in the embodiment of the present application.
Step 2012, the training device inputs the multiple images and each first training pixel label of each image into an initial image semantic segmentation model, and the initial image semantic segmentation model performs training based on the input multiple images and each first training pixel label of each image to obtain the image semantic segmentation model.
In each training round, the initial image semantic segmentation model performs image recognition on the pixel points of each image based on the current model parameters and outputs labels for a plurality of pixel points in each image. The output labels are compared with the first training pixel labels of each image to determine the rate at which the model identifies first pixel points under the current model parameters. If the recognition rate reaches the target recognition rate, the training device takes the initial image semantic segmentation model with the current model parameters as the image semantic segmentation model; otherwise, the training device modifies the current model parameters and performs the next round of training.
202. The training equipment inputs the plurality of historical images into an image recognition model, the image recognition model outputs a plurality of second pixel labels based on the plurality of historical images, one second pixel label is used for indicating one second pixel point, and an image area of a key part of the target object is composed of the plurality of second pixel points.
The image recognition model is used to identify second pixel points that constitute a key portion of the target object. For example, the image recognition model may be a face recognition model, and is used for recognizing a second pixel point constituting a face in the image. Through the second pixel label of each historical image output by the image recognition model, the training device can determine which pixel points in each historical image are the second pixel points.
It should be noted that, when the second pixel points in the plurality of historical images are used to form different types of target objects, the training device may input the plurality of historical images into a plurality of image recognition models with different recognition functions, respectively, to recognize the pixel points forming the different types of target objects. For example, the training device may input the plurality of historical images into the face recognition model and the license plate recognition model, respectively, to recognize the second pixel points constituting the face and the second pixel points constituting the license plate.
203. The training equipment trains the initial model based on the first pixel points indicated by the first pixel labels of the historical images and the second pixel points indicated by the second pixel labels of the historical images to obtain the second model.
In order to train the second model capable of identifying the level corresponding to the image block, the training device may first generate a training label at the image block level of each historical image based on the classification of the pixel points in each historical image, and then train the second model based on the training label at the image block level of each historical image and a plurality of historical images.
Step 2031, the training device generates a training set based on the first pixel point indicated by the first pixel label of each historical image and the second pixel point indicated by the second pixel label of each historical image, where the training set includes each historical image and the training labels of the image blocks of each historical image.
The training label of each historical image block carries a grade corresponding to the historical image block, and the training label of each historical image block also carries a target identifier. The training device can divide each historical image into a plurality of image blocks, determine whether each image block is an ROI and the grade of each image block based on pixel points in each image block, and generate the training set based on the grade of each image block. In one possible implementation, the process shown in step 2031 can be implemented by steps 1-5 described below.
Step 1, for any historical image in a plurality of historical images, the training equipment carries out blocking processing on any historical image to obtain a plurality of historical image blocks.
The process shown in step 1 is the same as the process shown in step 101, and here, the embodiment of the present application does not repeat this step 1.
And 2, for any historical image block in the plurality of historical image blocks, the training equipment determines whether the any historical image block is an ROI or not according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in the any historical image block, wherein the third pixel points are first pixel points except the second pixel points, and the fourth pixel points are pixel points for forming a non-ROI.
The third pixel points form the non-key parts of the target object. After the training device obtains the first pixel labels and the second pixel labels of each historical image, if a pixel point indicated by a first pixel label is the same as a pixel point indicated by a second pixel label, the first pixel point indicated by that first pixel label is in fact a second pixel point, and the training device can replace the first pixel label with the second pixel label. The training device thereby obtains a pixel label set, which includes the second pixel labels of each historical image and the remaining first pixel labels of each historical image, where each second pixel label in the pixel label set corresponds to one second pixel point and each first pixel label in the pixel label set corresponds to one third pixel point. The fourth pixel points are the pixel points without any pixel label, and they form the non-ROI.
The total number of the second pixel points and the third pixel points in any historical image block is the number of the first pixel points in that historical image block, and the number of the fourth pixel points in that historical image block is the number of the pixel points forming the non-ROI in it. Therefore, when the total number of the second pixel points and the third pixel points in any historical image block is larger than the number of the fourth pixel points in that block, the training device can determine that historical image block as an ROI; otherwise, the training device can determine it as a non-ROI.
And 3, determining the corresponding grade of any historical image block by the training equipment according to the number of the second pixel points, the number of the third pixel points and the number of the fourth pixel points in any historical image block.
When the total number of the second pixel points and the third pixel points in any historical image block is less than or equal to the number of the fourth pixel points in any historical image block, it is indicated that any historical image block is a non-ROI, and the training device may determine the level corresponding to any historical image block as a first level.
When the total number of the second pixel points and the third pixel points in any historical image block is greater than the number of the fourth pixel points in that block, and the number of the third pixel points is greater than the number of the second pixel points, the historical image block is an ROI in which the non-key part of the target object is larger than the key part, so the training device can determine the level corresponding to that historical image block as the second level. When the total number of the second pixel points and the third pixel points in any historical image block is greater than the number of the fourth pixel points in that block, and the number of the third pixel points is smaller than or equal to the number of the second pixel points, the historical image block is an ROI in which the key part of the target object is larger than the non-key part, so the training device can determine the level corresponding to that historical image block as the third level. The third level is higher than the second level, and the second level is higher than the first level.
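The counting rules of steps 2 and 3 can be condensed into a minimal Python sketch; the function name and the count-based interface are illustrative assumptions, not part of the embodiment:

```python
def classify_block_level(n_key, n_nonkey, n_background):
    """Classify a historical image block into a level.

    n_key:        number of "second" pixel points (key parts, e.g. faces, plates)
    n_nonkey:     number of "third" pixel points (non-key parts of target objects)
    n_background: number of "fourth" pixel points (non-ROI background)

    Returns (is_roi, level) following steps 2-3:
      - first level (non-ROI) when key + non-key pixels <= background pixels
      - second level when the block is an ROI and non-key pixels dominate
      - third level when the block is an ROI and key pixels dominate or tie
    """
    if n_key + n_nonkey <= n_background:
        return False, 1   # first level: non-ROI
    if n_nonkey > n_key:
        return True, 2    # second level: ROI, mostly non-key parts
    return True, 3        # third level: ROI, mostly key parts
```

The returned pair corresponds to the target identifier (ROI / non-ROI) and the level carried by the training label of the block.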
And 4, the training equipment determines a training label of any historical image block according to whether any historical image block is the ROI and the corresponding grade of any historical image block.
The training label of any historical image block can carry the corresponding grade of any historical image block, and the training label of any historical image block can also carry a target identifier. When any historical image block is an ROI, the training label of any historical image block can comprise an ROI identifier and a grade corresponding to any historical image block; and when any historical image block is a non-ROI, the training label of any historical image block comprises a non-ROI identification and a grade corresponding to any historical image block.
And 5, the training equipment combines the plurality of historical images and the training labels of the image blocks of each historical image into the training set.
The training device may perform the processes of steps 1 to 5 on each historical image, so that each historical image block of each historical image corresponds to one training label, and the training device may combine each historical image and the training labels of each historical image into the training set.
Step 2032, the training device inputs the training set into the initial model, and the initial model is trained based on the training set to obtain the second model.
In each round of training, the initial model performs image recognition on each image block of each historical image based on the current model parameters and outputs a label for each historical image block. The training device compares the output label of each historical image block with its training label; for any historical image block, if the content of the output label is consistent with the content of the training label, the recognition of that historical image block is considered correct. The training device can thus determine the accuracy of the image recognition performed by the initial model under the current model parameters. If the accuracy reaches a first target accuracy, the training device uses the initial model with the current model parameters as the second model; otherwise, the training device modifies the current model parameters of the initial model and performs the next round of training.
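A minimal sketch of this accuracy-gated training loop, where `model`, `train_fn` and `predict_fn` are hypothetical placeholders for the initial model, one parameter-update step, and per-block inference:

```python
def train_until_accurate(model, train_fn, predict_fn, images, block_labels,
                         target_accuracy, max_rounds=100):
    """Repeat training until per-block label accuracy reaches the target.

    `block_labels[i]` is the list of training labels for the blocks of
    `images[i]`; `predict_fn(model, image)` returns one label per block.
    """
    for _ in range(max_rounds):
        correct = total = 0
        for image, labels in zip(images, block_labels):
            predicted = predict_fn(model, image)
            correct += sum(p == t for p, t in zip(predicted, labels))
            total += len(labels)
        if total and correct / total >= target_accuracy:
            return model  # first target accuracy reached: this is the second model
        model = train_fn(model, images, block_labels)  # adjust model parameters
    return model
```

The `max_rounds` cutoff is an added safeguard; the embodiment itself only specifies the accuracy condition.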
For further explanation of the process of training the second model by the training device, refer to another model training flowchart provided in the embodiment of the present application and shown in fig. 3. The training device may attach a first training pixel label to the pixel points of the images in a public image semantic segmentation data set to obtain a two-class data set including a plurality of images and the first training pixel label of each image (see step 2011). The training device designs the first deep neural network as an initial image semantic segmentation model, and trains the initial image semantic segmentation model based on the two-class data set to obtain an image semantic segmentation model (see step 2012). The training device respectively inputs a plurality of historical images into the image semantic segmentation model, the face recognition model and the license plate recognition model: the image semantic segmentation model outputs the first pixel labels of the first pixel points of each historical image, the face recognition model outputs the second pixel labels of the second pixel points of each historical image (that is, the second pixel labels of the pixel points forming the face), and the license plate recognition model outputs the second pixel labels of the second pixel points of each historical image (that is, the second pixel labels of the pixel points forming the license plate), so as to obtain a pixel label set of the plurality of historical images, where the face recognition model and the license plate recognition model are both image recognition models. The training device normalizes the plurality of historical images into a data set with a granularity of 16 × 16 based on the distribution of the second, third and fourth pixel points in the plurality of historical images (see step 2031), where the data set with the granularity of 16 × 16 includes a plurality of image blocks with the size of 16 × 16 and a training label of each image block, that is, a training set. The training device designs the second deep neural network as an initial model, and performs model training on the initial model based on the 16 × 16-granularity data set to obtain the second model (see step 2032). After the training device obtains the second model, the second model needs to be tested based on hardware, so as to ensure that the second model can achieve the expected recognition effect. The process of testing the second model based on hardware may be as follows. The training device performs a performance test on the second model. If the performance of the second model reaches the preset performance, the training device can verify the precision of the second model based on an open-source artificial neural network library (Keras), that is, Keras model precision verification. If the second model passes the verification, the training device can convert the second model from its training framework (such as TensorFlow or PyTorch) into a Convolutional Architecture for Fast Feature Embedding (Caffe) model, that is, Caffe model conversion, perform precision verification on the converted Caffe model, that is, Caffe model precision verification, and evaluate the loss of the model conversion. When the Caffe model passes the verification, the training device converts the Caffe model into a Neural Network Inference Engine (NNIE) model, performs precision verification on the NNIE model, that is, NNIE model precision verification, and evaluates the loss of the model conversion. If the NNIE model passes the verification, the training device tests the NNIE model on a board and performs precision verification on the on-board model, that is, on-board model precision verification. If the error between the output result of the on-board model and the output result of the second model implemented in software is within a certain range, the training device outputs the on-board model and takes the on-board model as the finally used second model, that is, final model solidification.
Fig. 4 is a schematic diagram of an image transcoding system provided in an embodiment of the present application, and referring to fig. 4, the image transcoding system 400 may include an acquisition device 401, a storage device 402, and a processing device 403, where the acquisition device is a front-end device in the image transcoding system 400, and the storage device 402 and the processing device 403 are both back-end devices in the image transcoding system 400.
The acquisition device 401 is configured to acquire an image, encode the acquired original image, and send the encoded image obtained by encoding to the storage device. The acquisition device 401 may determine whether each image region in the acquired original image is an ROI based on the first model, and may suppress the high-frequency alternating-current components in the non-ROIs of the original image when the original image is encoded, so as to reduce the data amount of the non-ROI portion of the encoded image and thereby the data amount of the encoded image of the original image. The acquisition device 401 may also determine the level corresponding to each image region in the acquired original image based on the second model. In the process of encoding the original image, the acquisition device 401 may suppress, in each image region, the high-frequency alternating-current components within the amplitude suppression range corresponding to the level of that image region. The acquisition device 401 may further perform high efficiency image file format (HEIF) encoding at different encoding levels on each image region according to the level corresponding to each image region; since the compression rate of HEIF encoding is higher than that of JPEG encoding, the data amount of an image encoded by HEIF is relatively small. Since the acquisition device 401 may encode the acquired original image, the acquisition device 401 may also be regarded as an encoding device. Since the acquisition device 401 uses the first model or the second model, the acquisition device 401 may train the first model and the second model by itself, in which case the acquisition device 401 is also a training device; of course, if the acquisition device 401 does not train the first model and the second model, it may acquire the first model and the second model from the training device.
The capture device 401 may be a camera or other devices capable of capturing images, such as a Software Defined Camera (SDC) and an Internet Protocol Camera (IPC).
The storage device 402 is configured to receive the encoded image sent by the acquisition device 401 and store the encoded image. In a possible implementation manner, when the format of the encoded image sent by the acquisition device 401 is the HEIF format and the image format required by the processing device 403 is the Joint Photographic Experts Group (JPEG) format, the storage device 402 is further configured to decode the encoded image in the HEIF format to obtain the original image, perform JPEG encoding on the original image, and send the JPEG-encoded image to the processing device 403. When the format of the encoded image sent by the acquisition device 401 is the HEIF format and the image format required by the processing device 403 is also the HEIF format, the storage device 402 may directly send the encoded image sent by the acquisition device 401 to the processing device 403. In a possible implementation manner, when the encoded image stored in the storage device 402 does not reach the target compression rate, the storage device 402 is further configured to decode the encoded image to obtain the original image, determine whether each image region in the original image is an ROI based on the first model, and re-encode the original image; during the encoding of the original image, the high-frequency alternating-current components in the non-ROIs of the original image may be suppressed. The storage device 402 is also configured to determine the level of each image region in the original image based on the second model. In encoding the original image, the storage device 402 may suppress, in each image region, the high-frequency alternating-current components within the amplitude suppression range corresponding to the level of that image region. The storage device 402 may also perform HEIF encoding at different encoding levels on each image region according to the corresponding level of each image region.
Since the storage device 402 can encode the original image, the storage device 402 can also be considered as an encoding device. The storage device 402 may train the first model and the second model by itself, and the storage device 402 is also a training device at this time, and of course, if the storage device 402 does not train the first model and the second model, the first model and the second model may be obtained from the training device. The storage device 402 may be any storage device capable of storing encoded images, such as a Video Cloud Node (VCN).
The processing device 403 is configured to receive the encoded image sent by the storage device 402, decode the encoded image to obtain a decoded image, and display the decoded image. The processing device 403 is also configured to send the decoded image to a user device, which displays the decoded image. The processing device 403 may be any decoding device capable of decoding encoded images, for example, a device in which a video content management system (VCM) is installed.
Fig. 5 is a schematic structural diagram of an acquisition device provided in this embodiment of the present application, where the acquisition device 500 may have a relatively large difference due to different configurations or performances, and may include one or more processors 501 and one or more memories 502, where the memory 502 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 501 to implement the steps performed by the acquisition device in the methods provided in the following method embodiments. The acquisition device 500 may further comprise one or more acquisition units 503, the acquisition units 503 being adapted to acquire images. Of course, the acquisition device 500 may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output, and the acquisition device 500 may further include other components for implementing device functions, which are not described herein again.
Fig. 6 is a schematic structural diagram of a storage device according to an embodiment of the present application, where the storage device 600 may have a relatively large difference due to different configurations or performances, and may include one or more processors 601 and one or more memories 602, where the memory 602 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 601 to implement the steps performed by the storage device in the methods according to the following method embodiments. Certainly, the storage device 600 may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output, and the storage device 600 may further include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, including instructions executable by a processor in a terminal to perform the methods provided by the embodiments described below, is also provided. For example, the computer-readable storage medium may be a read-only memory (ROM), a Random Access Memory (RAM), a compact disc-read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
For further explanation of the process, refer to a flowchart of an image encoding method provided in an embodiment of the present application and shown in fig. 7.
701. The acquisition device acquires an original image.
The original image may be any image acquired by the acquisition device. When the data format of the original image is the YUV format, the acquisition device may also preprocess the original image to obtain the target image data of the original image. The preprocessing may include color mode conversion and/or down-sampling.
In one possible implementation, the acquisition device may convert raw image data of a raw image in YUV format to target image data in RGB format. In another possible implementation manner, after the original image data of the original image in the YUV format is converted into the image data in the RGB format, the acquisition device may further perform down-sampling on the image data in the RGB format to obtain the target image data. If the data format of the original image data is RGB format, the acquisition device may not perform color mode conversion on the original image data, but directly perform down-sampling on the original image data.
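A possible sketch of this preprocessing in Python with NumPy; the BT.601 full-range conversion coefficients and the stride-based down-sampling are illustrative choices, since the embodiment only requires color mode conversion and/or down-sampling:

```python
import numpy as np

def preprocess(yuv, downsample=2):
    """Convert an H x W x 3 YUV image (uint8, BT.601 full-range assumed)
    to RGB, then down-sample by keeping every `downsample`-th pixel."""
    y = yuv[..., 0].astype(np.float32)
    u = yuv[..., 1].astype(np.float32) - 128.0
    v = yuv[..., 2].astype(np.float32) - 128.0
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    rgb = np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)
    return rgb[::downsample, ::downsample]  # simple stride-based down-sampling
```

If the original image data is already in RGB format, only the down-sampling step (the final slice) would apply.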
702. The acquisition equipment carries out blocking processing on the original image to obtain a plurality of image areas.
The acquisition device can equally divide the original image based on the target image data of the original image to obtain a plurality of image areas with the same size. For example, if the target image data includes 64 × 64 pixels, the acquisition device may divide the 64 × 64 pixels into 4 groups of 32 × 32 pixels, and each group of 32 × 32 pixels forms one image area.
Of course, the acquisition device may also divide the original image unequally to obtain a plurality of image areas with different sizes. It should be noted that, in the embodiment of the present application, a manner of acquiring the plurality of image areas by the acquisition device is not specifically limited. The process shown in step 702 is also the process of blocking the original image by the acquisition device.
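The equal division above can be sketched as follows, assuming for illustration that the image dimensions are multiples of the region size (as in the 64 × 64 into 32 × 32 example):

```python
import numpy as np

def split_into_regions(image, region_h, region_w):
    """Equally divide an H x W image into non-overlapping regions of
    region_h x region_w pixels, row by row."""
    h, w = image.shape[:2]
    return [image[i:i + region_h, j:j + region_w]
            for i in range(0, h, region_h)
            for j in range(0, w, region_w)]
```

The same routine can serve for the blocking in step 702 and for the per-block processing in the training steps, with the region size adjusted accordingly.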
703. The acquisition device inputs the original image into a first model, and outputs first labels of image blocks in the original image based on the original image by the first model, wherein the first label of one image block is used for indicating whether the image block is an ROI image block.
The first label of an image block may carry a target identifier. The acquisition device may input the target image data of the original image into the first model, and the first model outputs the first labels of the respective image blocks in the original image based on the input target image data. The first model is described above, and the description of the first model is not repeated in this embodiment.
704. For any image area in the original image, the acquisition device determines whether the any image area is an ROI according to the first label of the image block in the any image area.
In a possible implementation, when the any image region includes an image block, if the first label of the image block included in the any image region indicates that the any image block is a non-ROI image block, the acquisition device determines the any image region as a non-ROI, otherwise, determines the any image region as a ROI.
When the arbitrary image region includes a plurality of image blocks, the acquisition device may first determine whether the arbitrary image region is the ROI based on the non-ROI block and the ROI image block in the arbitrary image region. In one possible implementation, this step 704 may be implemented by the process shown in steps 7041-7042 described below.
Step 7041, the acquiring device determines a non-ROI image block and a ROI image block in the any image region according to the first label of each image block in the any image region.
For any image block, if the target identifier carried by the first tag of the image block is a non-ROI identifier, the image block is a non-ROI image block, and if the target identifier carried by the first tag of the image block is a ROI identifier, the image block is an ROI image block.
The acquisition device can also perform expansion operation on the ROI image block to obtain a first image area, and the first image area is used as a new ROI image block, wherein the first image area comprises the ROI image block and an image area expanded by the ROI image block. For example, if one ROI image block cannot cover the whole person, the acquiring device may perform an expansion operation on the ROI image block to obtain a first image region capable of covering the whole person, so as to prevent the ROI image block from not covering the whole target object completely, thereby improving the coverage of the target object.
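If an ROI image block is represented as a rectangle in pixel coordinates, the expansion operation can be sketched as a fixed-margin enlargement clamped to the image bounds; the margin value is an illustrative assumption, since the embodiment does not prescribe how far to expand:

```python
def expand_block(block, margin, image_h, image_w):
    """Expand an ROI image block (x0, y0, x1, y1) outward by `margin`
    pixels on every side, clamped to the image bounds; the result is the
    first image region used as the new ROI image block."""
    x0, y0, x1, y1 = block
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(image_w, x1 + margin), min(image_h, y1 + margin))
```

The same enlargement applies to the second-level and third-level expansions described later for the second model's output.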
Step 7042, when the total size of the non-ROI image blocks in the any image region is larger than the total size of the ROI image blocks in the any image region, the acquiring device determines the any image region as a non-ROI, otherwise, determines the any image region as a ROI.
When the total size of the non-ROI image blocks in any image region is larger than the total size of the ROI image blocks in any image region, it is indicated that the area of the background region in any image region is larger, the acquisition device can determine any image region as the non-ROI, otherwise, it is indicated that the area of the image region of the target object in any image region is larger, and the acquisition device can determine any image region as the ROI.
The total size of the non-ROI image blocks in any image region is a product of the number of the non-ROI image blocks in any image region and the size of a single image block, and if the sizes of the respective image blocks are the same, the acquisition device may also determine any image region as a non-ROI when the number of the non-ROI image blocks in any image region is greater than the number of the ROI image blocks in any image region, otherwise, determine any image region as a ROI.
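When all image blocks have the same size, comparing total sizes reduces to comparing block counts, which can be sketched as:

```python
def region_is_roi(block_is_roi_flags):
    """Decide whether an image region is an ROI from its blocks' first
    labels. `block_is_roi_flags` is a list of booleans (True = ROI image
    block); equal-sized blocks are assumed, so comparing counts is
    equivalent to comparing total sizes. A tie counts as an ROI, per the
    "otherwise" branch of step 7042."""
    n_roi = sum(block_is_roi_flags)
    n_non = len(block_is_roi_flags) - n_roi
    return not (n_non > n_roi)
```

The helper name and boolean interface are illustrative; in the embodiment the flags come from the target identifiers carried by the first labels.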
In a possible implementation manner, the size of any image region may also be smaller than the size of a single image block, and for this case, if the image block in which the any image region is located is the ROI, the any image region is the ROI, and if the image block in which the any image region is located is the non-ROI, the any image region is the non-ROI.
The acquisition device can also determine an image area formed by a plurality of adjacent ROI image blocks as an ROI according to the first label of each image block in the original image, and determine an image area formed by a plurality of adjacent non-ROI image blocks as a non-ROI according to the first label of each image block in the original image, so as to determine the ROI and the non-ROI in the original image.
It should be noted that the process shown in steps 703 and 704 is also the process in which the acquisition device determines the ROI and the non-ROI in the original image. In addition to the first model, the acquisition device may also determine the ROI and the non-ROI in the original image based on the second model. In one possible implementation, the acquisition device's determination of the ROI and the non-ROI in the original image may also be performed by the process shown in steps A-B described below.
Step A, the acquisition equipment inputs the original image into a second model, the second model outputs a second label of each image block in the original image based on the original image, and the second label of one image block is used for indicating the corresponding grade of the image block.
The second label of an image block may carry only the level corresponding to the image block. Since the image regions belonging to the non-ROI all correspond to the first level, and the levels of the image regions belonging to the ROI are the second level and the third level, respectively, if the level indicated by the second label of an image block is the first level, the image block is a non-ROI image block; if the level indicated by the second label is the second level or the third level, the image block is an ROI image block. In a possible implementation manner, the second label of an image block may further carry a target identifier, and the target identifier indicates whether the image block is the ROI.
The acquisition device may input the target image data of the original image into the second model, and the second model outputs the second labels of the respective image blocks in the original image based on the input target image data. The second model is described above and is not described in detail again in this embodiment of the present application.
It should be noted that, if an image block corresponds to the second level, the acquisition device may further perform an expansion operation on the image block to obtain a first image region, where the first image region corresponds to the second level, and the process is the same as the process of performing the expansion operation on the ROI image block in step 704.
If an image block corresponds to the third level, the image block may be an image region of a target object. The acquisition device may further perform an expansion operation on the image block to obtain a second image region, where the second image region includes the image block and the expanded image region of the image block, and the second image region corresponds to the third level. In this way, the second image region can include the pixel points on the periphery of the image block that form the key part of the target object, and thus all pixel points forming the key part of the target object, which ensures the accuracy of the image blocks corresponding to the third level and also improves the coverage of the key part of the target object. For example, if the image block 1 in fig. 1 cannot cover the whole face, the acquisition device may perform an expansion operation on the image block 1 to obtain a second image region capable of covering the whole face.
The acquisition device may perform expansion processing on the second image region obtained by expanding the image block to obtain an expanded image region of the second image region, where the expanded image region of the second image region may correspond to a fourth level.
And step B, for any image area in the original image, the acquisition equipment determines whether the any image area is the ROI or not according to the second label of each image block in the any image area.
When the second label carries the target identifier, the acquiring device may determine whether any image region is the ROI according to the target identifier carried by the second label of each image block in the any image region. The process of determining whether any image area is the ROI by the collecting device according to the target identifier carried by the second tag of each image block in any image area is the same as the process of determining whether any image area is the ROI by the collecting device according to the target identifier carried by the first tag of each image block in any image area in step 704, which is not described in detail in this embodiment of the present application.
When the second label does not carry the target identifier, the acquisition device may determine whether any image region is the ROI according to the level indicated by the second label of each image block in the any image region. In a possible implementation manner, when the any image area includes a plurality of image blocks, the acquisition device may determine the number of ROI image blocks and the number of non-ROI image blocks in the any image area according to the level indicated by the second label of each image block in the any image area; and when the number of the ROI image blocks in any image area is larger than that of the non-ROI image blocks, determining that the image area is the ROI, otherwise, determining that the image area is the non-ROI. When the any image area is an image block or the size of the any image area is smaller than that of a single image block, if the level indicated by the second label of the image block where the any image area is located is the first level, the any image area is a non-ROI, and if the level indicated by the second label of the image block where the any image area is located is the second level or the third level, the any image area is a ROI.
705. The acquisition device determines the rank of each image region in the original image.
Different levels correspond to different amplitude suppression ranges of the high-frequency alternating-current component. The image regions belonging to the non-ROI all correspond to the first level, and the first level corresponds to a first amplitude suppression range. The image regions belonging to the ROI have a plurality of levels, and the amplitude suppression range of the high-frequency alternating-current component corresponding to each of the plurality of levels is higher than the first amplitude suppression range. Here, amplitude suppression range 1 being higher than amplitude suppression range 2 means that the minimum amplitude in amplitude suppression range 1 is larger than the minimum amplitude in amplitude suppression range 2; for example, amplitude suppression range 2 is [10, +∞) and amplitude suppression range 1 is [11, +∞).
The first level corresponds to an image region of a non-ROI, the second level and the third level correspond to image regions of an ROI, and the amplitude suppression range of the high-frequency alternating-current component corresponding to the third level is higher than that corresponding to the second level. In one possible implementation, an image region of the ROI may also correspond to a fourth level, and the amplitude suppression range of the high-frequency alternating-current component corresponding to the fourth level is higher than that corresponding to the third level. For example, the first level corresponds to a first amplitude suppression range of [10, +∞), the second level corresponds to [11, +∞), the third level corresponds to [12, +∞), and the fourth level corresponds to [13, +∞), with [13, +∞) higher than [12, +∞), [12, +∞) higher than [11, +∞), and [11, +∞) higher than [10, +∞).
The acquisition device may determine the level corresponding to an image area according to the second labels of the image blocks in that area. In a possible implementation manner, when the image area includes a plurality of image blocks, the acquisition device counts, for each level, the number of image blocks of that level in the area according to the levels indicated by the second labels of the image blocks; the level with the most image blocks is determined as the level corresponding to the area. For example, if the image area includes 1 first-level image block, 2 second-level image blocks, and 3 third-level image blocks, the acquisition device may determine the level of the image area as the third level.
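The modal-level rule can be sketched as follows; the names are illustrative, and because the patent text does not specify how ties between levels are broken, the sketch breaks ties toward the higher level as an assumption:

```python
from collections import Counter

# Illustrative sketch: the region's level is the level shared by the most
# blocks in the region. Tie-breaking toward the higher level is an assumption.
def region_level(block_levels):
    counts = Counter(block_levels)
    best = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return best[0]

# 1 first-level, 2 second-level, 3 third-level blocks -> third level
print(region_level([1, 2, 2, 3, 3, 3]))  # 3
```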
In a possible implementation manner, the acquisition device may further determine, according to the second label of each image block in the original image, a target image area composed of a plurality of adjacent image blocks corresponding to the same level, where the level corresponding to the target image area is the level corresponding to the image blocks in the target image area.
In a possible implementation manner, if the acquiring device does not perform the dilation operation on the ROI image block, when any image region of the plurality of image regions corresponds to the third level, the acquiring device performs the dilation operation on the any image region to determine a dilated image region of the any image region, where the dilated image region corresponds to the fourth level.
The process of performing the dilation operation on an image region by the acquisition device may be: the acquisition device performs a first dilation operation on the image region to obtain a first dilated image area; the acquisition device then performs a second dilation operation on the first dilated image area to obtain a second dilated image area, and determines the second dilated image area as the dilated image area of the original region. The first dilation operation allows the first dilated image area to include the pixels around the region that belong to a key part of the target object, and the third image area composed of the first dilated image area and the original region may correspond to the third level, so that the third image area can include all the pixels of the key part of the target object. This guarantees the precision of the third-level image area and improves the coverage of the key part of the target object.
The second dilated image area corresponds to the fourth level, which may be the highest level, and the second dilated image area may subsequently be encoded at the highest level, so the quality of its encoded image can be ensured. Since the second dilated image area surrounds the third-level image region, and a higher encoding quality around a region raises the encoding quality of the region itself during encoding, the encoding quality of the third-level image region, that is, of the key part of the target object, can be improved.
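The two-stage dilation described above can be sketched as follows, assuming a 4-connected structuring element on a grid of image blocks (all names and the connectivity choice are illustrative assumptions):

```python
# Sketch of the two-stage dilation: the first dilation of a third-level
# region yields the region plus an inner ring (still third level); the
# second dilation yields an outer ring assigned the fourth level.
def dilate(cells, rows, cols):
    """One binary dilation step of a set of (row, col) cells on a rows x cols grid."""
    out = set(cells)
    for r, c in cells:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                out.add((nr, nc))
    return out

region = {(2, 2)}                   # a third-level region of one block
first = dilate(region, 5, 5)        # region + inner ring -> third level
second = dilate(first, 5, 5)        # one more step outward
fourth_level_ring = second - first  # outer ring -> fourth level
print(sorted(fourth_level_ring))
```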
When an image area corresponds to the second level, the acquisition device may also perform a dilation operation on it to obtain a fourth image area, where the fourth image area includes the original area and its dilated image area and corresponds to the second level, so that the precision of the second-level image area can be improved.
706. The acquisition device encodes the original image to obtain an encoded image with the ROI code rate higher than the non-ROI code rate, where in the process of encoding the original image, the high-frequency alternating-current component within the first amplitude suppression range in the non-ROI of the original image is suppressed, and the high-frequency alternating-current component within a second amplitude suppression range in each image region belonging to the ROI is suppressed, the second amplitude suppression range being the one corresponding to the level of that image region.
The ROI code rate is the code rate at which the ROI of the original image is encoded, and the non-ROI code rate is the code rate at which the non-ROI of the original image is encoded. The acquisition device suppressing the high-frequency alternating-current component within the first amplitude suppression range in the non-ROI means that, when encoding the alternating-current components of the original image, for the non-ROI the acquisition device encodes only the high-frequency alternating-current components outside the first amplitude suppression range and does not encode those within it, thereby suppressing the high-frequency alternating-current component in the first amplitude suppression range.
Likewise, the acquisition device suppressing the high-frequency alternating-current component within the second amplitude suppression range in an image region of the ROI means that, when encoding the alternating-current components of the original image, for that image region the acquisition device encodes only the high-frequency alternating-current components outside the second amplitude suppression range and does not encode those within it, thereby suppressing the high-frequency alternating-current component in the second amplitude suppression range.
Because the amplitude suppression range of the high-frequency alternating-current component corresponding to each of the plurality of levels is higher than the first amplitude suppression range, the acquisition device suppresses the high-frequency alternating-current component in the ROI image areas to a lower degree than in the non-ROI image areas during encoding, so the acquisition device obtains an encoded image with the ROI code rate higher than the non-ROI code rate.
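One possible reading of the amplitude suppression ranges, treating a range such as [10, +∞) as positions in the zigzag-ordered AC coefficient sequence so that a higher range suppresses fewer, and only the highest-frequency, components, can be sketched as follows; this interpretation and all names are assumptions, not confirmed by the patent text:

```python
# Hedged sketch: zero every AC coefficient whose position in the zigzag
# (low-to-high frequency) order falls inside the suppression range [T, +inf).
# A higher threshold T ("higher" range) drops fewer components, so ROI
# regions retain more detail and a higher code rate than non-ROI regions.
def suppress_high_freq(ac_zigzag, threshold):
    """ac_zigzag: AC coefficients in zigzag order; threshold: start of range."""
    return [c if i < threshold else 0 for i, c in enumerate(ac_zigzag)]

ac = [31, -7, 5, 4, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1]
print(suppress_high_freq(ac, 10))  # non-ROI: range [10, +inf), 4 coefficients dropped
print(suppress_high_freq(ac, 13))  # fourth level: range [13, +inf), 1 dropped
```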
707. The acquisition device sends the encoded image to a storage device.
708. The storage device receives the encoded image and stores the encoded image.
Upon receiving the encoded image, the storage device may store the encoded image locally or store the encoded image in a database. The embodiment of the present application does not specifically limit the manner in which the storage device stores the encoded image.
The method provided by the embodiment of the application suppresses the high-frequency alternating-current component in the non-ROI of the original image and thereby reduces the data amount of the non-ROI encoded image. Because the encoded image of the original image is composed of the non-ROI encoded image and the ROI encoded image, reducing the data amount of the non-ROI encoded image means that the storage device can store more encoded images of original images, so the cost of the storage device can be reduced. And because the ROI code rate is higher than the non-ROI code rate, the encoding quality of the ROI can be guaranteed while the data volume of the encoded image of the original image is reduced. Moreover, through the dilation operation, the situation that the third-level image blocks identified by the second model do not completely cover the key part of the target object, or that the second-level image blocks do not cover the whole target object, can be avoided, and the encoding quality of the third-level image blocks can be improved.
In a possible implementation, during the encoding of the original image, the acquisition device may also suppress only non-ROIs in the original image, and not ROIs in the original image. For example, fig. 8 is a flowchart of an image encoding method according to an embodiment of the present application.
801. The acquisition device acquires an original image.
802. The acquisition equipment preprocesses the original image to obtain target image data of the original image.
The process shown in step 802 is introduced in step 701, and details of step 802 are not described herein in this embodiment of the present application.
803. The acquisition equipment performs block processing on the original image based on the target image data to obtain a plurality of image areas.
The process shown in step 803 is introduced in step 702, and details of step 803 are not described in this embodiment of the present application.
804. The acquisition device determines whether each image region in the original image is a ROI.
The process shown in step 804 is also the process shown in steps 703 and 704, and here, the description of step 804 is not repeated in this embodiment of the present application.
805. The acquisition equipment encodes the original image to obtain an encoded image with the ROI code rate higher than the non-ROI code rate, wherein in the process of encoding the original image, the high-frequency alternating current component in the non-ROI of the original image within the first amplitude suppression range is suppressed.
The acquisition device only suppresses the high-frequency alternating-current component in the original image within the first amplitude suppression range in the non-ROI, but does not suppress the high-frequency alternating-current component in the ROI in the original image, so as to achieve the purpose of reducing the data volume of the encoded image of the original image.
The method provided by the embodiment of the application can reduce the data amount of the non-ROI encoded image in the original image by suppressing the high-frequency alternating-current component in the non-ROI in the original image, and the data amount of the non-ROI encoded image is reduced because the encoded image of the original image is composed of the non-ROI encoded image and the ROI encoded image, so that more encoded images of the original image can be stored when the storage device stores the encoded images of the original image, and the cost of the storage device can be reduced.
The encoding performed by the acquisition device in fig. 7 and 8 may be JPEG encoding. To further explain the process of suppressing the alternating-current component in the non-ROI of the original image during JPEG encoding, refer to the flowchart of JPEG encoding provided in the embodiment of the present application shown in fig. 9. In fig. 9, the acquisition device may perform color mode conversion on the raw data of the original image to obtain target image data in RGB format. The acquisition device performs blocking processing on the original image based on the target image data to obtain a plurality of N x N image areas, where N is an integer greater than 0. The acquisition device performs a discrete cosine transform (DCT) on each N x N image area to obtain a plurality of N x N transform coefficient matrices, where each transform coefficient matrix corresponds to one image area and includes the alternating-current (AC) coefficients and the direct-current (DC) coefficient of that image area. The acquisition device quantizes each transform coefficient matrix to obtain a quantized transform coefficient matrix for each image area, performs zigzag scanning on each quantized transform coefficient matrix to obtain a sorted transform coefficient matrix for each image area, and performs differential pulse code modulation (DPCM) coding on the DC coefficient in each sorted transform coefficient matrix to obtain the DC encoded data of each image area. The acquisition device then calculates an intermediate format of the DC coefficient of each image region based on the DC encoded data of that region. In the process of obtaining the intermediate format of the DC coefficients, the acquisition device may determine whether each image block of the original image is an ROI based on the first model, and determine whether any image region is an ROI according to whether its image blocks are ROI blocks. If an image area is a non-ROI, the acquisition device suppresses the high-frequency components of the AC coefficients within the first amplitude suppression range in that area and performs run-length coding on the remaining AC coefficients; if an image area is an ROI, the acquisition device performs run-length coding on all AC coefficients in the sorted transform coefficient matrix of that area. This yields the AC encoded data of each image area, from which the acquisition device calculates the intermediate format of the AC coefficients of each image area. After the intermediate formats of the DC coefficient and the AC coefficients of each image area are obtained, the acquisition device performs entropy coding on them to obtain the encoded image of the original image.
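The per-block transform stage of the pipeline above (DCT, uniform quantization, zigzag scan) can be sketched as follows. The sketch uses 4 x 4 blocks and a single flat quantization step for brevity; JPEG itself uses 8 x 8 blocks, standard quantization tables, DPCM of the DC coefficient, and run-length and entropy coding:

```python
import math

N = 4  # real JPEG uses 8x8 blocks; 4x4 keeps the illustration small

def dct2(block):
    """Naive 2-D DCT-II of an N x N block."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, q):
    """Uniform quantization with a single flat step q (JPEG uses a table)."""
    return [[round(val / q) for val in row] for row in coeffs]

def zigzag(coeffs):
    """Order coefficients from low to high frequency along anti-diagonals
    (direction within each diagonal is simplified relative to true JPEG)."""
    idx = sorted(((u, v) for u in range(N) for v in range(N)),
                 key=lambda p: (p[0] + p[1],
                                p[1] if (p[0] + p[1]) % 2 else p[0]))
    return [coeffs[u][v] for u, v in idx]

block = [[128] * N for _ in range(N)]     # flat block: energy only in the DC term
seq = zigzag(quantize(dct2(block), 16))
dc, ac = seq[0], seq[1:]
print(dc, ac)  # all AC coefficients of a flat block quantize to zero
```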
The acquisition device can also encode the original image according to the levels of the image areas and the quantization step corresponding to each level, to reduce the data volume of the encoded image of the original image; the storage device stores the encoded image, and when the processing device needs an encoded image in the JPEG format while the stored encoded image is not in the JPEG format, the storage device can convert the stored encoded image into an encoded image in the JPEG format and send the converted JPEG encoded image to the processing device. To further illustrate this process, refer to the flowchart of another image encoding method provided in the embodiment of the present application shown in fig. 10.
1001. The acquisition device acquires an original image, which is obtained by image acquisition, the original image including a plurality of image areas.
The process shown in step 1001 is the same as the process shown in step 701, and here, the embodiment of the present application does not repeat this step 1001.
1002. The acquisition equipment carries out blocking processing on the original image to obtain a plurality of image areas.
The process shown in step 1002 is the same as the process shown in step 702, and here, the description of step 1002 is not repeated in this embodiment of the present application.
1003. The acquisition equipment inputs the original image into a second model, and the second model outputs second labels of all image blocks in the original image based on the original image, wherein the second label of one image block is used for indicating the corresponding grade of the image block.
The process shown in step 1003 is the same as the process shown in step a, and here, this step 1003 is not described in detail in this embodiment of the present application.
It should be noted that, if an image block corresponds to the second level, the acquisition device may further perform an expansion operation on the image block to obtain a first image area, where the first image area corresponds to the second level, and the process is the same as the process of performing the expansion operation on the image block in step a.
If an image block corresponds to the third level, the image block may be an image area of a target object. The acquisition device may perform a dilation operation on the image block to obtain a second image area, and may further dilate the second image area to obtain a dilated image area of the second image area, which may correspond to the fourth level.
1004. For any image area in the plurality of image areas, the acquisition equipment determines the grade corresponding to the any image area according to the second label of the image block in the any image area.
The process that the acquisition device determines the level corresponding to any image area according to the second label of the image block in any image area is described in step 705, and here, this step 1004 is not described in detail in this embodiment of the present application.
It should be noted that, if the acquiring device does not perform the dilation operation on the ROI image block, when any image region of the plurality of image regions corresponds to the third level, the acquiring device may further perform the dilation operation on the any image region to determine a dilated image region of the any image region, where the dilated image region corresponds to the fourth level. The process of the expansion operation is also described in step 705, and the process of the expansion operation is not described in detail in this embodiment of the present application.
It should be noted that the process shown in step 1003-1004 is also a process of determining, by the acquisition device, the level corresponding to each image area of the original image.
1005. The acquisition equipment acquires the quantization step corresponding to the grade of each image area.
Different levels may correspond to different quantization steps in the encoding process: the quantization step corresponding to the first level is larger than that corresponding to the second level, the quantization step corresponding to the second level is larger than that corresponding to the third level, and the quantization step corresponding to the third level is larger than that corresponding to the fourth level. The smaller the quantization step corresponding to an image region, the higher the coding quality of that region and the higher the code rate when it is encoded.
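The stated relationship between quantization step and coding quality can be checked with a small illustration: the mean reconstruction error of uniform quantization grows with the step size (the sample values and step sizes below are illustrative):

```python
# Illustrative check: mean absolute reconstruction error of uniform
# quantization (quantize, then dequantize) grows with the step size q,
# so a smaller step means higher coding quality at a higher code rate.
def quant_error(samples, q):
    return sum(abs(s - round(s / q) * q) for s in samples) / len(samples)

samples = [x * 0.37 for x in range(100)]
for q in (2, 4, 8, 16):   # e.g. fourth < third < second < first level steps
    print(q, round(quant_error(samples, q), 3))
```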
For any image area in the original image, the acquisition device determines the quantization step corresponding to the level of the image area according to the basic quantization step of the original image, the distortion strength corresponding to at least one level, and the quality enhancement coefficient corresponding to at least one level. The at least one level may be the first level, the second level, the third level and the fourth level, and the acquisition device may determine the quantization step corresponding to each level by the following equations (1)-(4).
[Equations (1)-(4): the quantization steps QROI1-QROI4 corresponding to the first to fourth levels, expressed in terms of Qbase, the quality enhancement coefficients α1-α4 and the distortion intensities DROI1-DROI4; the formulas appear as images in the original document.]
Wherein ROI1, ROI2, ROI3 and ROI4 are the image areas corresponding to the first, second, third and fourth levels respectively; QROI1, QROI2, QROI3 and QROI4 are the quantization steps corresponding to the first, second, third and fourth levels respectively; Qbase is the basic quantization step of the original image; α1, α2, α3 and α4 are the quality enhancement coefficients corresponding to the first, second, third and fourth levels respectively; and DROI1, DROI2, DROI3 and DROI4 are the distortion intensities corresponding to the first, second, third and fourth levels respectively.
The derivation process of the above equations (1)-(4) may be as follows. Assuming that the size of the original image is B and the desired compression ratio is the target compression ratio r, the size Br of the encoded image obtained by encoding the original image by the acquisition device is as shown in equation (5).

Br = B * r (5)
If the same encoding process is applied to each image area in the original image, the size of the encoded image of the original image is Bbase, as shown in equation (6), where BROI1, BROI2, BROI3 and BROI4 are the total sizes of the image areas corresponding to the first, second, third and fourth levels respectively.

[Equation (6), which expresses Bbase in terms of the per-level sizes BROI1-BROI4, appears as an image in the original document.]
Since the size of the encoded image is related to the predicted distortion intensity D, there is a relationship, shown in formula (7), between the quality enhancement coefficient α of the image region of each level and the distortion intensity D. The distortion intensity D is used to measure the residual energy after encoding quantization. For an image region of any level, the acquisition device may predict a plurality of predicted distortion intensities of the image region according to an image motion estimation algorithm, and use the mean squared error (MSE) or the sum of absolute errors (SAE) between the plurality of predicted distortion intensities as the distortion intensity of the image region of that level. The larger α is, the better the encoding quality of the image region.
[Formula (7), relating the quality enhancement coefficient α to the distortion intensity D, appears as an image in the original document.]
It is known that the distortion intensity D is proportional to the square of the quantization step Q, that is, the relationship between the distortion intensity D and the quantization step Q is as shown in equation (8).

D ∝ Q² (8)
As can be seen from equations (7) to (8), there is also a relationship shown in equation (9) between the quality enhancement coefficient and the distortion intensity of the image area of each level.
[Equation (9), relating the quality enhancement coefficient to the distortion intensity of the image area of each level, appears as an image in the original document.]
And since the relationship between the size of the original image, the distortion intensity and the quantization step size is shown in equation (10), equations (1) - (4) can be derived from equations (5), (6) and (10).
[Equation (10), relating the size of the original image, the distortion intensity and the quantization step, appears as an image in the original document.]
It should be noted that, in addition to obtaining the quantization step size corresponding to each level through the above equations (1) - (4), the acquisition device may also obtain the quantization step size corresponding to each level based on user configuration. In a possible implementation manner, a user may input the quantization step corresponding to each level on a configuration interface of the acquisition device, and when the acquisition device detects that the user completes configuration of the quantization step corresponding to each level, the acquisition device is triggered to receive the quantization step corresponding to each level. It should be noted that the user may configure the quantization step corresponding to each level once, and does not need to configure the quantization step multiple times.
1006. And the acquisition equipment encodes the original image to obtain an encoded image with the ROI code rate higher than the non-ROI code rate, wherein in the process of encoding the original image, the acquisition equipment encodes the original image according to the grade corresponding to each image region and the quantization step corresponding to the grade corresponding to each image region.
The encoding method performed in step 1006 is a target encoding method, where a compression rate of the target encoding method is lower than that of JPEG encoding, such as HEIF encoding or WebP encoding, and the target encoding method is not specifically limited in this embodiment of the present application.
After the levels of the image regions and the quantization step sizes corresponding to the levels of the image regions are obtained, the acquisition device may generate a quantization parameter map (QPmap) of the original image based on the quantization step sizes corresponding to the levels of the image regions, where the quantization parameter map is used to store the quantization step sizes corresponding to the image regions of the original image, where the quantization step size corresponding to each image region is also the quantization step size corresponding to the level of each image region. The acquisition equipment can send the original image and the quantization parameter map of the original image to an encoder in the acquisition equipment through a data interface, and the encoder performs encoding of different encoding levels on each image area in the original image based on the quantization parameter map of the original image and a target encoding mode. The process is also a process of coding each image area of the original image in a target coding mode by the acquisition device according to the quantization parameter map of the original image.
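Building the quantization parameter map from the per-region levels can be sketched as follows; the step values per level are assumptions for illustration, not values from the patent:

```python
# Sketch of building a quantization parameter map (QPmap): each entry stores
# the quantization step of the level assigned to the corresponding image
# region. Step values are assumed, satisfying first > second > third > fourth.
LEVEL_QSTEP = {1: 16, 2: 8, 3: 4, 4: 2}

def build_qpmap(level_map):
    """level_map: per-region levels laid out as a 2-D grid."""
    return [[LEVEL_QSTEP[lv] for lv in row] for row in level_map]

level_map = [[1, 1, 2],
             [1, 3, 4],
             [1, 1, 1]]
print(build_qpmap(level_map))
```

The resulting map, together with the original image, would be handed to the encoder so that each region is quantized with its own step.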
Because the quantization steps corresponding to the second, third and fourth levels are all smaller than the quantization step corresponding to the first level, the image areas corresponding to the second, third and fourth levels are ROIs while the image area corresponding to the first level is a non-ROI, and because a larger quantization step yields a smaller code rate while a smaller quantization step yields a larger code rate, the acquisition device, by encoding each image area of the original image according to the quantization step corresponding to the level of that area, obtains an encoded image with the ROI code rate higher than the non-ROI code rate.
1007. The acquisition device sends the encoded image to a storage device.
1008. The storage device receives the encoded image and stores the encoded image.
The process shown in step 1008 is the same as the process shown in step 708, and here, the description of step 1008 is not repeated in this embodiment of the present application.
According to the method provided by the embodiment of the application, the original image is encoded according to the level corresponding to each image area and the quantization step corresponding to that level, so the compression rate of the original image during encoding can be reduced, the data volume of the encoded image is smaller, the storage device can store more encoded images, and the cost of the storage device can be reduced. Taking HEIF encoding as an example, the compression rate achieved by HEIF encoding the original image is lower than that achieved by JPEG encoding it, so the data amount of the encoded image obtained by HEIF encoding is relatively small, the storage device can store a large number of encoded images in the HEIF format, and the cost of the storage device can be reduced. Moreover, because different levels correspond to different quantization steps, performing HEIF encoding on the original image based on the levels of the image regions can further reduce the compression rate of the HEIF encoding, further reducing the data size of the encoded image in the HEIF format and the cost of the storage device. And because the ROI code rate is higher than the non-ROI code rate, the encoding quality of the ROI can be guaranteed while the data volume of the encoded image of the original image is reduced. Moreover, through the dilation operation, the situation that the third-level image blocks identified by the second model do not completely cover the key part of the target object, or that the second-level image blocks do not cover the whole target object, can be avoided, and the encoding quality of the third-level image blocks can be improved.
For further explanation of the process, refer to a flowchart of a transcoding method provided in this embodiment shown in fig. 11.
1101. The acquisition device acquires an original image.
1102. The acquisition equipment preprocesses the original image to obtain target image data of the original image.
The process shown in step 1102 is introduced in step 701, and this embodiment of the present application does not describe step 1102 again.
1103. The acquisition equipment performs block processing on the original image based on the target image data to obtain a plurality of image areas.
The process shown in step 1103 is introduced in step 702, and this step 1103 is not described again in this embodiment.
1104. The acquisition device determines the corresponding grade of each image area of the original image.
The process shown in step 1104 is similar to the process shown in steps 1003 and 1004, and here, the description of step 1104 is not repeated in this embodiment of the present application.
1105. The acquisition equipment acquires the quantization step corresponding to the grade of each image area.
The process shown in step 1105 is the same as the process shown in step 1005, and here, this embodiment of the present application does not repeat this step 1105.
1106. And the acquisition equipment generates a quantization parameter map of the original image according to the grade of each image area and the quantization step corresponding to the grade of each image area.
The process shown in step 1106 is described in step 1006, and details of step 1106 are not described in this embodiment of the present application.
1107. And the acquisition equipment encodes each image area of the original image according to the quantization parameter map of the original image to obtain an encoded image with the ROI code rate higher than the non-ROI code rate.
The process shown in step 1107 is described in step 1006, and this step 1107 is not described in detail in this embodiment of the present application. It should be noted that the process shown in steps 1106-1107 is the process of the acquisition device encoding the original image in the target encoding manner according to the level corresponding to each image area and the quantization step corresponding to that level.
1108. The acquisition device sends the encoded image to a storage device.
1109. The storage device receives the coded image sent by the acquisition device and stores the coded image.
The process shown in step 1109 is the same as the process shown in step 708, and here, the description of this step 1109 is not repeated in this embodiment of the present application. It should be noted that the process of receiving, by the storage device, the encoded image sent by the acquisition device is also the process of acquiring, by the storage device, the encoded image.
1110. In response to the image viewing request, the storage device decodes the encoded image to obtain a decoded image.
The image viewing request may include an image format identifier, which indicates the format of the encoded image required by the processing device. For example, if the image format identifier is a JPEG identifier, the processing device requires the encoded image in the JPEG format; if the image format identifier is a target format identifier, the processing device requires the encoded image in the target encoding mode.
Taking the image format identifier being a JPEG identifier as an example: when the storage device receives an image viewing request sent by a processing device and the request carries the JPEG identifier, the processing device requires a coded image in the JPEG format. If the stored coded image is in the HEIF format, the storage device may perform HEIF decoding on the coded image and then perform JPEG encoding on the resulting decoded image.
It should be noted that when the target format identifier carried in the image viewing request sent by the processing device is the HEIF identifier, the processing device requires the encoded image in the HEIF format. If the encoded image is already in the HEIF format, the storage device may send it to the processing device directly.
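The format dispatch described in the preceding paragraphs can be sketched as follows. The `decode` and `encode` helpers are hypothetical stand-ins for real HEIF/JPEG codecs; only the dispatch logic reflects the text.

```python
def decode(data, fmt):
    # Stand-in for a real decoder (e.g. HEIF decoding in step 1110).
    return ("raw", data)

def encode(decoded, fmt):
    # Stand-in for a real encoder (e.g. JPEG encoding in step 1111).
    return (fmt, decoded[1])

def serve_image(stored_format, requested_format, encoded_image):
    """Return the stored image as-is when formats match, else transcode."""
    if stored_format == requested_format:
        return encoded_image          # e.g. HEIF requested and HEIF stored
    decoded = decode(encoded_image, stored_format)
    return encode(decoded, requested_format)

out = serve_image("HEIF", "JPEG", b"heif-bytes")   # HEIF decode, JPEG encode
```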
1111. The storage device performs Joint Photographic Experts Group (JPEG) encoding on the decoded image to obtain a target encoded image.
The encoding process shown in step 1111 may also be the encoding process shown in step 706, the encoding process shown in step 805, or an existing JPEG encoding process, and the process of step 1111 is not specifically limited in this embodiment of the present application.
1112. The storage device outputs the target encoded image.
The storage device may output the target encoded image to a processing device to meet the needs of the processing device.
According to the method provided in this embodiment of the present application, the original image is encoded according to the level corresponding to each image area and the quantization step corresponding to that level, so the data volume produced when the original image is encoded can be reduced. Because each encoded image occupies less space, the storage device can store more encoded images, and the cost of the storage device can be reduced. If HEIF encoding is used, the compressed size achieved by HEIF encoding the original image is smaller than that achieved by JPEG encoding it; the data amount of an encoded image obtained by HEIF encoding is therefore relatively small, the storage device can store a large number of encoded images in the HEIF format, and the cost of the storage device can be reduced. Moreover, because different levels correspond to different quantization coefficients, performing HEIF encoding on the original image based on the level corresponding to each image region further reduces the data size of the encoded image in the HEIF format, and further reduces the cost of the storage device. In addition, because the ROI code rate of the encoded image is higher than the non-ROI code rate, the encoding quality of the ROI can be guaranteed while the data volume of the encoded image of the original image is reduced. Finally, through the dilation operation, the situation in which the third-level image blocks identified by the second model cannot completely cover the key part of the target object, or the second-level image blocks cannot cover the whole target object, can be avoided, and the encoding quality of the third-level image blocks can be improved.
In addition, the storage device can send, according to the image viewing request, an encoded image in the encoding format required by the processing device, which prevents the processing device from being unable to recognize the encoded images held by the storage device.
In a possible implementation manner, the encoded image sent by the acquisition device may not reach a target compression ratio, or an encoded image originally stored by the storage device may not reach the target compression ratio, where the target compression ratio may be the compression ratio achievable by the target encoding method. In that case, the storage device may transcode any encoded image that does not reach the target compression ratio to obtain an encoded image that does. To further explain this process, refer to the flowchart of another image transcoding method provided in this embodiment of the present application, shown in fig. 12.
1201. The storage device decodes the encoded image to obtain an original image, which includes a plurality of image regions.
The encoded image may be any encoded image that does not reach the target compression ratio. If the image format of the encoded image is the JPEG format, the storage device may perform JPEG decoding on the encoded image to obtain the original image.
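The transcoding trigger can be sketched as follows, taking the compression ratio as the original size divided by the encoded size; the threshold value is an assumed example.

```python
def needs_transcoding(original_size, encoded_size, target_ratio):
    """True when the achieved compression ratio falls short of the target."""
    return original_size / encoded_size < target_ratio

# A JPEG image compressed 10:1 against an assumed 20:1 target would be
# decoded and re-encoded in the target encoding manner.
flag = needs_transcoding(original_size=1_000_000, encoded_size=100_000,
                         target_ratio=20.0)
```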
1202. The storage device preprocesses the original image to obtain target image data of the original image.
The process shown in step 1202 is the same as the process shown in step 802, and here, the description of step 1202 is not repeated in this embodiment of the present application.
1203. The storage device performs block processing on the original image based on target image data to obtain a plurality of image areas.
The process shown in step 1203 is the same as the process shown in step 803, and here, this step 1203 is not described in detail in this embodiment of the present application.
1204. The storage device determines the level corresponding to each image area of the original image.
The process shown in step 1204 is the same as the process in which the acquisition device determines the level corresponding to each image area of the original image in step 1104, and here, this step 1204 is not described again in this embodiment of the present application.
1205. The storage device obtains the quantization step corresponding to the level of each image area.
The process shown in step 1205 is the same as the process in which the acquisition device acquires the quantization step corresponding to the level of each image area in step 1005, and here, this step 1205 is not described again in this embodiment of the present application.
1206. The storage device encodes the original image to obtain an encoded image whose ROI code rate is higher than the non-ROI code rate, where in the process of encoding the original image, the storage device encodes the original image according to the level corresponding to each image region and the quantization step corresponding to that level.
The process in step 1206 is the same as the process in step 1006, and here, the embodiment of the present application does not repeat this step 1206.
1207. In response to the image viewing request, the storage device decodes the encoded image to obtain a decoded image.
The process shown in step 1207 is the same as the process in which the storage device decodes the encoded image to obtain a decoded image in step 1110. The process of step 1207 is not described again here in this embodiment of the present application.
1208. The storage device performs JPEG encoding on the decoded image to obtain a target encoded image.
The encoding process shown in step 1208 may also be the encoding process shown in step 706, the encoding process shown in step 805, or an existing JPEG encoding process, and the process in step 1208 is not specifically limited in this embodiment of the present application.
1209. The storage device outputs the target encoded image.
The process shown in step 1209 is similar to the process shown in step 1112, and this step 1209 is not described in detail in this embodiment of the present application.
According to the method provided in this embodiment of the present application, the original image is encoded according to the level corresponding to each image area and the quantization step corresponding to that level, so the data volume produced when the original image is encoded can be reduced. Because each encoded image occupies less space, the storage device can store more encoded images, and the cost of the storage device can be reduced. If HEIF encoding is used, the compressed size achieved by HEIF encoding the original image is smaller than that achieved by JPEG encoding it; the data amount of an encoded image obtained by HEIF encoding is therefore relatively small, the storage device can store a large number of encoded images in the HEIF format, and the cost of the storage device can be reduced. Moreover, because different levels correspond to different quantization coefficients, performing HEIF encoding on the original image based on the level corresponding to each image region further reduces the data size of the encoded image in the HEIF format, and further reduces the cost of the storage device. In addition, because the ROI code rate of the encoded image is higher than the non-ROI code rate, the encoding quality of the ROI can be guaranteed while the data volume of the encoded image of the original image is reduced. Finally, through the dilation operation, the situation in which the third-level image blocks identified by the second model cannot completely cover the key part of the target object, or the second-level image blocks cannot cover the whole target object, can be avoided, and the encoding quality of the third-level image blocks can be improved.
In addition, the storage device can send, according to the image viewing request, an encoded image in the encoding format required by the processing device, which prevents the processing device from being unable to recognize the encoded images held by the storage device.
Fig. 13 is a schematic structural diagram of a chip 1300 according to an embodiment of the present disclosure. The chip 1300 includes a processor 1301 and a memory 1302; the memory 1302 stores at least one instruction, which is loaded and executed by the processor 1301 to implement the operations performed by the methods of the foregoing embodiments. In a possible implementation manner, the processor 1301 may be a neural-network processing unit (NPU), and the chip 1300 may further include an encoder 1303, a decoder 1304, and an image processing unit 1305. The encoder 1303 is configured to encode an image and may be a video encoder (VENC); the decoder 1304 is configured to decode an encoded image and may be a video decoder (VDEC); the image processing unit 1305 is configured to perform operations such as preprocessing on the image and may be a video surveillance security system (VGS) or a video processing sub-system (VPSS). The chip 1300 may be a camera system on chip (SOC) and may further include an advanced reduced instruction set computer (ARM) core and a digital signal processor (DSP).
In a possible implementation manner, the memory 1302 may be located outside the chip 1300 rather than inside it. For example, as shown in fig. 14, a PCI expansion card 1400 may include a plurality of chips 1300, each connected to one memory 1302. The chips 1300 may communicate with each other through a PCIe (peripheral component interconnect express) switch, and the VCN in each chip 1300 communicates with the PCIe bus in the PCI expansion card 1400. The PCI expansion card 1400 may be installed on an acquisition device or a storage device, so that the acquisition device or the storage device performs, on the PCI expansion card 1400, the operations of the methods provided in the foregoing embodiments. For example, step 1110 in fig. 11 may be performed in the VDEC, step 1111 in the VENC, and step 1106 in the ARM core. For another example, step 1201 in fig. 12 may be completed in the VDEC, step 1203 in the VGS, step 1204 in the NPU, steps 1206 and 1208 in the VENC, and step 1207 in the VDEC. It should be noted that the PCIe communication mode in the PCI expansion card 1400 may be replaced by other communication modes, such as user datagram protocol (UDP) or universal asynchronous receiver/transmitter (UART).
Fig. 15 is a schematic structural diagram of an image encoding apparatus according to an embodiment of the present application, where the apparatus 1500 includes:
an obtaining module 1501, configured to obtain an original image, where the original image is obtained through image acquisition;
a determining module 1502 for determining regions of interest, ROI, and non-ROI in the original image;
an encoding module 1503, configured to encode the original image to obtain an encoded image whose ROI code rate is higher than the non-ROI code rate, where in the process of encoding the original image, high-frequency alternating-current components within a first amplitude suppression range in the non-ROI of the original image are suppressed.
In one possible implementation, the original image includes a plurality of image regions, the image regions belonging to the non-ROI each correspond to a first level, the image regions belonging to the ROI have a plurality of levels, wherein each image region belonging to the ROI corresponds to one of the plurality of levels, the different levels correspond to different amplitude suppression ranges of the high-frequency alternating-current component, and the amplitude suppression range of the high-frequency alternating-current component corresponding to each of the plurality of levels is higher than the first amplitude suppression range;
the encoding module 1503 is further configured to:
and in the process of coding the original image, suppressing high-frequency alternating current components of a second amplitude suppression range in any image region belonging to the ROI, wherein the level corresponding to any image region corresponds to the second amplitude suppression range.
In one possible implementation, the image areas of non-critical portions of the target object correspond to the second level, and the image areas of critical portions of the target object correspond to the third level;
and the amplitude suppression range of the high-frequency alternating-current component corresponding to the third level is higher than that of the high-frequency alternating-current component corresponding to the second level.
In one possible implementation, the image area expanded based on the image area corresponding to the third level corresponds to a fourth level, and the amplitude suppression range of the high-frequency alternating-current component corresponding to the fourth level is higher than the amplitude suppression range of the high-frequency alternating-current component corresponding to the third level.
In one possible implementation, the apparatus further includes:
and the expansion module is used for performing expansion operation on any image area in the original image when the image area corresponds to the third grade so as to determine an expanded image area of the image area, wherein the expanded image area corresponds to the fourth grade.
In one possible implementation, the expansion module is configured to:
performing an expansion operation on any image area to obtain a first expanded image area of the image area;
performing an expansion operation on the first expanded image area to obtain a second expanded image area;
and determining the second expanded image area as the expanded image area of the image area.
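The two-pass expansion can be sketched on a set of block coordinates; the 4-neighbour structuring element is an assumption, since the embodiment does not fix one.

```python
def dilate(cells):
    """One dilation pass: add every 4-neighbour of each cell."""
    grown = set(cells)
    for r, c in cells:
        grown.update({(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)})
    return grown

def expand_region(cells):
    first = dilate(cells)    # first expanded image area
    second = dilate(first)   # second expanded image area
    return second            # used as the region's expanded image area

region = {(5, 5)}            # a single third-level block
expanded = expand_region(region)   # a diamond of blocks around it
```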
Based on the possible implementation manner, by the dilation operation, it is possible to avoid that the image area of the third level cannot completely cover the key part of the whole target object, and the encoding quality of the image area of the third level can be improved.
In a possible implementation manner, the determining module 1502 is further configured to:
and determining the corresponding grade of each image area of the original image.
In one possible implementation, the image area includes at least one image block;
the determination module 1502 includes:
a first input sub-module, configured to input the original image into a second model, where the second model outputs a second label of each image block in the original image based on the original image, and the second label of an image block is used to indicate the level corresponding to that image block;
a first determining sub-module, configured to determine the level corresponding to any image area in the plurality of image areas according to the second labels of the image blocks in the image area.
In one possible implementation, the first determining sub-module is configured to:
when any image area includes a plurality of image blocks, determining the number of image blocks of each level in the image area;
and determining the level corresponding to the largest number of image blocks as the level corresponding to the image area.
In one possible implementation, the first determining sub-module is configured to:
when any image area includes one image block, determining the level indicated by the second label of that image block as the level corresponding to the image area.
In one possible implementation, the first determining sub-module is configured to:
and according to the second label of each image block in the original image, forming a target image area by a plurality of adjacent image blocks corresponding to the same grade, wherein the grade corresponding to the target image area is the grade corresponding to the image block in the target image area.
In one possible implementation, the apparatus further includes:
the input module is used for inputting a plurality of historical images into an image semantic segmentation model, and outputting a plurality of first pixel labels by the image semantic segmentation model based on the historical images, wherein one first pixel label is used for indicating one first pixel point, and an image area of a target object consists of a plurality of first pixel points;
the input module is further configured to input the plurality of historical images into an image recognition model, and output a plurality of second pixel labels by the image recognition model based on the plurality of historical images, wherein one second pixel label is used for indicating one second pixel point, and an image area of a key portion of the target object is composed of the plurality of second pixel points;
and the training module is used for training the initial model based on the first pixel points indicated by the first pixel labels of the historical images and the second pixel points indicated by the second pixel labels of the historical images to obtain the second model.
In one possible implementation, the training module includes:
the generation submodule is used for generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image, and the training set comprises each historical image and training labels of all image blocks of each historical image;
and the training submodule is used for inputting the training set into the initial model, and the initial model is trained on the basis of the training set to obtain the second model.
In one possible implementation, the generating sub-module includes:
the block dividing unit is used for carrying out block dividing processing on any historical image in the plurality of historical images to obtain a plurality of historical image blocks;
the determining unit is used for determining whether any historical image block in the plurality of historical image blocks is an ROI according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in the any historical image block, wherein the third pixel points are first pixel points except the second pixel points in the historical image, the third pixel points are used for forming a non-key part of the target object, and the fourth pixel points are used for forming a non-ROI;
the determining unit is further configured to determine a level corresponding to any historical image block according to the number of second pixel points, the number of third pixel points, and the number of fourth pixel points in any historical image block;
the determining unit is further configured to determine a training label of any historical image block according to whether the any historical image block is an ROI and a level corresponding to the any historical image block;
and the combining unit is used for combining the plurality of historical images and the training labels of the image blocks of each historical image into the training set.
In a possible implementation manner, the training label of each historical image block carries the level corresponding to that historical image block, and also carries a target identifier indicating whether the historical image block is an ROI.
In a possible implementation manner, the determining unit is configured to:
and when the total number of the second pixel points and the third pixel points in any historical image block is greater than the number of the fourth pixel points in any historical image block, determining any historical image block as an ROI, otherwise, determining any historical image block as a non-ROI.
In a possible implementation manner, the determining unit is configured to:
when the total number of second pixel points and third pixel points in any historical image block is less than or equal to the number of fourth pixel points in the historical image block, determining the level corresponding to the historical image block as the first level;
and when the total number of second pixel points and third pixel points in the historical image block is greater than the number of fourth pixel points, and the number of third pixel points is greater than the number of second pixel points, determining the level corresponding to the historical image block as the second level; otherwise, determining the level of the historical image block as the third level.
In one possible implementation, the original image includes a plurality of image areas, each image area including at least one image block;
the determination module 1502 includes:
a second input sub-module, configured to input the original image into a first model, where the first model outputs a first label of each image block in the original image based on the original image, and the first label of an image block is used to indicate whether the image block is an ROI image block;
a second determining sub-module, configured to determine whether any image area in the plurality of image areas of the original image is an ROI according to the first labels of the image blocks in the image area.
In one possible implementation, the second determining submodule is configured to:
when any image area comprises a plurality of image blocks, determining non-ROI image blocks and ROI image blocks in any image area according to the first labels of the image blocks in any image area;
and when the total size of the non-ROI image blocks in any image area is larger than that of the ROI image blocks in any image area, determining any image area of the image as a non-ROI, otherwise, determining any image area as an ROI.
In one possible implementation, the second determining submodule is configured to:
when any image area comprises an image block, if the first label of the image block included in the image area indicates that the image block is a non-ROI image block, determining the image area as a non-ROI, otherwise, determining the image area as an ROI.
In one possible implementation, the determining module 1502 is configured to:
determining an image area formed by a plurality of adjacent ROI image blocks as an ROI according to the first label of each image block in the original image;
and determining an image area formed by a plurality of adjacent non-ROI sub-blocks as a non-ROI according to the first label of each sub-block in the original image.
The encoding apparatus can reduce the data volume of the non-ROI part of the encoded image by suppressing the high-frequency alternating-current components within the first amplitude suppression range in the non-ROI of the original image. Because the encoded image of the original image consists of the non-ROI encoded image and the ROI encoded image, reducing the data volume of the non-ROI encoded image reduces the data volume of the encoded image as a whole. When the storage device stores encoded images of original images, it can therefore store more of them, and the cost of the storage device can be reduced. In addition, because the ROI code rate is higher than the non-ROI code rate, the encoding quality of the ROI encoded image can be ensured while the data volume of the encoded image of the original image is reduced.
Fig. 16 is a schematic structural diagram of an image encoding apparatus according to an embodiment of the present application, where the apparatus 1600 includes:
an obtaining module 1601, configured to obtain an original image, where the original image is obtained through image acquisition, and the original image includes a plurality of image areas;
a determining module 1602, configured to determine a level corresponding to each image region of the original image;
an encoding module 1603, configured to encode the original image to obtain an encoded image with a ROI code rate higher than a non-ROI code rate, where in the process of encoding the original image, the original image is encoded according to a level corresponding to each image region and a quantization step corresponding to the level corresponding to each image region.
In one possible implementation, image regions other than an image region of a target object correspond to a first level, image regions of non-critical portions of the target object correspond to a second level, and image regions of critical portions of the target object correspond to a third level;
the quantization step size corresponding to the first level is larger than the quantization step size corresponding to the second level, and the quantization step size corresponding to the second level is larger than the quantization step size corresponding to the third level.
In a possible implementation manner, the image area expanded based on the image area corresponding to the third level corresponds to a fourth level, and the quantization step size corresponding to the third level is larger than the quantization step size corresponding to the fourth level.
In one possible implementation, the apparatus further includes:
an expansion module 1604, configured to perform an expansion operation on any image area of the plurality of image areas when the image area corresponds to the third level, so as to determine an expanded image area of the image area, where the expanded image area corresponds to the fourth level.
In one possible implementation, the expansion module 1604 is configured to:
performing an expansion operation on any image area to obtain a first expanded image area of the image area;
performing an expansion operation on the first expanded image area to obtain a second expanded image area;
and determining the second expanded image area as the expanded image area of the image area.
In one possible implementation, the determining module 1602 includes:
an input sub-module, configured to input the original image into a second model, where the second model outputs a second label of each image block in the original image based on the original image, and the second label of an image block is used to indicate the level corresponding to that image block;
a determining sub-module, configured to determine the level corresponding to any image area in the plurality of image areas according to the second labels of the image blocks in the image area.
In one possible implementation, the determining sub-module is configured to:
when any image area includes a plurality of image blocks, determining the number of image blocks of each level in the image area;
and determining the level corresponding to the largest number of image blocks as the level corresponding to the image area.
In one possible implementation, the determining sub-module is configured to:
when any image area includes one image block, determining the level indicated by the second label of that image block as the level corresponding to the image area.
In one possible implementation, the determining sub-module is configured to:
and according to the second label of each image block in the original image, forming a target image area by a plurality of adjacent image blocks corresponding to the same grade, wherein the grade corresponding to the target image area is the grade corresponding to the image block in the target image area.
In one possible implementation, the apparatus further includes:
the input module is used for inputting a plurality of historical images into an image semantic segmentation model, and outputting a plurality of first pixel labels by the image semantic segmentation model based on the historical images, wherein one first pixel label is used for indicating one first pixel point, and an image area of a target object consists of a plurality of first pixel points;
the input module is further configured to input the plurality of historical images into an image recognition model, and output a plurality of second pixel labels by the image recognition model based on the plurality of historical images, wherein one second pixel label is used for indicating one second pixel point, and an image area of a key portion of the target object is composed of the plurality of second pixel points;
and the training module is used for training the initial model based on the first pixel points indicated by the first pixel labels of the historical images and the second pixel points indicated by the second pixel labels of the historical images to obtain the second model.
In one possible implementation, the training module includes:
the generation submodule is used for generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image, and the training set comprises each historical image and training labels of all image blocks of each historical image;
and the obtaining submodule is used for inputting the training set into the initial model, and the initial model is trained on the basis of the training set to obtain the second model.
In one possible implementation, the generating sub-module includes:
the block dividing unit is used for carrying out block dividing processing on any historical image in the plurality of historical images to obtain a plurality of historical image blocks;
the determining unit is used for determining whether any historical image block in the plurality of historical image blocks is an ROI according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in the any historical image block, wherein the third pixel points are first pixel points except the second pixel points in the historical image, the third pixel points are used for forming a non-key part of the target object, and the fourth pixel points are used for forming a non-ROI;
the determining unit is further configured to determine a level corresponding to any historical image block according to the number of second pixel points, the number of third pixel points, and the number of fourth pixel points in any historical image block;
the determining unit is further configured to determine a training label of any historical image block according to whether the any historical image block is an ROI and a level corresponding to the any historical image block;
and the combining unit is used for combining the plurality of historical images and the training labels of the image blocks of each historical image into the training set.
In a possible implementation manner, the training label of each historical image block carries the level corresponding to that historical image block, and also carries a target identifier, where the target identifier indicates whether the historical image block is an ROI.
In a possible implementation manner, the determining unit is configured to:
and when the total number of the second pixel points and the third pixel points in any historical image block is greater than the number of the fourth pixel points in any historical image block, determining any historical image block as an ROI, otherwise, determining any historical image block as a non-ROI.
In a possible implementation manner, the determining unit is configured to:
when the total number of second pixel points and third pixel points in any historical image block is less than or equal to the number of fourth pixel points in the historical image block, determining the level corresponding to the historical image block as the first level;
and when the total number of second pixel points and third pixel points in the historical image block is greater than the number of fourth pixel points, and the number of third pixel points is greater than the number of second pixel points, determining the level corresponding to the historical image block as the second level; otherwise, determining the level of the historical image block as the third level.
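For illustration only (the function and parameter names below are ours, not the patent's), the block-level rules above can be sketched as a small classifier: a block is an ROI when its object pixels outnumber its background pixels, and its level follows from which pixel type dominates.

```python
def classify_block(n_key, n_nonkey, n_background):
    """Classify one historical image block from its pixel counts.

    n_key: second pixel points (key portion of the target object)
    n_nonkey: third pixel points (non-key portion of the target object)
    n_background: fourth pixel points (non-ROI pixels)
    Returns (is_roi, level) following the rules quoted in the text.
    """
    object_pixels = n_key + n_nonkey
    is_roi = object_pixels > n_background  # ROI iff object pixels dominate
    if object_pixels <= n_background:
        level = 1                          # first level: background block
    elif n_nonkey > n_key:
        level = 2                          # second level: non-key portion dominates
    else:
        level = 3                          # third level: key portion dominates
    return is_roi, level
```

Note that the non-ROI condition and the first-level condition coincide, so a non-ROI block is always a first-level block.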
In a possible implementation manner, the determining module 1602 is further configured to:
and for any image area in the original image, determining a quantization step corresponding to the level of the image area according to the basic quantization step of the original image, the distortion strength corresponding to at least one level and the quality enhancement coefficient corresponding to at least one level.
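The text names the inputs (basic quantization step, per-level distortion strength, per-level quality enhancement coefficient) but not how they are combined. A hypothetical combination, chosen only so that the resulting steps respect the ordering Q1 > Q2 > Q3 > Q4 stated elsewhere in the description, might look like:

```python
def quantization_step(base_step, distortion_strength, quality_coeff):
    # Hypothetical formula: a larger distortion strength widens the step,
    # while a larger quality enhancement coefficient narrows it (finer
    # quantization, higher quality). The patent does not give the formula.
    return base_step * distortion_strength / quality_coeff

# Illustrative per-level (distortion_strength, quality_coeff) pairs for
# levels 1..4, picked so that Q1 > Q2 > Q3 > Q4 as the text requires.
PARAMS = {1: (2.0, 1.0), 2: (1.5, 1.0), 3: (1.0, 1.5), 4: (1.0, 2.0)}
steps = {lvl: quantization_step(32.0, d, q) for lvl, (d, q) in PARAMS.items()}
```

Both the formula and the parameter values are assumptions made for the sketch; only the monotone ordering of the steps is taken from the text.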
The encoding device encodes the original image according to the level corresponding to each image region and the quantization step corresponding to that level, which lowers the compression ratio (the ratio of encoded data size to original data size), so the encoded image is smaller, the storage device can store more encoded images, and the cost of the storage device can be reduced. If HEIF encoding is performed, the compression ratio achieved by HEIF encoding the original image is lower than that achieved by JPEG encoding it, so the encoded image obtained by HEIF encoding is comparatively small; the storage device can therefore store a large number of encoded images in the HEIF format, and its cost can be reduced. Moreover, because different levels correspond to different quantization coefficients, performing HEIF encoding based on the level of each image region further lowers the compression ratio, further reduces the data size of the encoded image in the HEIF format, and further reduces the cost of the storage device. Finally, because the ROI code rate is higher than the non-ROI code rate, the encoding quality of the ROI in the encoded image is guaranteed while the data volume of the encoded image of the original image is reduced.
Fig. 17 is a schematic structural diagram of an image transcoding apparatus according to an embodiment of the present application, where the apparatus 1700 includes:
a decoding module 1701, configured to decode the encoded image to obtain an original image, where the original image includes a plurality of image regions;
a determining module 1702, configured to determine a level corresponding to each image region of the original image;
an encoding module 1703, configured to encode the original image to obtain an encoded image with a ROI code rate higher than a non-ROI code rate, where in the process of encoding the original image, the original image is encoded according to a level corresponding to each image region and a quantization step corresponding to the level corresponding to each image region;
the decoding module 1701 is further configured to decode the encoded image in response to an image viewing request to obtain a decoded image;
the encoding module 1703 is further configured to perform joint photographic experts group JPEG encoding on the decoded image to obtain a target encoded image;
an output module 1704, configured to output the target encoded image.
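As a rough sketch of the data flow through modules 1701 to 1704 (the codec functions below are stand-ins; real HEIF/JPEG codecs are assumed, not implemented):

```python
def decode_image(enc):               # stand-in decoder (module 1701)
    return {"pixels": enc["pixels"]}

def heif_encode_roi(image, levels):  # stand-in level-aware HEIF re-encode (1702/1703)
    return {"pixels": image["pixels"], "format": "heif", "levels": levels}

def jpeg_encode(image):              # stand-in JPEG encoder for viewing (1703)
    return {"pixels": image["pixels"], "format": "jpeg"}

def transcode_for_storage(stored, region_levels):
    """Decode the stored encoded image, then re-encode it with per-region
    levels so that the ROI code rate exceeds the non-ROI code rate."""
    original = decode_image(stored)
    return heif_encode_roi(original, region_levels)

def serve_view_request(stored_roi):
    """On an image viewing request: decode, then JPEG-transcode for output
    (output module 1704)."""
    decoded = decode_image(stored_roi)
    return jpeg_encode(decoded)
```

The sketch only fixes the order of operations described in the text; all names and data shapes are assumptions.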
In one possible implementation, image regions other than an image region of a target object correspond to a first level, image regions of non-critical portions of the target object correspond to a second level, and image regions of critical portions of the target object correspond to a third level;
the quantization step size corresponding to the first level is larger than the quantization step size corresponding to the second level, and the quantization step size corresponding to the second level is larger than the quantization step size corresponding to the third level.
In a possible implementation manner, the image area expanded based on the image area corresponding to the third level corresponds to a fourth level, and the quantization step size corresponding to the third level is larger than the quantization step size corresponding to the fourth level.
In one possible implementation, the apparatus further includes:
and the expansion module is configured to, when any image area in the plurality of image areas corresponds to the third level, perform an expansion operation on the image area to determine an expanded image area of the image area, wherein the expanded image area corresponds to the fourth level.
In one possible implementation, the expansion module is configured to:
performing an expansion operation on any image area to obtain a first expanded image area;
performing the expansion operation on the first expanded image area to obtain a second expanded image area;
and determining the second expanded image area as the expanded image area of the image area.
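A minimal sketch of the two-pass expansion, assuming an 8-neighborhood dilation over a set of (row, col) pixel coordinates (the neighborhood and the set representation are our assumptions; the text does not specify them):

```python
def dilate(pixels, height, width):
    """One expansion (dilation) pass: every pixel also claims its
    8-neighbors, clipped to the image bounds."""
    out = set(pixels)
    for r, c in pixels:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < height and 0 <= nc < width:
                    out.add((nr, nc))
    return out

def expanded_region(pixels, height, width):
    """Two successive dilations, matching the two-step expansion above."""
    first = dilate(pixels, height, width)
    return dilate(first, height, width)
```

A single pixel grows to a 3x3 patch after one pass and a 5x5 patch after two, away from the image border.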
In one possible implementation, each image area comprises at least one image block; the determining module 1702 includes:
the input sub-module is used for inputting the original image into a second model, and the second model outputs a second label of each image block in the original image based on the original image, wherein the second label of one image block is used for indicating the level corresponding to the image block;
and the determining submodule is used for determining the level corresponding to any image area in the plurality of image areas according to the second label of the image block in the image area.
In one possible implementation, the determining sub-module is configured to:
when any image area comprises a plurality of image blocks, determining the number of image blocks of each level in the image area;
and determining the level to which the most image blocks correspond as the level corresponding to the image area.
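The majority-vote rule above can be sketched as follows (tie-breaking is not specified in the text; `Counter.most_common` simply returns the first level encountered among those tied):

```python
from collections import Counter

def region_level(block_levels):
    """Level of an image region = the level held by the largest number of
    its blocks; a single-block region trivially takes that block's level."""
    counts = Counter(block_levels)
    return counts.most_common(1)[0][0]
```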
In one possible implementation, the determining sub-module is configured to:
when any image area comprises one image block, determining the level indicated by the second label of that image block as the level corresponding to the image area.
In one possible implementation, the determining sub-module is configured to:
and according to the second label of each image block in the original image, forming a target image area from a plurality of adjacent image blocks corresponding to the same level, wherein the level corresponding to the target image area is the level corresponding to the image blocks in the target image area.
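Grouping adjacent same-level blocks into a target image area amounts to a connected-component search over the grid of block labels. A sketch assuming 4-adjacency (the text does not specify the adjacency):

```python
from collections import deque

def group_same_level(grid):
    """Group 4-adjacent blocks sharing a level into target image regions.
    grid: 2-D list of per-block levels. Returns a list of (level, blocks)."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if seen[r][c]:
                continue
            level, blocks, queue = grid[r][c], [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill over equal-level blocks
                cr, cc = queue.popleft()
                blocks.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                               (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < h and 0 <= nc < w
                            and not seen[nr][nc] and grid[nr][nc] == level):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append((level, blocks))
    return regions
```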
In one possible implementation, the apparatus further includes:
the input module is used for inputting a plurality of historical images into an image semantic segmentation model, and outputting a plurality of first pixel labels by the image semantic segmentation model based on the historical images, wherein one first pixel label is used for indicating one first pixel point, and an image area of a target object consists of a plurality of first pixel points;
the input module is further configured to input the plurality of historical images into an image recognition model, and output a plurality of second pixel labels by the image recognition model based on the plurality of historical images, wherein one second pixel label is used for indicating one second pixel point, and an image area of a key portion of the target object is composed of the plurality of second pixel points;
and the training module is used for training the initial model based on the first pixel points indicated by the first pixel labels of the historical images and the second pixel points indicated by the second pixel labels of the historical images to obtain the second model.
In one possible implementation, the training module includes:
the generation submodule is used for generating a training set based on a first pixel point indicated by a first pixel label of each historical image and a second pixel point indicated by a second pixel label of each historical image, and the training set comprises each historical image and training labels of all image blocks of each historical image;
and the training submodule is used for inputting the training set into the initial model, and the initial model is trained on the basis of the training set to obtain the second model.
In one possible implementation, the generating sub-module includes:
the block dividing unit is used for carrying out block dividing processing on any historical image in the plurality of historical images to obtain a plurality of historical image blocks;
the determining unit is used for determining whether any historical image block in the plurality of historical image blocks is an ROI according to the number of second pixel points, the number of third pixel points and the number of fourth pixel points in the any historical image block, wherein the third pixel points are first pixel points except the second pixel points in the historical image, the third pixel points are used for forming a non-key part of the target object, and the fourth pixel points are used for forming a non-ROI;
the determining unit is further configured to determine a level corresponding to any historical image block according to the number of second pixel points, the number of third pixel points, and the number of fourth pixel points in any historical image block;
the determining unit is further configured to determine a training label of any historical image block according to whether the any historical image block is an ROI and a level corresponding to the any historical image block;
and the combining unit is used for combining the plurality of historical images and the training labels of the image blocks of each historical image into the training set.
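Putting the pieces together, a hedged sketch of how the training labels for one historical image could be derived from the two masks (first pixel points = whole target object, second pixel points = key portion), assuming square blocks and the counting rules quoted above; all names are ours:

```python
def block_training_labels(key_mask, object_mask, block=16):
    """Build (is_roi, level) training labels for one historical image.

    key_mask / object_mask: 2-D 0/1 lists marking the key portion and the
    whole target object. Third pixel points = object minus key pixels;
    fourth pixel points = everything outside the object.
    """
    h, w = len(object_mask), len(object_mask[0])
    labels = {}
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            n_key = n_nonkey = n_bg = 0
            for r in range(r0, min(r0 + block, h)):
                for c in range(c0, min(c0 + block, w)):
                    if key_mask[r][c]:
                        n_key += 1        # second pixel point
                    elif object_mask[r][c]:
                        n_nonkey += 1     # third pixel point
                    else:
                        n_bg += 1         # fourth pixel point
            is_roi = n_key + n_nonkey > n_bg
            if not is_roi:
                level = 1
            elif n_nonkey > n_key:
                level = 2
            else:
                level = 3
            labels[(r0 // block, c0 // block)] = (is_roi, level)
    return labels
```

The training set would then pair each historical image with these per-block labels, as the combining unit describes.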
In a possible implementation manner, the training label of each historical image block carries the level corresponding to that historical image block, and also carries a target identifier, where the target identifier indicates whether the historical image block is an ROI.
In a possible implementation manner, the determining unit is configured to:
and when the total number of the second pixel points and the third pixel points in any historical image block is greater than the number of the fourth pixel points in any historical image block, determining any historical image block as an ROI, otherwise, determining any historical image block as a non-ROI.
In a possible implementation manner, the determining unit is configured to:
when the total number of second pixel points and third pixel points in any historical image block is less than or equal to the number of fourth pixel points in the historical image block, determining the level corresponding to the historical image block as the first level;
and when the total number of second pixel points and third pixel points in the historical image block is greater than the number of fourth pixel points, and the number of third pixel points is greater than the number of second pixel points, determining the level corresponding to the historical image block as the second level; otherwise, determining the level of the historical image block as the third level.
In one possible implementation, the determining module 1702 is further configured to:
and for any image area in the original image, determining a quantization step corresponding to the level of the image area according to the basic quantization step of the original image, the distortion strength corresponding to at least one level and the quality enhancement coefficient corresponding to at least one level.
The apparatus encodes the original image according to the level corresponding to each image area and the quantization step corresponding to that level, which lowers the compression ratio (the ratio of encoded data size to original data size), so the encoded image is smaller, the storage device can store more encoded images, and the cost of the storage device can be reduced. If HEIF encoding is performed, the compression ratio achieved by HEIF encoding the original image is lower than that achieved by JPEG encoding it, so the encoded image obtained by HEIF encoding is comparatively small; the storage device can therefore store a large number of encoded images in the HEIF format, and its cost can be reduced. Moreover, because different levels correspond to different quantization coefficients, performing HEIF encoding based on the level of each image region further lowers the compression ratio, further reduces the data size of the encoded image in the HEIF format, and further reduces the cost of the storage device. Finally, because the ROI code rate is higher than the non-ROI code rate, the encoding quality of the ROI in the encoded image is guaranteed while the data volume of the encoded image of the original image is reduced.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that: when the image encoding apparatus or image transcoding apparatus provided in the above embodiments encodes or transcodes an image, the division into the above functional modules is merely an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the image encoding method or image transcoding method embodiments provided above belong to the same concept; for their specific implementation processes, refer to the method embodiments, which are not repeated here.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (20)

1. An image encoding method, characterized in that the method comprises:
acquiring an original image, wherein the original image is acquired through image acquisition;
determining a region of interest ROI and a non-ROI in the original image;
and coding the original image to obtain a coded image with the ROI code rate higher than the non-ROI code rate, wherein in the process of coding the original image, the high-frequency alternating current component in the non-ROI of the original image in the first amplitude suppression range is suppressed.
2. The method according to claim 1, wherein the original image includes a plurality of image regions, the image regions belonging to the non-ROI each corresponding to a first level, the image regions belonging to the ROI having a plurality of levels, wherein each of the image regions belonging to the ROI respectively corresponds to one of the plurality of levels, the different levels corresponding to different amplitude suppression ranges of the high-frequency ac component, and the amplitude suppression range of the high-frequency ac component corresponding to each of the plurality of levels is higher than the first amplitude suppression range;
the method further comprises the following steps:
and in the process of coding the original image, suppressing high-frequency alternating current components of a second amplitude suppression range in any image region belonging to the ROI, wherein the level corresponding to any image region corresponds to the second amplitude suppression range.
3. The method of claim 2, wherein the image areas of non-critical portions of the target object correspond to a second level and the image areas of critical portions of the target object correspond to a third level;
and the amplitude suppression range of the high-frequency alternating-current component corresponding to the third level is higher than that of the high-frequency alternating-current component corresponding to the second level.
4. A method according to claim 3, characterized in that the image areas expanded on the basis of the image areas corresponding to the third level correspond to a fourth level, the amplitude suppression range of the high-frequency alternating-current components corresponding to the fourth level being higher than the amplitude suppression range of the high-frequency alternating-current components corresponding to the third level.
5. The method of claim 1, wherein the original image comprises a plurality of image areas, each image area comprising at least one image block;
the determining of the region of interest, ROI, and the non-ROI in the original image comprises:
inputting the original image into a first model, outputting a first label of each image block in the original image by the first model based on the original image, wherein the first label of one image block is used for indicating whether the image block is an ROI image block;
and for any image area in the plurality of image areas, determining whether the any image area is the ROI according to the first label of the image block in the any image area.
6. The method according to claim 5, wherein the determining whether the any image region is the ROI according to the first label of the image block in the any image region comprises:
when any image area comprises a plurality of image blocks, determining non-ROI image blocks and ROI image blocks in any image area according to the first labels of the image blocks in any image area;
and when the total size of the non-ROI image blocks in any image area is larger than that of the ROI image blocks in any image area, determining any image area as a non-ROI, otherwise, determining any image area as an ROI.
7. The method according to claim 5, wherein the determining whether the any image region is the ROI according to the first label of the image block in the any image region comprises:
when any image area comprises an image block, if the first label of the image block included in the image area indicates that the image block is a non-ROI image block, determining the image area as a non-ROI, otherwise, determining the image area as an ROI.
8. An image encoding method, characterized in that the method comprises:
acquiring an original image, wherein the original image is acquired through image acquisition and comprises a plurality of image areas;
determining the level corresponding to each image area of the original image;
and coding the original image to obtain a coded image with the ROI code rate higher than the non-ROI code rate, wherein in the process of coding the original image, the original image is coded according to the level corresponding to each image area and the quantization step corresponding to the level corresponding to each image area.
9. The method of claim 8, wherein image areas other than an image area of a target object correspond to a first level, image areas of non-critical portions of the target object correspond to a second level, and image areas of critical portions of the target object correspond to a third level;
the quantization step size corresponding to the first level is larger than the quantization step size corresponding to the second level, and the quantization step size corresponding to the second level is larger than the quantization step size corresponding to the third level.
10. The method of claim 9, wherein the image region expanded based on the image region corresponding to the third level corresponds to a fourth level, and wherein the quantization step corresponding to the third level is larger than the quantization step corresponding to the fourth level.
11. The method of claim 10, further comprising:
and when any image area in the plurality of image areas corresponds to the third level, performing expansion operation on the image area to determine an expanded image area of the image area, wherein the expanded image area corresponds to the fourth level.
12. An image transcoding method, the method comprising:
decoding the coded image to obtain an original image, wherein the original image comprises a plurality of image areas;
determining the level corresponding to each image area of the original image;
encoding the original image to obtain an encoded image with the ROI code rate higher than the non-ROI code rate, wherein in the process of encoding the original image, the original image is encoded according to the level corresponding to each image area and the quantization step corresponding to the level corresponding to each image area;
responding to an image viewing request, and decoding the coded image to obtain a decoded image;
carrying out joint photographic experts group JPEG coding on the decoded image to obtain a target coded image;
and outputting the target coding image.
13. The method of claim 12, wherein each image area comprises at least one image block; the determining the level corresponding to each image area of the original image comprises:
inputting the original image into a second model, and outputting a second label of each image block in the original image by the second model based on the original image, wherein the second label of one image block is used for indicating the level corresponding to the image block;
and for any image area in the plurality of image areas, determining the level corresponding to the image area according to the second label of the image block in the image area.
14. An image encoding apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring an original image, and the original image is acquired through image acquisition;
a determining module, configured to determine a region of interest ROI and a non-ROI in the original image;
and the coding module is used for coding the original image to obtain a coded image with the ROI code rate higher than the non-ROI code rate, wherein in the process of coding the original image, the high-frequency alternating current component in the non-ROI of the original image within a first amplitude suppression range is suppressed.
15. An image encoding apparatus, characterized in that the apparatus comprises:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring an original image, the original image is acquired by image acquisition, and the original image comprises a plurality of image areas;
the determining module is used for determining the level corresponding to each image area of the original image;
and the coding module is used for coding the original image to obtain a coded image with the ROI code rate higher than the non-ROI code rate, wherein the original image is coded according to the level corresponding to each image region and the quantization step corresponding to the level corresponding to each image region in the process of coding the original image.
16. An image transcoding apparatus, the apparatus comprising:
the decoding module is used for decoding the stored coded image to obtain an original image;
the determining module is used for determining the level corresponding to each image area of the original image;
the encoding module is used for encoding the original image to obtain an encoded image with the ROI code rate higher than the non-ROI code rate, wherein the original image is encoded according to the level corresponding to each image region and the quantization step corresponding to the level corresponding to each image region in the process of encoding the original image;
the decoding module is further used for responding to an image viewing request, and decoding the coded image to obtain a decoded image;
the coding module is also used for carrying out Joint Photographic Experts Group (JPEG) coding on the decoded image to obtain a target coded image;
and the output module is used for outputting the target coding image.
17. A chip comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform an operation performed by the method of any one of claims 1 to 13.
18. An acquisition device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by the image encoding method of any one of claims 1 to 11.
19. A storage device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by the method of image transcoding of any of claims 12 to 13.
20. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to perform operations performed by a method according to any one of claims 1 to 13.
CN202010221975.2A 2019-10-28 2020-03-26 Image encoding method, transcoding method, device, equipment and storage medium Active CN111491167B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019110320403 2019-10-28
CN201911032040 2019-10-28

Publications (2)

Publication Number Publication Date
CN111491167A true CN111491167A (en) 2020-08-04
CN111491167B CN111491167B (en) 2022-08-26

Family

ID=71812584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221975.2A Active CN111491167B (en) 2019-10-28 2020-03-26 Image encoding method, transcoding method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111491167B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112104869A (en) * 2020-11-10 2020-12-18 光谷技术股份公司 Video big data storage and transcoding optimization system
CN112258470A (en) * 2020-10-20 2021-01-22 上海大学 Intelligent industrial image critical compression rate analysis system and method based on defect detection
CN112766265A (en) * 2021-01-27 2021-05-07 厦门树冠科技有限公司 ROI (region of interest) intercepting method, system, medium and equipment in any shape
WO2022067656A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Image processing method and apparatus

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101001372A (en) * 2006-01-09 2007-07-18 三星电子株式会社 Method and apparatus for encoding/decoding image based on region of interest
CN101094305A (en) * 2006-06-19 2007-12-26 夏普株式会社 Image compressing apparatus and method, image recording, processing and image forming apparatus
CN101102492A (en) * 2007-07-26 2008-01-09 上海交通大学 Conversion method from compression domain MPEG-2 based on interest area to H.264 video
CN101309410A (en) * 2008-07-08 2008-11-19 武汉大学 Video coding and decoding architecture
KR20100002632A (en) * 2008-06-30 2010-01-07 에스케이 텔레콤주식회사 Method, apparatus and system for encoding and decoding video data
CN101931800A (en) * 2009-06-24 2010-12-29 财团法人工业技术研究院 Method and system for encoding region of interest by limited variable bit rate control
US20110051808A1 (en) * 2009-08-31 2011-03-03 iAd Gesellschaft fur informatik, Automatisierung und Datenverarbeitung Method and system for transcoding regions of interests in video surveillance
CN103002280A (en) * 2012-10-08 2013-03-27 中国矿业大学 Distributed encoding/decoding method and system based on HVS/ROI (human vision system and region of interest)
CN103379326A (en) * 2012-04-19 2013-10-30 中兴通讯股份有限公司 Method and device for coding video based on ROI and JND
US20130322524A1 (en) * 2012-06-01 2013-12-05 Hyuk-Jae Jang Rate control method for multi-layered video coding, and video encoding apparatus and video signal processing apparatus using the rate control method
CN104581157A (en) * 2015-01-26 2015-04-29 东南大学 JPEG 2000 area-of-interest coding method based on pre-truncation
CN106131670A (en) * 2016-07-12 2016-11-16 块互动(北京)科技有限公司 A kind of adaptive video coding method and terminal
CN107454413A (en) * 2017-08-25 2017-12-08 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of method for video coding of keeping characteristics
CN108337515A (en) * 2018-01-19 2018-07-27 浙江大华技术股份有限公司 A kind of method for video coding and device
CN108810538A (en) * 2018-06-08 2018-11-13 腾讯科技(深圳)有限公司 Method for video coding, device, terminal and storage medium
CN110235146A (en) * 2017-02-03 2019-09-13 西门子股份公司 Method and apparatus for the object of interest in detection image

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101001372A (en) * 2006-01-09 2007-07-18 三星电子株式会社 Method and apparatus for encoding/decoding image based on region of interest
CN101094305A (en) * 2006-06-19 2007-12-26 夏普株式会社 Image compressing apparatus and method, image recording, processing and image forming apparatus
CN101102492A (en) * 2007-07-26 2008-01-09 上海交通大学 Conversion method from compression domain MPEG-2 based on interest area to H.264 video
KR20100002632A (en) * 2008-06-30 2010-01-07 에스케이 텔레콤주식회사 Method, apparatus and system for encoding and decoding video data
CN101309410A (en) * 2008-07-08 2008-11-19 武汉大学 Video coding and decoding architecture
CN101931800A (en) * 2009-06-24 2010-12-29 财团法人工业技术研究院 Method and system for encoding region of interest by limited variable bit rate control
US20110051808A1 (en) * 2009-08-31 2011-03-03 iAd Gesellschaft fur informatik, Automatisierung und Datenverarbeitung Method and system for transcoding regions of interests in video surveillance
CN103379326A (en) * 2012-04-19 2013-10-30 中兴通讯股份有限公司 Method and device for coding video based on ROI and JND
US20130322524A1 (en) * 2012-06-01 2013-12-05 Hyuk-Jae Jang Rate control method for multi-layered video coding, and video encoding apparatus and video signal processing apparatus using the rate control method
CN103002280A (en) * 2012-10-08 2013-03-27 中国矿业大学 Distributed encoding/decoding method and system based on HVS/ROI (human vision system and region of interest)
CN104581157A (en) * 2015-01-26 2015-04-29 JPEG 2000 region-of-interest coding method based on pre-truncation
CN106131670A (en) * 2016-07-12 2016-11-16 Adaptive video coding method and terminal
CN110235146A (en) * 2017-02-03 2019-09-13 Method and apparatus for detecting objects of interest in images
CN107454413A (en) * 2017-08-25 2017-12-08 Feature-preserving video coding method
CN108337515A (en) * 2018-01-19 2018-07-27 Video coding method and apparatus
CN108810538A (en) * 2018-06-08 2018-11-13 Video coding method, apparatus, terminal and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUJUN DENG: "Research on Progressive Coding of Remote Image Based on Region of Interest", 2012 2nd International Conference on Remote Sensing, Environment and Transportation Engineering *
FENG Jingge: "Automatic ROI Extraction and Coding for Still Images", China Masters' Theses Full-text Database *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022067656A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Image processing method and apparatus
CN112258470A (en) * 2020-10-20 2021-01-22 上海大学 Intelligent industrial image critical compression rate analysis system and method based on defect detection
CN112258470B (en) * 2020-10-20 2021-10-08 上海大学 Intelligent industrial image critical compression rate analysis system and method based on defect detection
CN112104869A (en) * 2020-11-10 2020-12-18 光谷技术股份公司 Video big data storage and transcoding optimization system
CN112104869B (en) * 2020-11-10 2021-02-02 光谷技术有限公司 Video big data storage and transcoding optimization system
CN112766265A (en) * 2021-01-27 2021-05-07 厦门树冠科技有限公司 Method, system, medium and equipment for intercepting an ROI of arbitrary shape
CN112766265B (en) * 2021-01-27 2023-11-14 厦门树冠科技有限公司 Method, system, medium and equipment for intercepting ROI with arbitrary shape

Also Published As

Publication number Publication date
CN111491167B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN111491167B (en) Image encoding method, transcoding method, device, equipment and storage medium
US11310501B2 (en) Efficient use of quantization parameters in machine-learning models for video coding
US11310498B2 (en) Receptive-field-conforming convolutional models for video coding
EP2063644A2 (en) Image encoding device and encoding method, and image decoding device and decoding method
Egilmez et al. Graph-based transforms for inter predicted video coding
KR101808327B1 (en) Video encoding/decoding method and apparatus using padding in video codec
US11532104B2 (en) Method and data processing system for lossy image or video encoding, transmission and decoding
US11928843B2 (en) Signal processing apparatus and signal processing method
EP3743855A1 (en) Receptive-field-conforming convolution models for video coding
US11636626B2 (en) Apparatus and method of using AI metadata related to image quality
CN116917926A (en) Indication of feature map data
Wang et al. Multi-scale convolutional neural network-based intra prediction for video coding
JP2017537518A (en) Method and apparatus for decoding / encoding a video signal using a transformation derived from a graph template
Löhdefink et al. Focussing learned image compression to semantic classes for V2X applications
WO2023203509A1 (en) Image data compression method and device using segmentation and classification
EP3844961A1 (en) Coefficient context modeling in video coding
EP4315866A1 (en) Multi-distribution entropy modeling of latent features in image and video coding using neural networks
Liu et al. Icmh-net: Neural image compression towards both machine vision and human vision
Gong et al. BeiDou Short Message Transmission Method Based on High-quality Image Compression and Reconstruction
US20230130288A1 (en) Method for code-level super resolution and method for training super resolution model therefor
JP7185467B2 (en) Image decoding device, image encoding device, image processing system and program
US20240121445A1 (en) Pre-analysis based image compression methods
Montajabi Deep Learning Methods for Codecs
Uma et al. Encoding of Multifaceted Images using Pattern Matching Techniques
CN115474046A (en) Point cloud attribute information encoding method, point cloud attribute information decoding method, point cloud attribute information encoding device, point cloud attribute information decoding device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant