CN117440104A - Data compression reconstruction method based on target significance characteristics - Google Patents
- Publication number: CN117440104A
- Application number: CN202311767134.1A
- Authority
- CN
- China
- Prior art keywords
- target
- grid
- data compression
- image
- results
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/46—Colour picture communication systems
- H04N1/64—Systems for the transmission or the storage of the colour picture signal; Details therefor, e.g. coding or decoding means therefor
- H04N1/648—Transmitting or storing the primary (additive or subtractive) colour signals; Compression thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The specification discloses a data compression reconstruction method based on target significance characteristics, which relates to the technical field of data compression reconstruction and comprises the steps of dividing an original image into a plurality of batches and preprocessing; performing target detection on the preprocessed image by using a Mask R-CNN model to obtain a model detection result; grouping the model detection results to obtain a data set of the required target and other targets; splitting grids of the preprocessed original image, and storing and compressing the grids in groups to obtain other target compression results, background compression results and required target compression results; reconstructing grid images of other targets and backgrounds by adopting a bilinear interpolation method, and reconstructing grid images of a required target by adopting a VAE model to obtain an interpolation result and a reconstruction sample; and splicing the interpolation result and the reconstruction sample to obtain a reconstruction image, so as to solve the problems of redundant information preservation and low accuracy of data reconstruction in the existing data compression reconstruction technology.
Description
Technical Field
The invention belongs to the technical field of data compression reconstruction, and particularly relates to a data compression reconstruction method based on target significance characteristics.
Background
With the continuous development of big data application, the data volume of various sensors is continuously increased, the increasingly huge data volume is continuously challenging the limit of storage resources, and it is urgent to establish an intelligent algorithm capable of realizing data compression to effectively reduce the storage space. There are some existing works, which generally use computer vision techniques such as object detection or saliency detection, image segmentation, etc., and by dividing the original data into different areas, information of the areas containing the saliency features is preferentially retained, so that the data volume is reduced while the main content is maintained.
However, these prior-art techniques have some problems when dealing with complex scenes or with images or videos that contain multiple salient objects. For example, due to misdetections of the object detection model, they may also save some non-critical information, resulting in information redundancy. In addition, the prior art may fail to effectively utilize the category information provided by the target detection model to obtain the association relationships between targets, so that interference data is fused into the data generation model during target-related reconstruction, reducing the accuracy of data reconstruction.
Therefore, the existing data compression reconstruction technology has the problems of redundant information preservation and low accuracy of data reconstruction when processing images or videos of complex scenes or a plurality of salient objects.
Disclosure of Invention
The invention aims to provide a data compression reconstruction method based on target significance characteristics, which aims to solve the problems of redundant stored information and low accuracy of data reconstruction when the current data compression reconstruction technology processes images or videos of complex scenes or a plurality of significance objects.
In order to achieve the above purpose, the invention adopts the following technical scheme:
in one aspect, the present specification provides a data compression reconstruction method based on target significance characteristics, including:
dividing an original image into a plurality of batches, and preprocessing the original image of a target batch after batch;
performing target detection on the preprocessed original image by using a Mask R-CNN model to obtain a model detection result;
grouping the model detection results according to the target class labels to obtain a required target data set and other target data sets;
splitting grids of the preprocessed original image, and storing the split grids in groups and performing preliminary data compression according to the attribution relation between the grids and the required target data set and the attribution relation between the grids and the other target data sets to obtain other target data compression storage results, background compression storage results and required target data compression storage results;
reconstructing grid images corresponding to the other target data compression storage results and the background compression storage results by adopting a bilinear interpolation method, and reconstructing the grid images corresponding to the required target data compression storage results by adopting a trained VAE model to obtain interpolation results and reconstructed samples;
and splicing the interpolation result and the reconstructed sample to obtain a reconstructed complete image.
In another aspect, the present specification provides a data compression reconstruction device based on target significance characteristics, comprising:
the preprocessing module is used for dividing an original image into a plurality of batches and preprocessing the original image of a target batch after batch;
the target detection module is used for carrying out target detection on the preprocessed original image by using a Mask R-CNN model to obtain a model detection result;
the target grouping module is used for grouping the model detection results according to target class labels to obtain a required target data set and other target data sets;
the image compression module is used for splitting grids of the preprocessed original image, and carrying out grouping storage and preliminary data compression on the split grids according to the attribution relation between the grids and the required target data set and the attribution relation between the grids and the other target data sets to obtain other target data compression storage results, background compression storage results and required target data compression storage results;
the image reconstruction module is used for reconstructing grid images corresponding to the other target data compression storage results and the background compression storage results by adopting a bilinear interpolation method, reconstructing grid images corresponding to the required target data compression storage results by adopting a trained VAE model, and obtaining interpolation results and reconstruction samples;
and the image splicing module is used for splicing the interpolation result and the reconstructed sample to obtain a reconstructed complete image.
Based on the technical scheme, the following technical effects can be obtained in the specification:
the method can be used for identifying the salient features in the complex scene more accurately by combining the deep learning algorithm Mask R-CNN and the VAE model, and processing complex correlation among a plurality of images with similar salient features.
Drawings
Fig. 1 is a flow chart of a data compression reconstruction method based on a target salient feature according to an embodiment of the invention.
FIG. 2 is a schematic diagram of grid splitting in an embodiment of the invention.
Figure 3 is a schematic diagram of the variational autoencoder (VAE) model in an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a data compression reconstruction device based on a target salient feature according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The advantages and features of the present invention will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings and detailed description. It should be noted that the drawings are in a very simplified form and not to precise scale, merely to facilitate and clearly aid the description of embodiments of the invention.
It should be noted that, in order to illustrate the present invention clearly, various embodiments are presented to show different implementations; the embodiments listed are not exhaustive. Furthermore, for simplicity of explanation, content already mentioned in an earlier embodiment is often omitted in a later one; therefore, for anything not mentioned in a later embodiment, reference may be made to the earlier embodiment.
Example 1
Referring to fig. 1, fig. 1 shows the data compression reconstruction method based on target significance characteristics according to the present embodiment. In this embodiment, the method includes:
step 102, dividing an original image into a plurality of batches, and preprocessing the original image of a target batch after batch;
in this embodiment, one implementation manner of step 102 is:
step 202, dividing an original image into a plurality of batches, and carrying out image size adjustment on the original image of a target batch to obtain an image after size adjustment;
specifically, the input image is expressed asWherein->Image set representing a Batch (Batch), a program for executing the method, and a program for executing the method>Indicating that the time stamp is +.>Image of->Reference numeral indicating time stamp->Representing the number of images in a batch. The preprocessing process of the original image sequentially completes the size adjustment of the image, the conversion of the color space and the denoising.
The image size adjustment unifies the sizes of the pictures in the same batch, so that the input of the subsequent algorithm is consistent. Record the height and width of each image $x_t$ as $h_t$ and $w_t$ respectively, collect the height and width sets $H=\{h_t\}$ and $W=\{w_t\}$ of the batch of images, and calculate the maxima $h_{\max}$ and $w_{\max}$. The storage size of all images is unified to $h_{\max}\times w_{\max}$; the extended parts are all completed by zero-padding, and the image completed by the above processing is denoted $x_t^{(1)}$, the batch result being $X^{(1)}$.
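As an illustrative sketch of the size-unification step (not the patented implementation itself; array shapes and function names are assumptions for demonstration), the batch-wise zero-padding can be written as:

```python
import numpy as np

def unify_batch(images):
    """Zero-pad every image in a batch to the batch's maximum height and width."""
    h_max = max(img.shape[0] for img in images)
    w_max = max(img.shape[1] for img in images)
    padded = []
    for img in images:
        canvas = np.zeros((h_max, w_max) + img.shape[2:], dtype=img.dtype)
        canvas[:img.shape[0], :img.shape[1]] = img  # original content top-left, rest zero
        padded.append(canvas)
    return padded

# Example: two grayscale images of different sizes in one batch
batch = [np.ones((4, 6)), np.ones((3, 8))]
unified = unify_batch(batch)
```

All images now share the batch's maximum storage size, so the subsequent detection and grid-splitting steps receive consistent input.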
Step 204, performing graying treatment on the image after the size adjustment to obtain an image after the image color space conversion;
specifically, the conversion of the image color space carries out the gray processing on the image, so that the problem that the occupied space is large because each pixel point in the image needs to be stored in a ternary array when the general image is stored in an RGB format is solved. At present, various graying methods exist, all can be used, and the invention mainly adopts an average method for calculating portability. Suppose that an image is to be in RGB formatLateral->Longitudinal->The array that needs to be stored for each pixel is represented asThe average method has the following calculation formula:
wherein,the gradation value recorded after gradation is expressed. Further, the logarithmic gray scale transformation is utilized to map the low gray scale value with narrow range in the original image to the gray scale interval with wide range, and simultaneously the range is widerThe high gray value interval maps to a narrower gray interval. The formula for gray scale conversion is as follows:
the gradation value after gradation conversion is expressed. For->All images in (a) are subjected to the above-mentioned processing, and the result after the processing is recorded as +.>。
And 206, performing smoothing processing on noise in the image after the image color space conversion to obtain a preprocessed original image.
Specifically, the image denoising performs smoothing processing on the noise in the above image. The invention uses a general image Gaussian filter to remove noise in the image, and the denoised result is expressed as $X^{(3)}$.
The original-image result after all preprocessing steps are completed is represented as $\tilde{X} = X^{(3)}$ and serves as the input of the subsequent feature detection algorithm.
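A minimal numpy sketch of the Gaussian smoothing used for denoising (kernel size and sigma here are illustrative assumptions, not values fixed by the patent):

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()  # normalize so the filter preserves mean intensity

def gaussian_filter(img, size=3, sigma=1.0):
    """Denoise by convolving with a normalized Gaussian kernel (edge padding at borders)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + size, j:j + size] * k).sum()
    return out

noisy = np.full((5, 5), 100.0)
noisy[2, 2] = 200.0  # a single noise spike
smoothed = gaussian_filter(noisy)
```

In practice a library routine (e.g., an OpenCV or scipy Gaussian blur) would replace the explicit loop; the loop is kept here to show the weighted averaging.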
Step 104, performing target detection on the preprocessed original image by using a Mask R-CNN model to obtain a model detection result;
in this embodiment, the model detection result includes: the target category label, the outline where the target is located, the center of gravity of the target, the target number and the total amount of the target.
Specifically, the preprocessed image set $\tilde{X}$ obtained in step 102 is taken as input, the information about the targets contained in the images is acquired based on the target detection model, and the detection results are recorded. The target detection can be completed with various models, including one of the known models such as YOLO, Mask R-CNN and SSD. Considering that a single image often involves multiple targets, the Mask R-CNN (Mask Region-based Convolutional Neural Network) model is mainly used for target detection in this step. The recorded information generated by the Mask R-CNN model for each image comprises the target category label (Label), the outer frame line where the target is located (Bounding Box), the target center of gravity (Center), and the like.
Since a single picture may typically contain multiple objects, the information therein is stored using an array. For every image $x_t$ in the preprocessed image set $\tilde{X}$, abbreviating the Mask R-CNN model as $F$, the detection result of the model is recorded as:

$D_t = F(x_t) = \{(Label_{t,k}, Box_{t,k}, Center_{t,k}) \mid k = 1, \dots, K_t\}$

where $Label_{t,k}$ represents the category label of the $k$-th target in the $t$-th image, $Box_{t,k}$ represents the outer frame line where the $k$-th target is located, $Center_{t,k}$ represents the center of gravity of the $k$-th target, $k$ denotes the number of the target, and $K_t$ indicates the total number of targets detected by Mask R-CNN in the $t$-th image. The model detection results of all images in the image set can be recorded as $D = \{D_t \mid t = 1, \dots, N\}$.
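The per-image detection record (category label, outer frame line, center of gravity, target count) can be sketched as below. Running an actual Mask R-CNN model requires trained weights, so the detections here are hypothetical stand-ins and the field names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One record (Label, Box, Center) produced per detected target."""
    label: str
    box: tuple      # (x1, y1, x2, y2) outer frame line
    center: tuple   # target center of gravity

def record_detections(raw):
    """Turn raw (label, box) pairs into detection records, deriving the center from the box."""
    out = []
    for label, (x1, y1, x2, y2) in raw:
        out.append(Detection(label, (x1, y1, x2, y2), ((x1 + x2) / 2, (y1 + y2) / 2)))
    return out

# Hypothetical output for one image: two targets
D_t = record_detections([("car", (10, 20, 50, 60)), ("tree", (0, 0, 8, 8))])
K_t = len(D_t)  # total number of targets in the image
```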
step 106, grouping the model detection results according to the target class labels to obtain a required target data set and other target data sets;
specifically, when target detection is performed, the detection result of the model usually has a plurality of targets, but not all targets belong to targets required for analysis, in addition, when the detection is started, the model is generated by taking an image as a main unit, and when data compression is performed, the detected required targets are mainly used, and certain difference exists between the two targets, so that the subsequent algorithm processing is facilitated, and the grouping arrangement of data is performed. Representing the target class label required in analysis asFor the followingAll of (A)>Data for tag classThe results form the desired target data set, expressed as:
accordingly, not all that is done is toOther target data sets are composed for the data results of the tag class, expressed as: />
The original detection result set is divided into two parts, namely。
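The grouping of detection results by class label described above can be sketched as a simple partition; the dictionary layout of a detection record is an assumption for demonstration:

```python
def group_by_label(detections, required_labels):
    """Split detection records into the required target set and the other target set."""
    required = [d for d in detections if d["label"] in required_labels]
    others = [d for d in detections if d["label"] not in required_labels]
    return required, others

dets = [{"label": "car"}, {"label": "tree"}, {"label": "car"}]
req, oth = group_by_label(dets, required_labels={"car"})
```

Together the two groups cover the whole detection set, mirroring the two-part division of the text.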
Step 108, splitting grids of the preprocessed original image, and carrying out grouping storage and preliminary data compression on the split grids according to the attribution relation between the grids and the required target data set and the attribution relation between the grids and the other target data sets to obtain other target data compression storage results, background compression storage results and required target data compression storage results;
in this embodiment, one implementation manner of step 108 is:
step 302, carrying out grid splitting on the preprocessed original image to obtain a plurality of grid images;
step 304, storing the grid images in groups according to whether the grid images belong to the required target data set or not to obtain a required target grid set, other target grid sets and a background grid set;
in this embodiment, the required target grid set is all grid images in which the outer frame line where the required target is located and the range thereof; the other target grid sets are all grid images in which the outer frame lines of other targets are located and the range of the outer frame lines; the background grid set is all other grid images remaining.
Specifically, referring to fig. 2, each preprocessed image $x_t$ is split using the same grid. On the basis of the gridding of the images, all grid images within the outer frame line of a required target and its range are collectively called the target grid set, expressed as $G^{req}$. All grid images within the outer frame line of other targets and their range are collectively called the other target grid set, expressed as $G^{oth}$. All remaining grid images are collectively referred to as the background grid set, denoted $G^{bg}$. In particular, when a grid belongs to both the target grid set and the other target grid set, the grid is partitioned into the target grid set. Thereby, a complete division of the image is achieved, i.e. $G = G^{req} \cup G^{oth} \cup G^{bg}$.
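The grid splitting and grouped storage described above can be sketched as follows, with a grid counted into a target set whenever it overlaps the corresponding outer frame line; cell size and box coordinates are illustrative assumptions:

```python
def assign_grids(h, w, cell, req_boxes, oth_boxes):
    """Partition an h x w image into cell x cell grids and assign each grid to the
    required-target, other-target, or background set. A grid overlapping both kinds
    goes to the required-target set, so the three sets partition the image."""
    def overlaps(gx, gy, box):
        x1, y1, x2, y2 = box
        return not (gx + cell <= x1 or gx >= x2 or gy + cell <= y1 or gy >= y2)

    G_req, G_oth, G_bg = [], [], []
    for gy in range(0, h, cell):
        for gx in range(0, w, cell):
            if any(overlaps(gx, gy, b) for b in req_boxes):
                G_req.append((gx, gy))
            elif any(overlaps(gx, gy, b) for b in oth_boxes):
                G_oth.append((gx, gy))
            else:
                G_bg.append((gx, gy))
    return G_req, G_oth, G_bg

# 8x8 image, 4x4 grids; one required box top-left, one other box bottom-right
G_req, G_oth, G_bg = assign_grids(8, 8, 4, req_boxes=[(0, 0, 3, 3)], oth_boxes=[(5, 5, 8, 8)])
```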
Based on the above, the present embodiment extracts the regions related to the targets in the original image based on the target detection model (such as YOLO, Mask R-CNN, etc.), and, using the extracted target frame-line information and the grid division of the image, extracts the target grid set, the other target grid set and the background grid set respectively, providing data preprocessing for the subsequent compression algorithm.
Step 306, performing preliminary data compression on grid images in the other target grid sets and the background grid set by using a Gaussian filtering method to obtain other target data compression storage results and background compression storage results;
in this embodiment, one implementation manner of step 306 is:
can be used onceThe size Gaussian convolution kernel processes the grid data in the other target grid set to obtain other target numbersStoring the result according to compression;
is used twiceAnd processing the grid data in the background grid set by a Gaussian convolution kernel of the size to obtain a background compression storage result.
Specifically, the Gaussian filtering method is utilized to collect other target gridsAnd background set->Downsampling the grid image of (1). The gaussian kernel convolution operation (gaussian filtering) uses a gaussian convolution to check the image for weighted averaging. For background grid set->The number of convolution kernels used for the images in (a) should be larger than +.>A convolution kernel used for the image of (a). One simple implementation is to use +.>Size Gaussian convolution kernel processing +.>Is used twiceSize Gaussian convolution kernel processing +.>. The expression of the convolution kernel is as follows: />
After the convolution kernel processing is used, all even rows and columns are deleted again, and a reduced image is obtained.
Target grid setThe original resolution is preserved. Other target grid set->Is used once +.>After the Gaussian convolution kernel processing, the resolution is reduced to the original +.>Expressed as->The method comprises the steps of carrying out a first treatment on the surface of the Background grid set->The images of (a) are used twice +.>After the Gaussian convolution kernel processing, the resolution is reduced to the original +.>Expressed as->。
When it is needed, the size and the number of times of use of the convolution kernel can be adjusted according to actual needs, for example, when the data compression ratio needs to be improvedOr->And larger gaussian convolution kernels.
Based on this, the present embodiment handles the data compression of the other target grid set and the background grid set with a Gaussian filtering method of relatively low computational cost; considering that the background grid set provides a limited amount of target-related information, its data are repeatedly downsampled by the Gaussian filtering method to further reduce the data footprint.
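One pass of the smooth-then-decimate step described above can be sketched as follows (kernel size and sigma are illustrative assumptions); each pass reduces the pixel count to 1/4, and two passes to 1/16:

```python
import numpy as np

def blur_and_decimate(img, size=5, sigma=1.0):
    """One Gaussian-pyramid step: smooth with a size x size Gaussian kernel,
    then delete every even-indexed row and column (keeping the odd indices)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    k /= k.sum()
    pad = size // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (p[i:i + size, j:j + size] * k).sum()
    return out[1::2, 1::2]  # drop even rows/columns: pixel count falls to 1/4

g = np.full((16, 16), 50.0)
once = blur_and_decimate(g)      # other target grids: one pass
twice = blur_and_decimate(once)  # background grids: two passes
```

This mirrors what a library pyramid-downsampling routine (e.g., OpenCV's `pyrDown`) performs internally.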
And 308, inputting the required target grid set into a VAE model for preliminary data compression, and obtaining a required target data compression storage result.
In this embodiment, before step 308, the method further includes:
taking the grid image of the required target grid set as a training sample;
and training the VAE model based on the training sample and the loss function to obtain a trained VAE model.
In this embodiment, one implementation manner of step 308 is:
and inputting the grid image of the required target grid set into the trained VAE model, and performing data compression by using an encoder therein to obtain a required target data compression storage result.
Specifically, referring to FIG. 3, the data of the target grid set is taken as the input dataset of the variational autoencoder (VAE, Variational Autoencoder) model, denoted $x$. The VAE model assumes that the input data is generated from $m$ unobserved latent variables $z$; the model combines two modules, an encoder and a decoder. The encoder compresses the input data into unobserved random features, and the decoder maps the compressed data from the feature space back into the data space before data compression. The unobserved target significance feature is denoted $z$.
The data generation process of the VAE model mainly comprises two steps. First, a latent variable $z$ is sampled from the prior distribution $p(z)$; then, according to the conditional distribution $p_\theta(x\mid z)$, $z$ is used to generate $x$. The VAE model seeks a parameter $\theta$ that maximizes the probability of generating the real data:

$\theta^{*} = \arg\max_{\theta} \prod_{i} p_\theta(x_i)$

where $\theta$ represents the parameters of the distribution, and $p_\theta(x)$ can be obtained by integrating over the significance feature $z$:

$p_\theta(x) = \int p_\theta(x\mid z)\, p(z)\, dz$
More specifically, the generation results of the VAE model should make the approximate posterior distribution $q_\phi(z\mid x)$ as close as possible to the true posterior distribution $p_\theta(z\mid x)$. Based on a given training sample $x$, the training loss is:

$\mathcal{L}(\theta,\phi;x) = -\,\mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] + D_{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)$

where $D_{KL}$, the KL divergence between the posterior and prior distributions, is calculated as:

$D_{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big) = \int q_\phi(z\mid x)\,\log\dfrac{q_\phi(z\mid x)}{p(z)}\,dz$
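For the common choice of a diagonal-Gaussian encoder and a standard-normal prior, the KL term of the training loss has a closed form; the following numpy sketch assumes that choice (standard VAE practice, not something the patent spells out):

```python
import numpy as np

def kl_gaussian_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian posterior:
    0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO sketch: squared-error reconstruction term plus the KL regularizer."""
    recon = np.sum((x - x_recon) ** 2)
    return recon + kl_gaussian_standard_normal(mu, log_var)

mu = np.zeros(4)
log_var = np.zeros(4)  # posterior exactly matches the prior
kl0 = kl_gaussian_standard_normal(mu, log_var)
loss = vae_loss(np.ones(4), np.ones(4), mu, log_var)
```

When the posterior equals the prior and the reconstruction is perfect, both terms vanish; any deviation of the posterior mean or variance makes the KL term strictly positive.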
after VAE model training, the encoder is used for storing target grid image dataData compression is performed to represent the compressed image data as +.>。
Thereby, the compressed stored result of the original image $x_t$ can be expressed as: $C_t = \{Z^{req}_t, G^{oth,\downarrow}_t, G^{bg,\downarrow}_t\}$.
Step 110, reconstructing grid images corresponding to the other target data compression storage results and the background compression storage results by adopting a bilinear interpolation method, and reconstructing grid images corresponding to the required target data compression storage results by adopting a trained VAE model to obtain interpolation results and reconstructed samples;
in this embodiment, one implementation manner of step 110 is:
step 402, performing interpolation processing on the other target data compression storage results and the corresponding grid images in the background compression storage results by adopting a bilinear interpolation library in OpenCV to obtain an interpolation result; the interpolation result comprises other target data reconstruction results and background reconstruction results;
In particular, the grids in the other target grid set and the background grid set are handled in a similar manner. More specifically, a single grid in a grid set is denoted $g$, a two-dimensional matrix in the form of a gray-scale map. The coordinates of elements in the image are defined as $(i, j)$, where $i$ increases from left to right and $j$ increases from top to bottom. After processing with the bilinear interpolation library in OpenCV, the interpolation results are recorded as $\hat{G}^{oth}$ and $\hat{G}^{bg}$.
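A self-contained bilinear interpolation sketch equivalent in spirit to the OpenCV routine mentioned above (OpenCV itself is omitted here to keep the example dependency-free):

```python
import numpy as np

def bilinear_upsample(img, new_h, new_w):
    """Bilinear interpolation of a 2-D gray-scale grid to (new_h, new_w)."""
    h, w = img.shape
    out = np.empty((new_h, new_w))
    for i in range(new_h):
        for j in range(new_w):
            # map output coordinates back into the source grid
            y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i, j] = (img[y0, x0] * (1 - dy) * (1 - dx) + img[y0, x1] * (1 - dy) * dx
                         + img[y1, x0] * dy * (1 - dx) + img[y1, x1] * dy * dx)
    return out

small = np.array([[0.0, 2.0], [4.0, 6.0]])
up = bilinear_upsample(small, 3, 3)
```

Each output pixel is the distance-weighted average of its four source neighbors, so corner values are preserved while interior values are interpolated.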
And step 404, reconstructing a grid image corresponding to the required target data compression storage result by adopting a decoder in the trained VAE model, and obtaining a reconstructed sample similar to the grid image of the required target grid set.
Specifically, the images in the target grid set are then generated using the VAE model. The VAE generates data through the model $p_\theta(x\mid z)$, where $q_\phi(z\mid x)$ is the encoder and the prior $p(z)$ is a standard normal distribution. When generating a sample, a latent variable $z$ is first randomly sampled from the stored compressed representation; after passing through the decoder, a reconstructed sample $\hat{x}$ similar to the training data $x$ is obtained.
Based on this, the present embodiment handles the reconstruction process of the grid data hierarchically: for the other target grid set and the background grid set, which carry less information, a data reconstruction method based on bilinear interpolation is used; for the data in the target grids, which need to retain richer information, a corresponding variational autoencoder (VAE) model is established, thereby realizing the compression and reconstruction of the data.
And step 112, splicing the interpolation result and the reconstructed sample to obtain a reconstructed complete image.
Specifically, the result of reconstructing a single image is denoted Î; after integrating the reconstruction results of all images in a batch, the reconstruction results of the batch may be expressed as the set {Î_1, Î_2, …, Î_n}, where n is the number of images in the batch.
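A minimal sketch of the splicing step, assuming each grid is stored with a (row, col) position index — an assumption for illustration; the embodiment itself only states that the interpolation results and reconstructed samples are spliced into a complete image:

```python
def stitch_grids(tiles, rows, cols):
    """Reassemble a full image from a dict mapping (row, col) grid
    positions to equally sized 2-D tiles (nested lists)."""
    tile_h = len(tiles[(0, 0)])
    tile_w = len(tiles[(0, 0)][0])
    image = [[0.0] * (cols * tile_w) for _ in range(rows * tile_h)]
    for (r, c), tile in tiles.items():
        for i, row in enumerate(tile):
            for j, value in enumerate(row):
                # Place each tile element at its absolute position.
                image[r * tile_h + i][c * tile_w + j] = value
    return image
```

Because every grid keeps its position index through compression and reconstruction, splicing is a pure placement operation regardless of which path (interpolation or VAE) produced the tile.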
in this embodiment, after step 112, the method further includes:
after the image reconstruction of one batch is completed, the image data of the next batch is input, and the processes from step 102 to step 112 are repeated to perform the image reconstruction of the next batch.
In summary, by combining the deep learning Mask R-CNN model with the VAE model, the method can more accurately identify salient features in complex scenes and can handle the complex correlations among multiple images with similar salient features.
Example 2
Referring to fig. 4, the present embodiment provides a data compression reconstruction device based on target significance characteristics, which includes:
the preprocessing module is used for dividing an original image into a plurality of batches and preprocessing the original image of a target batch after batch;
the target detection module is used for carrying out target detection on the preprocessed original image by using a Mask R-CNN model to obtain a model detection result;
the target grouping module is used for grouping the model detection results according to target class labels to obtain a required target data set and other target data sets;
the image compression module is used for splitting grids of the preprocessed original image, and carrying out grouping storage and preliminary data compression on the split grids according to the attribution relation between the grids and the required target data set and the attribution relation between the grids and the other target data sets to obtain other target data compression storage results, background compression storage results and required target data compression storage results;
the image reconstruction module is used for reconstructing grid images corresponding to the other target data compression storage results and the background compression storage results by adopting a bilinear interpolation method, reconstructing grid images corresponding to the required target data compression storage results by adopting a trained VAE model, and obtaining interpolation results and reconstruction samples;
and the image splicing module is used for splicing the interpolation result and the reconstructed sample to obtain a reconstructed complete image.
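The behavior of the target grouping module can be sketched as follows; the detection-result fields and the set of required class labels are illustrative assumptions rather than the actual Mask R-CNN output format:

```python
def group_detections(detections, required_labels):
    """Split model detection results into the required target data set
    and the other target data set according to the class label.

    Each detection is assumed to be a dict with at least a 'label'
    field (an illustrative layout, not the Mask R-CNN API)."""
    required, others = [], []
    for det in detections:
        if det["label"] in required_labels:
            required.append(det)
        else:
            others.append(det)
    return required, others
```

Grids are later attributed to one of the two sets (or to the background) based on which group's targets they overlap.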
Optionally, the preprocessing module includes:
the size adjustment unit is used for dividing the original image into a plurality of batches, and performing image size adjustment on the original image of the target batch to obtain an image after size adjustment;
the color adjustment unit is used for carrying out gray-scale treatment on the image after the size adjustment to obtain an image after the image color space conversion;
and the denoising smoothing unit is used for smoothing noise in the image after the image color space conversion to obtain a preprocessed original image.
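As a hedged sketch of the color adjustment unit, the following uses the common ITU-R BT.601 luma weights (the convention OpenCV applies for RGB-to-gray conversion); the nested-list layout of (R, G, B) tuples is an assumption for illustration:

```python
def to_grayscale(rgb_image):
    """Convert an H x W image of (R, G, B) tuples into a gray-scale map
    using the BT.601 luma weights 0.299, 0.587, 0.114."""
    return [[0.299 * r + 0.587 * g + 0.114 * b
             for (r, g, b) in row]
            for row in rgb_image]
```

White maps to the maximum gray value and black to zero, so the conversion preserves the intensity range before the smoothing step.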
Optionally, the image compression module includes:
the grid splitting unit is used for splitting grids of the preprocessed original image to obtain a plurality of grid images;
the grid image grouping unit is used for grouping and storing the grid images according to whether the grid images belong to the required target data set or not to obtain a required target grid set, other target grid sets and a background grid set;
the other target and background compression unit is used for carrying out preliminary data compression on grid images in the other target grid set and the background grid set by utilizing a Gaussian filtering method to obtain other target data compression storage results and background compression storage results;
and the target image compression unit is used for inputting the required target grid set into the VAE model for preliminary data compression, and obtaining a required target data compression storage result.
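Grid splitting and the attribution test against a target's outer frame can be sketched as follows; the grid size, the divisibility assumption, and the bounding-box overlap test are illustrative choices, not details fixed by the embodiment:

```python
def split_into_grids(image, grid_size):
    """Split an H x W image (nested lists) into grid_size x grid_size
    tiles keyed by (row, col); H and W are assumed divisible by
    grid_size for simplicity."""
    h, w = len(image), len(image[0])
    tiles = {}
    for r in range(0, h, grid_size):
        for c in range(0, w, grid_size):
            tiles[(r // grid_size, c // grid_size)] = [
                row[c:c + grid_size] for row in image[r:r + grid_size]
            ]
    return tiles

def grid_overlaps_box(pos, grid_size, box):
    """True if the tile at grid position pos intersects the target
    bounding box (x0, y0, x1, y1), exclusive on the far edge."""
    r, c = pos
    x0, y0, x1, y1 = box
    gy0, gx0 = r * grid_size, c * grid_size
    gy1, gx1 = gy0 + grid_size, gx0 + grid_size
    return gx0 < x1 and x0 < gx1 and gy0 < y1 and y0 < gy1
```

Tiles overlapping a required target's box would go to the required target grid set, tiles overlapping other targets to the other target grid set, and the rest to the background grid set.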
Optionally, the other target and background compression unit includes:
the other target compression subunit is used for processing the grid data in the other target grid sets once with a Gaussian convolution kernel of a set size to obtain the other target data compression storage results;
and the background image compression subunit is used for processing the grid data in the background grid set twice with a Gaussian convolution kernel of a set size to obtain the background compression storage result.
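The once-versus-twice distinction can be sketched in one dimension as follows (kernel size, sigma, and replicate edge handling are illustrative assumptions); applying the same Gaussian kernel twice is equivalent to a single pass with sigma scaled by √2, which is why the background, carrying the least information, is smoothed more strongly:

```python
import math

def gaussian_kernel_1d(size, sigma):
    """Build a normalized 1-D Gaussian kernel of odd length `size`."""
    half = size // 2
    raw = [math.exp(-(i - half) ** 2 / (2 * sigma ** 2))
           for i in range(size)]
    total = sum(raw)
    return [v / total for v in raw]

def convolve_1d(signal, kernel):
    """Convolve with edge replication so the output length matches the input."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, kv in enumerate(kernel):
            idx = min(max(i + k - half, 0), n - 1)  # replicate edges
            acc += kv * signal[idx]
        out.append(acc)
    return out

kernel = gaussian_kernel_1d(3, 1.0)      # illustrative size and sigma
row = [0.0, 0.0, 10.0, 0.0, 0.0]         # an impulse to watch it spread
once = convolve_1d(row, kernel)          # other-target grids: one pass
twice = convolve_1d(once, kernel)        # background grids: two passes
```

The second pass flattens the impulse further while preserving its total mass, i.e. the background is compressed with a stronger low-pass filter than the other targets.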
Optionally, the method further comprises:
the training sample acquisition module is used for taking the grid image of the required target grid set as a training sample;
and the model training module is used for training the VAE model based on the training sample and the loss function to obtain a trained VAE model.
Optionally, the image reconstruction module includes:
the interpolation reconstruction unit is used for carrying out interpolation processing on the other target data compression storage results and the grid images corresponding to the background compression storage results by adopting a bilinear interpolation library in OpenCV to obtain interpolation results; the interpolation result comprises other target data reconstruction results and background reconstruction results;
and the VAE model reconstruction unit is used for reconstructing a grid image corresponding to the required target data compression storage result by adopting a decoder in the trained VAE model to obtain a reconstructed sample similar to the grid image of the required target grid set.
On this basis, by combining the deep learning Mask R-CNN model with the VAE model, the device can more accurately identify salient features in complex scenes and can handle the complex correlations among multiple images with similar salient features.
Example 3
Referring to fig. 5, the present embodiment provides an electronic device, which includes a processor, an internal bus, a network interface, a memory, and a nonvolatile memory, and may include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory to the memory and then runs the computer program to form a data compression reconstruction method based on the target significance characteristics on a logic level. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
The network interface, processor and memory may be interconnected by a bus system. The buses may be classified into address buses, data buses, control buses, and the like.
The memory is used for storing programs. In particular, the program may include program code including computer-operating instructions. The memory may include read only memory and random access memory and provide instructions and data to the processor.
The processor is used for executing the program stored in the memory and specifically executing:
step 102, dividing an original image into a plurality of batches, and preprocessing the original image of a target batch after batch;
104, performing target detection on the preprocessed original image by using a Mask R-CNN model to obtain a model detection result;
step 106, grouping the model detection results according to the target class labels to obtain a required target data set and other target data sets;
step 108, splitting grids of the preprocessed original image, and carrying out grouping storage and preliminary data compression on the split grids according to the attribution relation between the grids and the required target data set and the attribution relation between the grids and the other target data sets to obtain other target data compression storage results, background compression storage results and required target data compression storage results;
step 110, reconstructing grid images corresponding to the other target data compression storage results and the background compression storage results by adopting a bilinear interpolation method, and reconstructing grid images corresponding to the required target data compression storage results by adopting a trained VAE model to obtain interpolation results and reconstructed samples;
and step 112, splicing the interpolation result and the reconstructed sample to obtain a reconstructed complete image.
The processor may be an integrated circuit chip having signal processing capabilities. In implementation, each step of the above method may be implemented by an integrated logic circuit of hardware of a processor or an instruction in a software form.
Based on the same inventive concept, the embodiments of the present specification further provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the data compression reconstruction method based on target significance characteristics provided by the embodiments corresponding to fig. 1 to 3.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-readable storage media having computer-usable program code embodied therein.
In addition, for the device embodiments described above, since they are substantially similar to the method embodiments, the description is relatively simple; for relevant details, reference may be made to the corresponding parts of the description of the method embodiments. Moreover, it should be noted that the components of the respective modules of the system of the present application are logically divided according to the functions to be implemented, but the present application is not limited thereto, and the respective components may be re-divided or combined as necessary.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results; in some embodiments, multitasking and parallel processing may also be possible or advantageous.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.
Claims (10)
1. The data compression reconstruction method based on the target significance characteristics is characterized by comprising the following steps of:
dividing an original image into a plurality of batches, and preprocessing the original image of a target batch after batch;
performing target detection on the preprocessed original image by using a Mask R-CNN model to obtain a model detection result;
grouping the model detection results according to the target class labels to obtain a required target data set and other target data sets;
splitting grids of the preprocessed original image, and storing the split grids in groups and performing preliminary data compression according to the attribution relation between the grids and the required target data set and the attribution relation between the grids and the other target data sets to obtain other target data compression storage results, background compression storage results and required target data compression storage results;
reconstructing grid images corresponding to the other target data compression storage results and the background compression storage results by adopting a bilinear interpolation method, and reconstructing the grid images corresponding to the required target data compression storage results by adopting a trained VAE model to obtain interpolation results and reconstructed samples;
and splicing the interpolation result and the reconstructed sample to obtain a reconstructed complete image.
2. The method of claim 1, wherein the steps of dividing the original image into batches and preprocessing the batched original image of the target batch include:
dividing an original image into a plurality of batches, and carrying out image size adjustment on the original image of a target batch to obtain an image after size adjustment;
carrying out graying treatment on the image after the size adjustment to obtain an image after the image color space conversion;
and carrying out smoothing treatment on noise in the image after the image color space conversion to obtain a preprocessed original image.
3. The method of claim 2, wherein the model detection result comprises: the target category label, the outline where the target is located, the center of gravity of the target, the target number, and the total number of targets.
4. A method according to claim 3, wherein the steps of splitting the grid of the preprocessed original image, and performing packet storage and preliminary data compression on the split grid according to the attribution relation with the required target data set and the other target data set, to obtain other target data compression storage results, background compression storage results and required target data compression storage results comprise:
grid splitting is carried out on the preprocessed original image, and a plurality of grid images are obtained;
the grid images are stored in groups according to whether the grid images belong to the required target data set or not, and a required target grid set, other target grid sets and a background grid set are obtained;
performing preliminary data compression on grid images in the other target grid sets and the background grid sets by using a Gaussian filtering method to obtain other target data compression storage results and background compression storage results;
and inputting the required target grid set into a VAE model for preliminary data compression to obtain a required target data compression storage result.
5. The method of claim 4, wherein the required target grid set consists of all grid images located within and around the outer frame line of the required target; the other target grid sets consist of all grid images located on the outer frame lines of the other targets and within the range those frame lines enclose; and the background grid set consists of all remaining grid images.
6. The method of claim 4, wherein the step of obtaining the other target data compression storage result and the background compression storage result by performing preliminary data compression on the grid images in the other target grid set and the background grid set by using a gaussian filtering method comprises:
processing the grid data in the other target grid sets once with a Gaussian convolution kernel of a set size to obtain the other target data compression storage results;
and processing the grid data in the background grid set twice with a Gaussian convolution kernel of a set size to obtain the background compression storage result.
7. The method of claim 4, further comprising, prior to said inputting the desired set of target grids into a VAE model for preliminary data compression, obtaining a desired target data compression storage result:
taking the grid image of the required target grid set as a training sample;
and training the VAE model based on the training sample and the loss function to obtain a trained VAE model.
8. The method of claim 7, wherein inputting the required target grid set into the VAE model for preliminary data compression comprises inputting the grid images of the required target grid set into the trained VAE model and compressing them using the encoder therein to obtain the required target data compression storage result.
9. The method of claim 8, wherein the steps of reconstructing the grid image corresponding to the other target data compression storage result and the background compression storage result using the bilinear interpolation method, and reconstructing the grid image corresponding to the required target data compression storage result using the trained VAE model, and obtaining the interpolation result and the reconstructed sample comprise:
performing interpolation processing on the other target data compression storage results and the grid images corresponding to the background compression storage results by adopting a bilinear interpolation library in OpenCV to obtain interpolation results; the interpolation result comprises other target data reconstruction results and background reconstruction results;
and reconstructing a grid image corresponding to the required target data compression storage result by adopting a decoder in the trained VAE model to obtain a reconstructed sample similar to the grid image of the required target grid set.
10. A data compression reconstruction device based on target significance characteristics, comprising:
the preprocessing module is used for dividing an original image into a plurality of batches and preprocessing the original image of a target batch after batch;
the target detection module is used for carrying out target detection on the preprocessed original image by using a Mask R-CNN model to obtain a model detection result;
the target grouping module is used for grouping the model detection results according to target class labels to obtain a required target data set and other target data sets;
the image compression module is used for splitting grids of the preprocessed original image, and carrying out grouping storage and preliminary data compression on the split grids according to the attribution relation between the grids and the required target data set and the attribution relation between the grids and the other target data sets to obtain other target data compression storage results, background compression storage results and required target data compression storage results;
the image reconstruction module is used for reconstructing grid images corresponding to the other target data compression storage results and the background compression storage results by adopting a bilinear interpolation method, reconstructing grid images corresponding to the required target data compression storage results by adopting a trained VAE model, and obtaining interpolation results and reconstruction samples;
and the image splicing module is used for splicing the interpolation result and the reconstructed sample to obtain a reconstructed complete image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311767134.1A CN117440104B (en) | 2023-12-21 | 2023-12-21 | Data compression reconstruction method based on target significance characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117440104A true CN117440104A (en) | 2024-01-23 |
CN117440104B CN117440104B (en) | 2024-03-29 |
Family
ID=89555744
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311767134.1A Active CN117440104B (en) | 2023-12-21 | 2023-12-21 | Data compression reconstruction method based on target significance characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117440104B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428366A (en) * | 2019-07-26 | 2019-11-08 | Oppo广东移动通信有限公司 | Image processing method and device, electronic equipment, computer readable storage medium |
CN113971763A (en) * | 2020-12-21 | 2022-01-25 | 河南铮睿科达信息技术有限公司 | Small target segmentation method and device based on target detection and super-resolution reconstruction |
CN114155153A (en) * | 2021-12-14 | 2022-03-08 | 安徽创世科技股份有限公司 | High-resolution image reconstruction method and device |
US20220351043A1 (en) * | 2021-04-30 | 2022-11-03 | Chongqing University | Adaptive high-precision compression method and system based on convolutional neural network model |
WO2023123924A1 (en) * | 2021-12-30 | 2023-07-06 | 深圳云天励飞技术股份有限公司 | Target recognition method and apparatus, and electronic device and storage medium |
CN116485652A (en) * | 2023-04-26 | 2023-07-25 | 北京卫星信息工程研究所 | Super-resolution reconstruction method for remote sensing image vehicle target detection |
CN116740261A (en) * | 2022-03-02 | 2023-09-12 | 腾讯科技(深圳)有限公司 | Image reconstruction method and device and training method and device of image reconstruction model |
Also Published As
Publication number | Publication date |
---|---|
CN117440104B (en) | 2024-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111382867B (en) | Neural network compression method, data processing method and related devices | |
CN109086811B (en) | Multi-label image classification method and device and electronic equipment | |
CN110648334A (en) | Multi-feature cyclic convolution saliency target detection method based on attention mechanism | |
CN109784372B (en) | Target classification method based on convolutional neural network | |
CN112329702B (en) | Method and device for rapid face density prediction and face detection, electronic equipment and storage medium | |
CN111738344A (en) | Rapid target detection method based on multi-scale fusion | |
CN112101386B (en) | Text detection method, device, computer equipment and storage medium | |
CN114444565A (en) | Image tampering detection method, terminal device and storage medium | |
CN115375548A (en) | Super-resolution remote sensing image generation method, system, equipment and medium | |
CN114782355A (en) | Gastric cancer digital pathological section detection method based on improved VGG16 network | |
CN113807354B (en) | Image semantic segmentation method, device, equipment and storage medium | |
CN110428006A (en) | The detection method of computer generated image, system, device | |
CN116309612B (en) | Semiconductor silicon wafer detection method, device and medium based on frequency decoupling supervision | |
CN111862343A (en) | Three-dimensional reconstruction method, device and equipment and computer readable storage medium | |
CN117440104B (en) | Data compression reconstruction method based on target significance characteristics | |
CN113610856B (en) | Method and device for training image segmentation model and image segmentation | |
CN112001479B (en) | Processing method and system based on deep learning model and electronic equipment | |
CN112634126B (en) | Portrait age-reducing processing method, training method, device, equipment and storage medium | |
CN114387489A (en) | Power equipment identification method and device and terminal equipment | |
CN114821272A (en) | Image recognition method, image recognition system, image recognition medium, electronic device, and target detection model | |
CN110100263A (en) | Image rebuilding method and device | |
CN113239942A (en) | Image feature extraction method and device based on convolution operation and readable storage medium | |
CN112419249A (en) | Special clothing picture conversion method, terminal device and storage medium | |
CN117333740B (en) | Defect image sample generation method and device based on stable diffusion model | |
CN118470429A (en) | Equipment identification method and device based on picture fusion and regional interest pooling technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||