CN107845116B - Method and apparatus for generating compression encoding of flat image
Publication number: CN107845116B
Application number: CN201710960042.3A
Priority date / filing date: 2017-10-16
Publication of CN107845116A: 2018-03-27
Publication of CN107845116B (grant): 2021-05-25
Other languages: Chinese (zh)
Inventors: 汪振华, 陈宇, 赵士超, 麻晓珍, 安山, 翁志
Assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00: Image coding

Abstract

The invention provides a method and an apparatus for generating compression coding of a planar image, which help to obtain image compression codes with better feature expression and improve processing efficiency. The method for generating compression encoding of a planar image of the present invention comprises: defining three groups of networks according to the deep learning framework Caffe; defining loss functions for three loss function layers; then carrying out a first round of training and a second round of training to obtain a finalized result model, wherein, in the second round, the initialization weight of the first group of network training is the weight obtained after the third group of network training in the first round; and calculating the input planar image data using the finalized result model so as to obtain the compression coding of the planar image.

Description

Method and apparatus for generating compression encoding of flat image
Technical Field
The present invention relates to the field of image feature calculation technology, and in particular, to a method and an apparatus for generating compression codes of a planar image.
Background
Image features are the basis for analyzing images. Take a clustering scenario: the computer needs to group similar pictures together and keep dissimilar ones apart. The basis for this judgment is the image features extracted from the image content; the feature distance between similar images is small, and the feature distance between dissimilar images is large. On the one hand, good image features can clearly define image repetition, similarity and dissimilarity through distance; on the other hand, they can save storage overhead and support efficient distance comparison.
Image compression encoding is one kind of image feature. Most existing image features are expressed as floating point numbers; the significance of compressing and encoding these features is that, while the feature expressiveness is not significantly reduced, the feature storage cost can be reduced markedly and feature distance comparison becomes convenient.
In the prior art, schemes for generating compression codes of planar images mainly include a Hamming-distance scheme based on LSH (Locality Sensitive Hashing) and a scheme based on machine learning, the latter mainly referring to supervised learning. They are described separately below.
LSH, i.e. locality sensitive hashing, has the basic idea that after two adjacent data points in the original data space undergo the same mapping or projection transformation, the probability that the two points are still adjacent in the new data space is very high, while the probability that non-adjacent data points are mapped to the same bucket is very low. Assuming that a k-bit compressed binary code is expected as output, k hash functions {H1, H2, H3, ..., Hk} are designed to map the original K floating-point dimensions to k bits. The input of each hash function is a floating-point feature value and the output is 0 or 1. A common method is to randomly select k different dimensions from the K floating-point dimensions, apply the k hash functions to project and convert them, and finally generate a bit string of length k bits.
The LSH scheme can be carried out in three steps. The first step is to set the mapping, i.e. which feature dimensions of the feature vector need hash-function projection; the second step is to set the hash functions, in which a hyperplane threshold is fixed or randomly set to determine whether a compressed code value is 0 or 1; the third step is the mapping itself, i.e. executing the hash functions on the feature dimensions to be mapped to obtain the compression coding of the image in the form of a binary code.
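For illustration only, the following is a minimal sketch of the random-hyperplane variant of this idea (the hash functions are fixed at random rather than learned, which is exactly the weakness discussed below); the function name and the use of NumPy are assumptions, not part of the original scheme description.

```python
import numpy as np

def random_hyperplane_lsh(features, k, seed=0):
    """Map N x K floating-point features to N x k binary codes using k random
    hyperplane hash functions H_1, ..., H_k: each bit is 1 if the projection of
    the feature onto a random direction lies above the hyperplane, else 0."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((k, features.shape[1]))     # one fixed random hyperplane per hash function
    return (features @ planes.T > 0).astype(np.uint8)

# Example: compress 128-dimensional floating-point features into 64-bit codes.
codes = random_hyperplane_lsh(np.random.rand(4, 128), k=64)
```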
The scheme based on machine learning (supervised learning) can also be divided into three steps. The first step is to label the training data; the second step is to execute the training process and learn a hash function; the third step is mapping, in which the input image is directly converted into compression coding in the form of a binary code through a forward pass of the network.
For the above LSH scheme, because the hash functions are fixed manually or set at random, the scheme lacks generalization capability; compared with learning-based algorithms, its accuracy is poor. The supervised learning approach requires a large amount of manually labeled data, and its objective is oriented toward classification, so its accuracy is relatively poor in fields such as duplicate image identification. The direct consequence of poor accuracy is that the feature expressiveness of the image compression coding is insufficient, making it difficult to define the similarity between images.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for generating a compression code of a planar image, which help to obtain image compression codes with better feature expression capability and improve processing efficiency.
To achieve the above object, according to one aspect of the present invention, there is provided a method of generating compression encoding of a planar image.
The method for generating compression encoding of a planar image of the present invention comprises: defining three groups of networks according to the deep learning framework Caffe; defining the loss functions of the three loss function layers, in order, as follows: the mean value of each bit of the code is expected to approach 0.5, the coding quantization loss is expected to be minimal, and the code is expected to be invariant to rotation, scaling and translation; and then carrying out a first round of training, where the first round of training comprises a first group of network training, a second group of network training and a third group of network training. In the first group of network training, training is performed for the first loss function layer and the second loss function layer. In the second group of network training, initialization uses the weight file obtained after the first group of network training, and training is performed for all three loss function layers. In the third group of network training, initialization uses the weight file obtained after the second group of network training, the input layer of the third group of networks is modified so that there is only one data input layer, and training is carried out once to obtain the result model of the first round of training. The loss function of the first loss function layer is then modified so that the mean value of each bit is expected to approach 0, and the nonlinear activation unit between the fully-connected layer and the loss functions is deleted from the definitions of the three groups of networks; the first group of network training, the second group of network training and the third group of network training are then performed again to obtain a finalized result model, wherein the initialization weight of the first group of network training is the weight obtained after the third group of network training in the first round of training. Finally, the input planar image data is processed with the finalized result model so as to obtain the compression coding of the planar image.
Optionally, before the step of processing the input planar image data with the finalized result model, the method further includes: converting a three-channel color image file into three two-dimensional unsigned integer matrices, where each channel corresponds to one matrix, each element of the matrix corresponds one-to-one to a pixel of the image, and the value of each element is the pixel value of the corresponding pixel in the channel corresponding to that matrix; and inputting the three two-dimensional unsigned integer matrices into the finalized result model as the planar image data.
Optionally, the loss function expressing that the mean value of each bit of the code is expected to approach 0.5 is as follows:

$$L_1(W) = \sum_{k=1}^{B} \left( \mu_k - 0.5 \right)^2$$

where B is the length of the bit code string, $\mu_k$ is the cumulative average of the feature value at the k-th bit over all training data, W is the hyperparameter of the loss function L and represents the network weights, and L(W) denotes the loss function with W as its hyperparameter.

Optionally, the loss function expressing that the coding quantization loss is expected to be minimal is as follows:

$$L_2(W) = \sum_{k=1}^{M} \left( b_k - F(x_k; W) \right)^2$$

where $b_k = 0.5\,(\operatorname{sign}(F(x_k; W)) + 1)$, the value of the sign function is -1 or 1, and the F function is the nonlinear projection function of the last fully-connected layer, which outputs the feature value corresponding to position k of that layer according to the layer's weight matrix and the node position $x_k$; x denotes the value corresponding to hidden-layer node k of the network, and M denotes the total number of hidden-layer nodes.

Optionally, the loss function expressing that the code is expected to be invariant to rotation, scaling and translation is as follows:

$$L_3(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|^2$$

where L is the number of rotated and translated copies corresponding to each image, M is the total number of training images, $b_{k,i}$ is the feature value corresponding to a translated or rotated copy, and $b_k$ is the feature value of the original image.
According to another aspect of the present invention, an apparatus for generating a compression encoding of a planar image is provided.
The apparatus for generating compression coding of a planar image of the present invention comprises a training module, a receiving module and a calculation module. The training module is configured to: define three groups of networks according to the deep learning framework Caffe, with the loss functions of the three loss function layers defined, in order, as follows: the mean value of each bit of the code is expected to approach 0.5, the coding quantization loss is expected to be minimal, and the code is expected to be invariant to rotation, scaling, translation and the like; then carry out a first round of training comprising a first group of network training, a second group of network training and a third group of network training, where in the first group of network training, training is performed for the first loss function layer and the second loss function layer and the initialization weight file is provided by the Caffe framework; in the second group of network training, initialization uses the weight file obtained after the first group of network training and training is performed for all three loss function layers; and in the third group of network training, initialization uses the weight file obtained after the second group of network training, the input layer of the third group of networks is modified so that there is only one data input layer, and training is carried out once to obtain the result model of the first round of training. The training module then modifies the loss function of the first loss function layer so that the mean value of each bit is expected to approach 0, deletes the nonlinear activation unit between the fully-connected layer and the loss functions from the definitions of the three groups of networks, and performs the first group of network training, the second group of network training and the third group of network training again to obtain a finalized result model, wherein the initialization weight of the first group of network training is the weight obtained after the third group of network training in the first round of training. The receiving module is configured to receive planar image data. The calculation module is configured to process the planar image data with the finalized result model so as to obtain the compression coding of the planar image.
Optionally, the apparatus further comprises a conversion module configured to convert a three-channel color image file into three two-dimensional unsigned integer matrices, where each channel corresponds to one matrix, each element of the matrix corresponds one-to-one to a pixel of the image, and the value of each element is the pixel value of the corresponding pixel in the channel corresponding to that matrix. The receiving module is further configured to receive the three two-dimensional unsigned integer matrices as the planar image data.
According to still another aspect of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the methods of the present invention.
According to a further aspect of the invention, a computer-readable medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the method of the invention.
According to the technical scheme of the present invention, a specific two-round training method and the specific content of each round of training are provided, and an end-to-end model is generated; the image compression coding calculated by the model better reflects the characteristics of the image. In addition, the technical scheme of the present invention needs no data labeling, which saves annotation cost and improves processing efficiency; and because an end-to-end model is generated, the processing speed when calculating the image feature code is higher.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 2 is a schematic diagram of the basic steps of generating a network model according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a defined three group network in accordance with an embodiment of the present invention;
fig. 4 is a schematic diagram of the basic structure of an apparatus for compression encoding which generates a planar image according to an embodiment of the present invention;
fig. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows an exemplary system architecture 100 to which the method of generating a compression encoding of a planar image or the apparatus of generating a compression encoding of a planar image of an embodiment of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for shopping-like websites browsed by users using the terminal devices 101, 102, 103. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 is a schematic diagram of the basic steps of generating a network model according to an embodiment of the present invention. The model is used for calculating the compression coding of a planar image, and it comprises a definition file of a deep learning network and the network weight file obtained after training. As shown in FIG. 2, the method mainly includes an initialization step and two rounds of training, described in detail below.
Step S21: initial network layer definition. This is an initialization step and specifically includes: the base network is CaffeNet, defined by three groups of network definition files under the deep learning framework Caffe; and for the three groups of networks, the loss functions of the three loss function layers are defined, in order, as follows: the mean value of each bit of the code is expected to approach 0.5, the coding quantization loss is expected to be minimal, and the code is expected to be invariant to rotation, scaling, translation and the like.
Step S22: perform the first round of training. This round comprises the first to third groups of network training. Each group of networks learns in the back-propagation process with stochastic gradient descent; the initial learning rate is 0.001, and it is reduced to 0.0001 after 10,000 iterations. In the first group of network training, training is performed for the first loss function layer and the second loss function layer, and the initialization weights are loaded from an existing public pre-trained model (ImageNet). In the second group of network training, initialization uses the weight file obtained after the first group of network training, and training is performed for all three loss function layers. In the third group of network training, initialization uses the weight file obtained after the second group of network training, the input layer of the third group of networks is modified so that there is only one data input layer, and training is carried out once to obtain the result model of the first round of training.
Step S23: perform the second round of training. In this round, the loss function of the first loss function layer is modified so that the mean value of each bit is expected to approach 0, and the nonlinear activation units located between the fully-connected layer and the loss functions are deleted from the definitions of the three groups of networks. The first group of network training, the second group of network training and the third group of network training are then performed again to obtain the finalized result model, where the initialization weight of the first group of network training is the weight obtained after the third group of network training in the first round of training.
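As an illustration of this two-round, three-stage schedule, the following pycaffe sketch shows one possible way to drive it. The solver and weight file names (n1_solver.prototxt and so on) are assumptions; the learning-rate policy, iteration counts, snapshot names, and the round-two edits (L1 targeting a mean of 0, ReLU removed) are assumed to live in the prototxt/solver definition files as described in steps S21-S23.

```python
import caffe  # pycaffe interface of the Caffe framework

def run_round(stage1_init_weights):
    """Run the three training stages of one round and return the final weight file."""
    solvers = ["n1_solver.prototxt", "n2_solver.prototxt", "n3_solver.prototxt"]  # hypothetical names
    weights = stage1_init_weights
    for proto in solvers:
        solver = caffe.SGDSolver(proto)      # SGD, base lr 0.001 dropping to 0.0001 after 10,000 iterations
        solver.net.copy_from(weights)        # initialize from the previous stage's weights
        solver.solve()                       # the third stage is configured for a single pass only
        weights = proto.replace("_solver.prototxt", ".caffemodel")  # hypothetical snapshot file name
    return weights

# Round 1: stage 1 starts from an ImageNet-pretrained CaffeNet weight file.
round1_weights = run_round("bvlc_reference_caffenet.caffemodel")
# Round 2 (after editing the definitions: L1 now targets a mean of 0, ReLU before the losses removed):
round2_weights = run_round(round1_weights)   # stage 1 starts from the round-1, stage-3 weights
```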
By processing input planar image data with the finalized result model obtained after step S23, the compression encoding of the planar image can be obtained. The above method is further explained with reference to the drawings.
In the embodiment of the present invention, three groups of network definition files (numbered N1, N2 and N3) are defined based on the publicly available CaffeNet as the base network. Three loss layers (L1, L2, L3), corresponding to the three classes of objective functions, are added to the network. The embodiment of the present invention provides that only these three layers are trained; the learning rate of the other layers is 0 during training.
The network input layer takes an RGB three-channel color image file in JPG or PNG format. After the file is read into the network, the data of each of the three RGB channels is read into a two-dimensional unsigned integer matrix (each integer ranges from 0 to 255, where 0 represents black and 255 represents white); the number of matrix rows is the image height and the number of columns is the image width. For example, a three-channel color image of 300 × 300 pixels is finally read into the network as three unsigned integer matrices of 300 rows and 300 columns each. The goal of training is to reduce the image from the high-dimensional 3 × 300 × 300 representation to a low-dimensional compressed coding representation, for example 1024-dimensional floating point numbers, which can be converted into a 1024-dimensional 0-1 bit string by thresholding at a fixed mean (the floating-point features produced by the network have the property that their values approach the two extremes, namely 1 and -1), i.e. a compressed binary code. The method of this embodiment is not limited to generating only 1024-dimensional compressed codes.
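As a concrete illustration of this input convention, the sketch below reads an image file into three unsigned integer channel matrices; the use of Pillow and NumPy is an assumption here, as neither library is named in the text.

```python
import numpy as np
from PIL import Image

def to_channel_matrices(path):
    """Read a JPG/PNG RGB file into three H x W uint8 matrices (one per channel);
    rows correspond to image height, columns to image width, values lie in 0-255."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.uint8)  # H x W x 3
    return rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]                    # R, G, B matrices
```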
Fig. 3 is a schematic diagram of the three defined groups of networks in accordance with an embodiment of the present invention, where N1, N2 and N3 denote the three groups of networks and L1, L2 and L3 denote the loss function layers.
In the first round of training, in the first stage, the first group of networks (N1) is trained on the first two loss function layers (L1 and L2). For example, about 570,000 unlabeled pictures can be used for training, with 4 graphics cards, a batch size of 64 (the number of images used per iteration on each card) and 50,000 iterations, so that each image is seen about 22 times on average (4 × 64 × 50,000 / 570,000 ≈ 22). The network initialization weights can come from a caffemodel pre-trained on the public ImageNet dataset (a caffemodel is a network weight file produced by the Caffe framework).
In the second stage, the second group of networks (N2) is initialized with the network weight values in the weight file trained by the first group, and is trained on all three loss function layers (L1, L2 and L3) simultaneously. It should be noted that the second group of networks (N2) requires pairs of images when training the L3 layer: for example, when an object moves in the image content, the image features before and after the movement (the compressed coding features extracted by the final network) should remain unchanged, and the pair consists of the image before the movement and the image after the movement. The finally formed network, however, accepts only one image at a time as input, which is why the third group of networks (N3) is needed in the third stage: it is initialized with the weight file trained by the second group, and the input layer of the network structure is modified so that there is only one data input layer, thereby finalizing the network structure. The third group requires no additional iterative training and only needs to be run once.
The loss functions used by the three groups of training in the first round are as follows.

L1: the mean value of each bit is expected to approach 0.5:

$$L_1(W) = \sum_{k=1}^{B} \left( \mu_k - 0.5 \right)^2$$

where B is the length of the bit code string, $\mu_k$ is the cumulative average of the feature value at the k-th bit over all training data, W is the hyperparameter of the loss function L and represents the network weights, and L(W) denotes the loss function with W as its hyperparameter.

L2: the coding quantization loss is expected to be minimal:

$$L_2(W) = \sum_{k=1}^{M} \left( b_k - F(x_k; W) \right)^2$$

where $b_k = 0.5\,(\operatorname{sign}(F(x_k; W)) + 1)$, the value of the sign function is -1 or 1, and the F function is the nonlinear projection function of the last fully-connected layer, which, according to this layer's weight matrix (learned by training) and the node position $x_k$, outputs the feature value corresponding to position k of the layer; x denotes the value corresponding to node k of the network's hidden layer (the fully-connected layer where the extracted feature lives), and M denotes the total number of nodes of that hidden layer.

L3: the code is expected to be invariant to rotation, scaling, translation and the like:

$$L_3(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|^2$$

where L is the number of rotated and translated copies corresponding to each image, M is the total number of training images, $b_{k,i}$ is the feature value corresponding to a translated or rotated copy, and $b_k$ is the feature value of the original image.
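To make the three objectives concrete, here is a minimal NumPy sketch of their forward values as they are described above (a real Caffe loss layer would also implement the corresponding gradients); the array layouts are assumptions.

```python
import numpy as np

def loss_l1(F, target_mean=0.5):
    """L1: the per-bit mean of the features over the training data should approach
    target_mean (0.5 in the first round, 0 in the second round). F is N x B."""
    mu = F.mean(axis=0)                      # cumulative average of the k-th bit over all data
    return float(np.sum((mu - target_mean) ** 2))

def loss_l2(F):
    """L2: quantization loss between the binarized codes b_k and the real-valued features."""
    b = 0.5 * (np.sign(F) + 1.0)             # b_k = 0.5 * (sign(F(x_k; W)) + 1), values in {0, 1}
    return float(np.sum((b - F) ** 2))

def loss_l3(F_orig, F_trans):
    """L3: features of rotated/scaled/translated copies should match the original image's.
    F_orig is M x B (original images); F_trans is M x L x B (L transformed copies each)."""
    return float(np.sum((F_trans - F_orig[:, None, :]) ** 2))
```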
The embodiment of the present invention adopts two rounds of training with the specific training content described above. Some adjustments are made in the second round: the loss function L1 is adjusted so that the mean value of each bit code is expected to approach 0 (it was 0.5 in the first round of training), and the ReLU layer (i.e. the nonlinear activation unit) between the fully-connected layer and the loss functions is removed from the network definitions. The second round of training then proceeds in the same manner as the first round, except that the initial weight of the first stage is the weight output by the third stage of the first round.
The finalized result model obtained after the second round of training is an end-to-end model: a planar image input to the model directly yields the compressed binary code of the image. An apparatus for generating compression coding of a planar image according to an embodiment of the present invention is described next. Fig. 4 is a schematic diagram of the basic structure of such an apparatus. As shown in fig. 4, the apparatus 40 for generating compression coding of a planar image includes a training module, a receiving module and a calculation module. The training module is configured to obtain the finalized result model according to the method above. The receiving module is configured to receive planar image data, and the calculation module is configured to process the planar image data with the finalized result model so as to obtain the compression coding of the planar image.
The apparatus 40 may further include a conversion module (not shown in the figure) configured to convert a three-channel color image file into three two-dimensional unsigned integer matrices; each channel corresponds to one matrix, each element of the matrix corresponds one-to-one to a pixel of the image, and the value of each element is the pixel value of the corresponding pixel in the channel corresponding to that matrix. In this case, the receiving module is further configured to receive the three two-dimensional unsigned integer matrices as the planar image data.
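For illustration, the following pycaffe sketch shows what the receiving and calculation modules could do with the finalized result model; the deploy/weight file names, the "data" and "ip_code" layer names, and the omission of any mean subtraction or resizing are all assumptions.

```python
import numpy as np
import caffe

net = caffe.Net("n3_deploy.prototxt", "n3_round2.caffemodel", caffe.TEST)  # hypothetical file names

def binary_code(image_chw):
    """Forward one image (a 3 x H x W float array stacked from the three channel
    matrices described above) and threshold the coding-layer output into bits."""
    net.blobs["data"].reshape(1, *image_chw.shape)
    net.blobs["data"].data[...] = image_chw
    net.forward()
    feat = net.blobs["ip_code"].data[0]       # assumed name of the fully-connected coding layer
    return (feat > 0).astype(np.uint8)        # second-round features center on 0, so threshold at the mean
```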
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a training module, a receiving module, and a computing module. The names of these modules do not in some cases constitute a limitation to the module itself, and for example, a receiving module may also be described as a "module that receives planar image data".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, enable the device to perform the methods of embodiments of the present invention, e.g., the methods performed in accordance with fig. 1.
According to the embodiment of the present invention, a specific two-round training method and the specific content of each round of training are provided, and an end-to-end model is generated. On a duplicate-image test set of 570,000 images (containing 1,443,664 groups of labeled ground truth), the binary codes generated according to the embodiment of the present invention show fewer false recalls at the same precision. Performance on the test set is measured by the false-recall count at equal precision: with a bit length of 1024 and a precision above 90 percent, the number of false recalls is 74, whereas under the prior art it is generally above 3000. In other words, the image compression coding generated by the embodiment of the present invention better reflects the characteristics of the image. Precision is calculated as follows: after the images are clustered by similarity, take, for each group of the test set, the ratio of the number of group members actually contained to the number of members that should be contained, and average these ratios (i.e. the sum of the per-group ratios divided by the total number of groups in the test set). The number of false recalls is the total number of members in each cluster that do not belong to the test set. In addition, as described above, no data labeling is needed, which saves annotation cost and improves processing efficiency; and because an end-to-end model is generated, the processing speed when calculating the image feature code is higher.
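A small sketch of this evaluation, under an assumed data layout (clusters and ground-truth groups represented as sets of image identifiers), may help make the two measures concrete; it is an interpretation of the description above, not code from the patent.

```python
def precision_and_false_recalls(clusters, truth_groups):
    """clusters: list of sets of image ids produced by similarity clustering.
    truth_groups: list of sets of image ids forming the labeled ground truth.
    Precision: average over ground-truth groups of (members recovered together) / (members the group should contain).
    False recalls: total number of clustered members that are not test-set members."""
    ratios = [max(len(g & c) for c in clusters) / len(g) for g in truth_groups]
    test_members = set().union(*truth_groups)
    false_recalls = sum(len(c - test_members) for c in clusters)
    return sum(ratios) / len(ratios), false_recalls
```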
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A method of generating compression encoding of a planar image, comprising:
defining three groups of networks according to the deep learning framework Caffe;
defining the loss functions of the three loss function layers, in order, as follows: the mean value of each bit of the code is expected to approach 0.5, the coding quantization loss is expected to be minimal, and the code is expected to be invariant to rotation, scaling and translation; and then carrying out a first round of training, wherein the first round of training comprises a first group of network training, a second group of network training and a third group of network training;
in the first group of network training, training for a first loss function layer and a second loss function layer;
in the second group of network training, initializing by using a weight file obtained after the first group of network training, and training for the three loss function layers;
in the third group of network training, initializing by using a weight file obtained after the second group of network training, modifying the input layer of the third group of networks so that there is only one data input layer, and training once to obtain a result model of the first round of training;
modifying the loss function of the first loss function layer so that the mean value of each bit is expected to approach 0, deleting the nonlinear activation unit between the fully-connected layer and the loss functions from the definition of the three groups of networks, and then performing the first group of network training, the second group of network training and the third group of network training again to obtain a finalized result model, wherein the initialization weight of the first group of network training is the weight obtained after the third group of network training in the first round of training; and
calculating the input planar image data by using the finalized result model so as to obtain the compression coding of the planar image.
2. The method of claim 1, wherein before the step of calculating the input planar image data using the finalized result model, the method further comprises:
converting a three-channel color image file into three two-dimensional unsigned integer matrices, wherein each channel corresponds to one matrix, each element of the matrix corresponds one-to-one to a pixel of the image, and the value of each element of the matrix is the pixel value of the pixel corresponding to that element in the channel corresponding to the matrix; and
inputting the three two-dimensional unsigned integer matrices into the finalized result model as the planar image data.
3. The method of claim 1, wherein the loss function expressing that the mean value of each bit of the code is expected to approach 0.5 is as follows:

$$L_1(W) = \sum_{k=1}^{B} \left( \mu_k - 0.5 \right)^2$$

wherein B is the length of the bit code string, $\mu_k$ is the cumulative average of the feature value at the k-th bit over all training data, W is the hyperparameter of the loss function $L_1$ and represents the network weight, and $L_1(W)$ represents the loss function with W as the hyperparameter.
4. The method of claim 1, wherein the loss function expressing that the coding quantization loss is expected to be minimal is as follows:

$$L_2(W) = \sum_{k=1}^{M} \left( b_k - F(x_k; W) \right)^2$$

wherein $b_k = 0.5\,(\operatorname{sign}(F(x_k; W)) + 1)$, the value of the sign function is -1 or 1, the F function is the nonlinear projection function of the last fully-connected layer, which outputs the feature value corresponding to position k of the layer according to the weight matrix of this layer and the node position $x_k$, x represents the value corresponding to network hidden-layer node k, M represents the total number of hidden-layer nodes, and W is the hyperparameter of the loss function $L_2$.
5. The method of claim 1, wherein the loss function expressing that the code is expected to be invariant to rotation, scaling and translation is as follows:

$$L_3(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|^2$$

wherein L is the number of rotated and translated copies corresponding to each image, M is the total number of training images, $b_{k,i}$ is the feature value corresponding to a translated or rotated copy, $b_k$ is the feature value of the original image, and W is the hyperparameter of the loss function $L_3$.
6. An apparatus for generating a compression encoding of a planar image, comprising a training module, a receiving module and a calculation module, wherein:
the training module is configured to:
define three groups of networks according to the deep learning framework Caffe,
define the loss functions of the three loss function layers, in order, as follows: the mean value of each bit of the code is expected to approach 0.5, the coding quantization loss is expected to be minimal, and the code is expected to be invariant to rotation, scaling and translation, and then carry out a first round of training, wherein the first round of training comprises a first group of network training, a second group of network training and a third group of network training,
in the first group of network training, training is performed for a first loss function layer and a second loss function layer,
in the second group of network training, initialization uses a weight file obtained after the first group of network training, and training is performed for the three loss function layers,
in the third group of network training, initialization uses a weight file obtained after the second group of network training, the input layer of the third group of networks is modified so that there is only one data input layer, and training is carried out once to obtain a result model of the first round of training,
the loss function of the first loss function layer is then modified so that the mean value of each bit is expected to approach 0, the nonlinear activation unit between the fully-connected layer and the loss functions is deleted from the definition of the three groups of networks, and the first group of network training, the second group of network training and the third group of network training are then performed again to obtain a finalized result model, wherein the initialization weight of the first group of network training is the weight obtained after the third group of network training in the first round of training;
the receiving module is configured to receive planar image data; and
the calculation module is configured to calculate the planar image data by using the finalized result model so as to obtain the compression coding of the planar image.
7. The apparatus of claim 6,
the three-channel color image file conversion module is used for converting the three-channel color image file into three two-dimensional unsigned integer matrixes; each channel corresponds to a matrix, each element of the matrix corresponds to each pixel of the image one by one, and the value of each element of the matrix is the pixel value of the pixel corresponding to the element in the channel corresponding to the matrix;
the receiving module is further configured to receive the three two-dimensional unsigned integer matrices as the planar image data.
8. The apparatus of claim 6, wherein the loss function expressing that the mean value of each bit of the code is expected to approach 0.5 is as follows:

$$L_1(W) = \sum_{k=1}^{B} \left( \mu_k - 0.5 \right)^2$$

wherein B is the length of the bit code string, $\mu_k$ is the cumulative average of the feature value at the k-th bit over all training data, W is the hyperparameter of the loss function $L_1$ and represents the network weight, and $L_1(W)$ represents the loss function with W as the hyperparameter.
9. The apparatus of claim 6, wherein the loss function expressing that the coding quantization loss is expected to be minimal is as follows:

$$L_2(W) = \sum_{k=1}^{M} \left( b_k - F(x_k; W) \right)^2$$

wherein $b_k = 0.5\,(\operatorname{sign}(F(x_k; W)) + 1)$, the value of the sign function is -1 or 1, the F function is the nonlinear projection function of the last fully-connected layer, which outputs the feature value corresponding to position k of the layer according to the weight matrix of this layer and the node position $x_k$, x represents the value corresponding to network hidden-layer node k, M represents the total number of hidden-layer nodes, and W is the hyperparameter of the loss function $L_2$.
10. The apparatus of claim 6, wherein the loss function expressing that the code is expected to be invariant to rotation, scaling and translation is as follows:

$$L_3(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|^2$$

wherein L is the number of rotated and translated copies corresponding to each image, M is the total number of training images, $b_{k,i}$ is the feature value corresponding to a translated or rotated copy, $b_k$ is the feature value of the original image, and W is the hyperparameter of the loss function $L_3$.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-5.
Priority Applications (1)

Application Number: CN201710960042.3A; Priority Date: 2017-10-16; Filing Date: 2017-10-16; Title: Method and apparatus for generating compression encoding of flat image

Publications (2)

CN107845116A, published 2018-03-27
CN107845116B, published 2021-05-25 (granted)

Family ID: 61662199

Families Citing this family (1)

CN113298689B (河南师范大学), priority date 2021-06-22, published 2023-04-18: Large-capacity image steganography method

Patent Citations (8)

* Cited by examiner, † Cited by third party

CN105069400A (priority 2015-07-16, published 2015-11-18), 北京工业大学: Face image gender recognition system based on stacked sparse auto-encoding *
CN105069173A (priority 2015-09-10, published 2015-11-18), 天津中科智能识别产业技术研究院有限公司: Rapid image retrieval method based on supervised topology-preserving hashing *
CN107231566A (priority 2016-03-25, published 2017-10-03), 阿里巴巴集团控股有限公司: Video transcoding method, apparatus and system *
CN106250812A (priority 2016-07-15, published 2016-12-21), 汤平: Vehicle model recognition method based on a Fast R-CNN deep neural network *
CN106780512A (priority 2016-11-30, published 2017-05-31), 厦门美图之家科技有限公司: Image segmentation method, application and computing device *
CN106920243A (priority 2017-03-09, published 2017-07-04), 桂林电子科技大学: Sequence image segmentation method for ceramic material parts using an improved fully convolutional neural network *
CN107169573A (priority 2017-05-05, published 2017-09-15), 第四范式(北京)技术有限公司: Method and system for performing prediction using a composite machine learning model *
CN107239793A (priority 2017-05-17, published 2017-10-10), 清华大学: Multi-quantization deep binary feature learning method and apparatus *

Family Cites Families (1)

US9734436B2 (priority 2015-06-05, published 2017-08-15), AT&T Intellectual Property I, L.P.: Hash codes for images *

Non-Patent Citations (2)

"A Deep Convolutional Auto-Encoder with Pooling - Unpooling Layers in Caffe"; Volodymyr Turchenko et al.; arXiv:1701.04949; 2017-06-18; pp. 1-21 *
"融合语义知识的深度表达学习及在视觉理解中的应用" (Deep representation learning fused with semantic knowledge and its application in visual understanding); 张瑞茂 et al.; 《计算机研究与发展》 (Journal of Computer Research and Development); 2017-05-04; pp. 1251-1266 *

Also Published As

CN107845116A, published 2018-03-27


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant