CN107845116A - Method and apparatus for generating a compressed encoding of a planar image - Google Patents

Method and apparatus for generating a compressed encoding of a planar image

Info

Publication number
CN107845116A
Authority
CN
China
Prior art keywords
group
training
network
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710960042.3A
Other languages
Chinese (zh)
Other versions
CN107845116B (en)
Inventor
汪振华
陈宇
赵士超
麻晓珍
安山
翁志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201710960042.3A priority Critical patent/CN107845116B/en
Publication of CN107845116A publication Critical patent/CN107845116A/en
Application granted granted Critical
Publication of CN107845116B publication Critical patent/CN107845116B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention provides a method and apparatus for generating a compressed encoding of a planar image, which helps to obtain an image compression encoding with good feature representation power and improves processing efficiency. The method of generating the compressed encoding of a planar image according to the present invention comprises: defining three groups of networks based on the deep learning framework Caffe; defining the loss functions of three loss-function layers; then performing a first round of training and a second round of training to obtain a finalized model, wherein the initialization weights of the first group of network training in the second round are the weights obtained after the third group of network training in the first round; and computing on input planar image data using the finalized model, thereby obtaining the compressed encoding of the planar image.

Description

Method and apparatus for generating a compressed encoding of a planar image
Technical field
The present invention relates to the technical field of image feature computation, and in particular to a method and apparatus for generating a compressed encoding of a planar image.
Background art
Image features are the basis for analyzing images. Consider a clustering scenario: a computer needs to group similar pictures together and separate dissimilar ones. The basis for this judgment is the image feature extracted from image content: the feature distance between similar pictures is small, while the feature distance between dissimilar images is large. A good image feature can, on the one hand, clearly define image repetition, similarity, and dissimilarity through distance; on the other hand, it can also save storage overhead and enable efficient distance comparison.
Image compression encoding is one kind of image feature. Many existing image features are expressed as floating-point numbers; the significance of compressed-encoding features is that they can significantly reduce feature storage cost and facilitate feature-distance comparison without significantly reducing feature representation power.
In the prior art, schemes for generating the compressed encoding of a planar image mainly include the Hamming-distance-based LSH (Locality Sensitive Hashing) scheme and schemes based on machine learning methods, the latter mainly referring to supervised learning. They are introduced separately below.
LSH, also known as locality-sensitive hashing, is based on the following idea: after two adjacent data points in the original data space undergo the same mapping or projection transformation, the probability that the two points are still adjacent in the new data space is very high, while the probability that non-adjacent data points are mapped into the same bucket is very small. Suppose the expected output is a k-bit compressed binary code. We design k hash functions $\{H_1, H_2, H_3, \ldots, H_k\}$, where the input of each hash function is a floating-point feature value and the output is 0 or 1, mapping the original K floating-point values to k bit positions. A common practice is to randomly select k different positions from the K floating-point positions, apply the above k hash functions to perform the projection transformation, and finally produce a bit string of length k.
The LSH scheme can be carried out in three steps. The first step is to set the mapping, that is, to decide which feature dimensions in the feature vector need to be projected by the hash functions; the second step is to set the hash functions, where the hyperplane threshold that decides whether a compressed-code bit is 0 or 1 is fixed or set randomly; the third step is the mapping itself, that is, applying the hash functions to the selected feature dimensions to obtain the compressed encoding of the image in the form of a binary code.
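By way of illustration only (this sketch is not part of the prior-art scheme's text; the random choice of dimensions and thresholds follows the common practice described above, and all names are assumptions), the three steps might look as follows in Python:

```python
import numpy as np

def make_lsh_hasher(K, k, seed=0):
    """Steps 1 and 2: pick k of the K feature dimensions at random and fix a
    random hyperplane threshold for each, defining k hash functions H_1..H_k."""
    rng = np.random.default_rng(seed)
    dims = rng.choice(K, size=k, replace=False)  # which dimensions to hash
    thresholds = rng.normal(size=k)              # fixed random thresholds
    def hasher(feature):
        # Step 3: project the selected dimensions to a k-bit 0/1 string
        return (feature[dims] > thresholds).astype(np.uint8)
    return hasher

hasher = make_lsh_hasher(K=4096, k=64)
bits = hasher(np.random.randn(4096))   # toy floating-point feature vector
print("".join(map(str, bits)))         # 64-bit compressed code
```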
Schemes based on machine learning methods (supervised learning) can also be divided into three steps. The first step is to label the training data; the second step is to run the training process, in which the hash functions are learned; the third step is the mapping, in which a forward pass through the network directly converts an input image into a compressed encoding in binary-code form.
For the LSH scheme above, because the hash functions are manually fixed or random functions, they lack generalization ability; the scheme does not truly learn from data, and its precision is poor. The supervised learning methods above require a large amount of manual data labeling, and their objective is oriented toward classification, so in fields such as duplicate-image discrimination their precision is relatively poor. A direct consequence of poor precision is that the feature representation power of the image compression encoding is insufficient, making it difficult to define the similarity between images.
Summary of the invention
In view of this, the present invention provides a method and apparatus for generating a compressed encoding of a planar image, which helps to obtain an image compression encoding with good feature representation power and improves processing efficiency.
To achieve the above object, according to one aspect of the invention, a method for generating a compressed encoding of a planar image is provided.
The method of generating the compressed encoding of a planar image according to the present invention comprises: defining three groups of networks based on the deep learning framework Caffe; defining the loss functions of three loss-function layers, in order, as: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling, and translation; then performing a first round of training, comprising a first group of network training, a second group of network training, and a third group of network training, wherein in the first group of network training, training is performed against the first loss-function layer and the second loss-function layer; in the second group of network training, initialization uses the weight file obtained after the first group of network training, and training is performed against all three loss-function layers; and in the third group of network training, initialization uses the weight file obtained after the second group of network training, the input layer of the third group of networks is modified to a single data input layer, and training is run once to obtain the result model of the first round of training; then modifying the loss function of the first loss-function layer to: expecting the per-bit encoded mean to converge to 0; deleting, from the definitions of the three groups of networks, the nonlinear activation unit located between the fully connected layer and the loss function; and performing the first group of network training, the second group of network training, and the third group of network training again to obtain a finalized model, wherein the initialization weights of the first group of network training are the weights obtained after the third group of network training in the first round; and computing on the input planar image data using the finalized model, thereby obtaining the compressed encoding of the planar image.
Alternatively, before the step of computing on the input planar image data using the finalized model, the method further comprises: converting a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, each element of a matrix corresponds one-to-one to a pixel of the image, and the value of each element of a matrix is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element; and inputting the three two-dimensional unsigned-integer matrices into the finalized model as the planar image data.
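A minimal sketch of this optional conversion step, assuming Pillow and NumPy are available (the function name and file name are illustrative):

```python
import numpy as np
from PIL import Image

def image_to_channel_matrices(path):
    """Convert a three-channel color image file into three 2-D
    unsigned-integer matrices, one per channel; element (i, j) holds the
    pixel value of the pixel at row i, column j in that channel."""
    rgb = np.asarray(Image.open(path).convert("RGB"), dtype=np.uint8)
    r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    return r, g, b  # each matrix has shape (image height, image width)

r, g, b = image_to_channel_matrices("example.jpg")
print(r.shape, r.dtype)  # e.g. (300, 300) uint8 for a 300x300-pixel image
```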
Alternatively, the loss function expression for expecting the per-bit encoded mean to converge to 0.5 is as follows:

$$\min_W L(W) = \sum_{k=1}^{B} \left\| \bar{b}_k - 0.5 \right\|^2$$

where B is the bit-code string length, $\bar{b}_k$ is the cumulative mean of the feature value at the k-th position over all training data, W is the hyperparameter of the loss function L, representing the network weights, and L(W) denotes the loss function with W as hyperparameter.
Alternatively, the loss function expression for expecting the coding quantization loss to be minimal is as follows:

$$\min_W L(W) = \sum_{k=1}^{M} \left\| (b_k - 0.5) - F(x_k, W) \right\|^2$$

where $b_k = 0.5 \times (\operatorname{sign}(F(x; W)) + 1)$, the sign function takes the value -1 or 1, and F is the nonlinear projection function of the last fully connected layer, outputting the feature value corresponding to position k of that layer according to the layer's weight matrix and the node position $x_k$; here x denotes the value corresponding to hidden-layer node k of the network, and M denotes the total number of hidden-layer nodes.
Alternatively, the loss function expression for expecting the encoding to be invariant to rotation, scaling, and translation is as follows:

$$\min_W L(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|$$

where L is the total number of rotated and translated images per picture, M is the total number of training images, $b_{k,i}$ is the feature value of a translated or rotated image, and $b_k$ is the feature value of the original image.
According to another aspect of the invention, an apparatus for generating a compressed encoding of a planar image is provided.
The apparatus of generating the compressed encoding of a planar image according to the present invention comprises a training module, a receiving module, and a computing module, wherein the training module is configured to: define three groups of networks based on the deep learning framework Caffe; define the loss functions of three loss-function layers, in order, as: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling, and translation; then perform a first round of training, comprising a first group of network training, a second group of network training, and a third group of network training, wherein in the first group of network training, training is performed against the first loss-function layer and the second loss-function layer and the initialization weight file is provided by the Caffe framework; in the second group of network training, initialization uses the weight file obtained after the first group of network training and training is performed against all three loss-function layers; and in the third group of network training, initialization uses the weight file obtained after the second group of network training, the input layer of the third group of networks is modified to a single data input layer, and training is run once to obtain the result model of the first round of training; then modify the loss function of the first loss-function layer to: expecting the per-bit encoded mean to converge to 0; delete, from the definitions of the three groups of networks, the nonlinear activation unit located between the fully connected layer and the loss function; and perform the first group of network training, the second group of network training, and the third group of network training again to obtain a finalized model, wherein the initialization weights of the first group of network training are the weights obtained after the third group of network training in the first round. The receiving module is configured to receive planar image data. The computing module is configured to compute on the planar image data using the finalized model, thereby obtaining the compressed encoding of the planar image.
Alternatively, the apparatus further comprises a conversion module for converting a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, each element of a matrix corresponds one-to-one to a pixel of the image, and the value of each element of a matrix is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element; the receiving module is further configured to receive the three two-dimensional unsigned-integer matrices as the planar image data.
According to yet another aspect of the invention, an electronic device is provided, comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of the present invention.
According to yet another aspect of the invention, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of the present invention.
According to the technical solution of the invention, a specific two-round training approach and the specific content of each round of training are proposed, producing an end-to-end model; the image compression encoding computed with this model can reflect the image better. In addition, the technical solution requires no data labeling, which saves labeling labor costs and improves processing efficiency; and because the produced model is end-to-end, it also has higher processing speed when computing image feature codes.
Brief description of the drawings
The accompanying drawings are used for a better understanding of the present invention and do not constitute an undue limitation thereof. In the drawings:
Fig. 1 is a diagram of an exemplary system architecture to which an embodiment of the present invention can be applied;
Fig. 2 is a schematic diagram of the basic steps of generating a network model according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the three groups of networks defined according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the basic structure of the apparatus for generating the compressed encoding of a planar image according to an embodiment of the present invention;
Fig. 5 is a structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings, including various details of the embodiments of the invention to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.
Fig. 1 shows an exemplary system architecture 100 in which the method or apparatus for generating the compressed encoding of a planar image according to an embodiment of the present invention can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example a back-end management server (by way of example only) that provides support for shopping websites browsed by users with the terminal devices 101, 102, 103. The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing results (such as target push information or product information, by way of example only) back to the terminal devices.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. According to implementation needs, there may be any number of terminal devices, networks, and servers.
Fig. 2 is a schematic diagram of the basic steps of generating a network model according to an embodiment of the present invention. The model is used to compute the compressed encoding of a planar image and, in terms of content, includes the definition files of the deep learning networks and the network weight file obtained after training. As shown in Fig. 2, the method mainly includes an initialization step and two rounds of training, explained in detail below.
Step S21: defining the original network layers. This is the initialization step and specifically includes: the base networks are the networks defined by three groups of network definition documents under the deep learning framework Caffe (CaffeNet); and, for these three groups of network definitions, the loss functions of the three loss-function layers are, in order: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling, translation, and the like.
Step S22: performing the first round of training. This round of training includes the first through third groups of network training. Each group of networks is trained by stochastic gradient descent, learning through back-propagation, with an initial learning rate of 0.001 that drops to 0.0001 after 10000 iterations. In the first group of network training, training is performed against the first loss-function layer and the second loss-function layer, with weights initialized by loading an existing open training model (ImageNet); in the second group of network training, initialization uses the weight file obtained after the first group of network training, and training is performed against all three loss-function layers; in the third group of network training, initialization uses the weight file obtained after the second group of network training, the input layer of the third group of networks is modified to a single data input layer, and training is run once to obtain the result model of the first round of training.
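In Caffe these training hyperparameters would typically live in a solver configuration; the following is a sketch under the values stated above (file names and the multistep policy are assumptions, since the patent does not reproduce its solver file):

```protobuf
# solver.prototxt, illustrative sketch only
net: "n1_train.prototxt"   # assumed name of the N1 network definition
type: "SGD"                # stochastic gradient descent
base_lr: 0.001             # initial learning rate
lr_policy: "multistep"     # one scheduled drop
gamma: 0.1                 # 0.001 -> 0.0001
stepvalue: 10000           # ...after 10000 iterations
max_iter: 50000
snapshot: 10000
snapshot_prefix: "n1"
```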
Step S23: performing the second round of training. In this round, the loss function of the first loss-function layer is first modified to: expecting the per-bit encoded mean to converge to 0; and the nonlinear activation unit located between the fully connected layer and the loss function is deleted from the definitions of the three groups of networks. Then the first group of network training, the second group of network training, and the third group of network training described above are carried out again to obtain the finalized model, wherein the initialization weights of the first group of network training are the weights obtained after the third group of network training in the first round.
By computing on input planar image data using the finalized model obtained after step S23, the compressed encoding of the planar image can be obtained. The above method is further described below with reference to the drawings.
In the embodiment of the present invention, based on the CaffeNet network published on the Internet, three groups of network definition files are defined (numbered N1, N2, N3). Three loss layers (L1, L2, L3), corresponding to three objective functions, are added to the end of the networks. The embodiment of the present invention proposes to train only these three layers; the learning rate of the other layers during training is 0.
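In Caffe network definitions, training only the added layers is typically expressed by setting the per-layer learning-rate multipliers (`lr_mult`) of all other layers to 0; a sketch with illustrative layer names (the patent does not publish its definition files):

```protobuf
layer {
  name: "conv1"            # pre-trained CaffeNet layer: frozen
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 }     # weight learning rate multiplier = 0
  param { lr_mult: 0 }     # bias learning rate multiplier = 0
  convolution_param { num_output: 96 kernel_size: 11 stride: 4 }
}
layer {
  name: "fc_hash"          # assumed name of the added coding layer: trained
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc_hash"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param { num_output: 1024 }  # e.g. a 1024-bit code
}
```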
The network input is a JPG- or PNG-format RGB three-channel color image file. After being read into the network, the data of each of the RGB channels is correspondingly read into a two-dimensional unsigned-integer matrix (the range of each integer is [0, 255], where 0 represents black and 255 white); the number of matrix rows is the height of the image and the number of columns is its width. For example, a three-channel color image of 300 × 300 pixels is ultimately read into the network as three unsigned-integer matrices of 300 rows and 300 columns each. The purpose of training is to reduce the high-dimensional 3 × 300 × 300 representation of an image to a low-dimensional compressed-encoding representation, for example 1024-dimensional floating-point numbers, which, via a fixed mean value (the floating-point features output by the network described in this embodiment have the property that the floating-point values converge toward the two ends, 1 and -1), can be converted into a 1024-dimensional 0-1 bit sequence, that is, the compressed binary encoding; the method of this embodiment is, however, not limited to producing only 1024-dimensional compressed encodings.
Fig. 3 is a schematic diagram of the three groups of networks defined according to an embodiment of the present invention, wherein N1, N2, N3 denote the three groups of defined networks and L1, L2, L3 denote the loss-function layers.
In the first round of training, in the first stage, the first group of networks (N1) is trained against the loss-function layers L1 and L2 (for example, about 570,000 unlabeled pictures may be used for training, with 4 graphics cards, a batch size, i.e. the number of images used in each iteration, of 64, and 50,000 iterations, so that on average each picture is trained more than 22 times: 4 × 64 × 50000 / 570000 ≈ 22). The network initialization weights may use a caffemodel pre-trained on the public ImageNet dataset (a caffemodel is a network weight file produced under the Caffe framework).
In the second stage, the second group of networks (N2) is initialized with the network weight values in the weight file from the first group's training and is trained against all three loss-function layers (L1, L2, and L3) simultaneously. It should be noted that the second group of networks (N2) needs paired images when training against the loss layer L3. For example, when an object in the picture content has moved, the image feature (the compressed-encoding feature extracted by the final network) should remain unchanged before and after the movement; the paired images are then the image before the movement and the image after it. The input of the final, finalized network, however, only receives a single image as input each time. This is also why a third group of networks (N3) is needed in the third stage: it is initialized with the weight file from the second group's training, and at the same time the network structure is changed to a single input layer, that is, only one data input layer, finalizing the network structure. The third group therefore needs no additional iterative training and only needs to be run once.
The loss functions of the three groups of training in the first round are as follows:
L1: expecting the per-bit encoded mean to converge to 0.5:

$$\min_W L(W) = \sum_{k=1}^{B} \left\| \bar{b}_k - 0.5 \right\|^2$$

where B is the bit-code string length, $\bar{b}_k$ is the cumulative mean of the feature value at the k-th position over all training data, W is the hyperparameter of the loss function L, representing the network weights, and L(W) denotes the loss function with W as hyperparameter.
L2: expecting the coding quantization loss to be minimal:

$$\min_W L(W) = \sum_{k=1}^{M} \left\| (b_k - 0.5) - F(x_k, W) \right\|^2$$

where $b_k = 0.5 \times (\operatorname{sign}(F(x; W)) + 1)$, the sign function takes the value -1 or 1, and F is the nonlinear projection function of the last fully connected layer, outputting the feature value corresponding to position k of that layer according to the layer's weight matrix (learned in training) and the node position $x_k$; here x denotes the value corresponding to node k of the hidden layer (the fully connected layer from which the feature is extracted), and M denotes the total number of nodes of that hidden layer.
L3: expecting the encoding to be invariant to operations such as rotation, scaling, and translation:

$$\min_W L(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|$$

where L is the total number of rotated and translated images per picture, M is the total number of training images, $b_{k,i}$ is the feature value of a translated or rotated image, and $b_k$ is the feature value of the original image.
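To make the three objectives concrete, here is a minimal NumPy sketch of their forward computation on a batch of coding-layer activations (this is not the patent's Caffe layer implementation; array shapes and names are assumptions):

```python
import numpy as np

def binarize(F):
    """b = 0.5 * (sign(F) + 1): map activations to 0/1 bits."""
    return 0.5 * (np.sign(F) + 1)

def l1_mean_convergence(F, target=0.5):
    """L1: the cumulative mean of each bit over all training data should
    converge to `target` (0.5 in the first round, 0 in the second)."""
    b_bar = binarize(F).mean(axis=0)          # per-position mean, shape (B,)
    return np.sum((b_bar - target) ** 2)

def l2_quantization(F):
    """L2: quantization loss, (b_k - 0.5) should be close to F(x_k, W)."""
    return np.sum((binarize(F) - 0.5 - F) ** 2)

def l3_invariance(F_orig, F_trans):
    """L3: codes of the L rotated/translated copies of each of the M images
    should equal the code of the original image."""
    b, b_i = binarize(F_orig), binarize(F_trans)
    return np.sum(np.abs(b_i - b[:, None, :]))

F = np.random.randn(8, 1024)        # toy batch: 8 images, 1024-bit code
F_t = np.random.randn(8, 4, 1024)   # 4 transformed copies per image
print(l1_mean_convergence(F), l2_quantization(F), l3_invariance(F, F_t))
```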
The embodiment of the present invention adopts a two-round training approach with specific training content. Some adjustments are made in the second round of training: specifically, the loss function L1 is adjusted to expect the per-bit encoded mean to converge to 0 (it is 0.5 in the first round of training), and the ReLU layer (i.e., the nonlinear activation unit) between the fully connected layer and the loss function is removed from the network definitions. The flow of the second round of training is identical to that of the first round, likewise carrying out the first through third stages described above, except that the initialization weights of the first stage are the weights output by the third stage of the first round of training.
The finalized model obtained after the second round of training is an end-to-end model: inputting a planar image to the model directly yields the compressed binary encoding of that image. The apparatus for generating the compressed encoding of a planar image according to the embodiment of the present invention is described next. Fig. 4 is a schematic diagram of the basic structure of this apparatus. As shown in Fig. 4, the apparatus 40 for generating the compressed encoding of a planar image includes a training module, a receiving module, and a computing module. The training module is used to obtain the above finalized model by the method described above; the receiving module is used to receive planar image data; and the computing module is used to compute on the planar image data using the above finalized model, thereby obtaining the compressed encoding of the planar image.
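A sketch of this end-to-end use with the Caffe Python interface (the file names and the blob names "data" and "fc_hash" are assumptions):

```python
import caffe
import numpy as np

# Load the finalized model: deploy definition plus trained weights
net = caffe.Net("deploy.prototxt", "final.caffemodel", caffe.TEST)

# Planar image data: three 2-D unsigned-integer channel matrices stacked
# into the (1, 3, height, width) blob of the single data input layer
image = np.random.randint(0, 256, size=(3, 300, 300)).astype(np.float32)
net.blobs["data"].data[...] = image[None]

out = net.forward()
features = out["fc_hash"][0]              # floating-point coding-layer output
code = (0.5 * (np.sign(features) + 1)).astype(np.uint8)  # b = 0.5*(sign(F)+1)
print(code[:16])                          # first bits of the compressed code
```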
The apparatus 40 may further include a conversion module (not shown in the figure) for converting a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, each element of a matrix corresponds one-to-one to a pixel of the image, and the value of each element of a matrix is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element. In that case, the receiving module is further used to receive the above three two-dimensional unsigned-integer matrices as the planar image data.
Referring now to Fig. 5, it shows a structural diagram of a computer system 500 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in Fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the system 500 are also stored in the RAM 503. The CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read therefrom is installed into the storage portion 508 as needed.
In particular, according to the embodiments disclosed by the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments disclosed by the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-mentioned functions defined in the system of the present invention are executed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to: wireless, electric wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should likewise be noted that each block in a block diagram or flowchart, and combinations of blocks in block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented by software or by hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor including a training module, a receiving module, and a computing module. The names of these modules do not, under certain circumstances, constitute a limitation of the modules themselves; for example, the receiving module may also be described as "a module that receives planar image data".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist independently without being assembled into that device. The above computer-readable medium carries one or more programs which, when executed by the device, enable the device to perform the method described in the embodiments of the present invention, for example the method performed according to Fig. 1.
According to embodiments of the present invention, a specific two-round training approach and the specific content of each round of training are proposed, outputting an end-to-end model. According to embodiments of the present invention, the binary encodings produced achieve better false recall at the same precision on a 570,000-image duplicate-image test set (containing 664 groups totaling 1443 images of labeled ground truth). The performance on the test set is measured by false recall at the same precision: at bit length = 1024 and precision above 90%, the number of false recalls is 74, whereas the number of false recalls of the prior art under the same conditions is typically above 3000. That is, the image compression encoding produced by the embodiment of the present invention can reflect the image features relatively well. The precision above is computed as the average, over the groups of the test set after images are clustered by similarity, of the ratio of the number of group members each group contains to the number of members it should contain (i.e., the sum of the per-group ratios divided by the total number of groups in the test set). The number of false recalls is the total number of non-test-set members across all groups after clustering. It can also be seen from the description of the embodiments of the present invention that no data labeling is required, which saves labeling labor costs and improves processing efficiency; and because the produced model is end-to-end, it also has higher processing speed when computing image feature codes.
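For illustration, the precision described above can be sketched as follows (a toy computation under assumed data structures, not the patent's evaluation code):

```python
def grouping_precision(groups, expected_sizes):
    """Average over groups of (correct members found / members the group
    should contain): the sum of per-group ratios divided by the group count."""
    ratios = [len(members) / expected_sizes[gid]
              for gid, members in groups.items()]
    return sum(ratios) / len(ratios)

groups = {"g1": ["a", "b", "c"], "g2": ["d"]}   # members found per cluster
expected_sizes = {"g1": 4, "g2": 2}             # members each should contain
print(grouping_precision(groups, expected_sizes))  # (3/4 + 1/2) / 2 = 0.625
```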
The above specific embodiments do not constitute a limitation on the protection scope of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions can occur. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (12)

  1. A method for generating a compressed encoding of a planar image, characterized by comprising:
    defining three groups of networks based on the deep learning framework Caffe;
    defining the loss functions of three loss-function layers, in order, as follows: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling, and translation; then performing a first round of training, comprising a first group of network training, a second group of network training, and a third group of network training;
    in the first group of network training, training against the first loss-function layer and the second loss-function layer;
    in the second group of network training, initializing with the weight file obtained after the first group of network training, and training against the three loss-function layers;
    in the third group of network training, initializing with the weight file obtained after the second group of network training, modifying the input layer of the third group of networks to a single data input layer, and training once to obtain the result model of the first round of training;
    modifying the loss function of the first loss-function layer to: expecting the per-bit encoded mean to converge to 0; deleting, from the definitions of the three groups of networks, the nonlinear activation unit located between the fully connected layer and the loss function; and then performing the first group of network training, the second group of network training, and the third group of network training again to obtain a finalized model, wherein the initialization weights of the first group of network training are the weights obtained after the third group of network training in the first round of training;
    computing on input planar image data using the finalized model, thereby obtaining the compressed encoding of the planar image.
  2. The method according to claim 1, characterized in that, before the step of computing on input planar image data using the finalized model, the method further comprises:
    converting a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, each element of a matrix corresponds one-to-one to a pixel of the image, and the value of each element of a matrix is the pixel value, in the channel corresponding to the matrix, of the pixel corresponding to the element;
    inputting the three two-dimensional unsigned-integer matrices into the finalized model as the planar image data.
  3. The method according to claim 1, characterized in that the loss function expression for expecting the per-bit encoded mean to converge to 0.5 is as follows:

    $$\min_W L(W) = \sum_{k=1}^{B} \left\| \bar{b}_k - 0.5 \right\|^2$$

    wherein B is the bit-code string length, $\bar{b}_k$ is the cumulative mean of the feature value at the k-th position over all training data, W is the hyperparameter of the loss function L, representing the network weights, and L(W) denotes the loss function with W as hyperparameter.
  4. The method according to claim 1, characterized in that the loss function expression for expecting the coding quantization loss to be minimal is as follows:

    $$\min_W L(W) = \sum_{k=1}^{M} \left\| (b_k - 0.5) - F(x_k, W) \right\|^2$$

    wherein $b_k = 0.5 \times (\operatorname{sign}(F(x; W)) + 1)$, the sign function takes the value -1 or 1, and F is the nonlinear projection function of the last fully connected layer, outputting the feature value corresponding to position k of that layer according to the layer's weight matrix and the node position $x_k$; x denotes the value corresponding to hidden-layer node k of the network, and M denotes the total number of hidden-layer nodes.
  5. The method according to claim 1, characterized in that the loss function expression for expecting the encoding to be invariant to rotation, scaling, and translation is as follows:

    $$\min_W L(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|$$

    wherein L is the total number of rotated and translated images per picture, M is the total number of training images, $b_{k,i}$ is the feature value of a translated or rotated image, and $b_k$ is the feature value of the original image.
  6. An apparatus for generating a compressed encoding of a planar image, characterized by comprising a training module, a receiving module, and a computing module, wherein:
    the training module is configured to:
    define three groups of networks based on the deep learning framework Caffe,
    define the loss functions of three loss-function layers, in order, as follows: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling, and translation, and then perform a first round of training, comprising a first group of network training, a second group of network training, and a third group of network training,
    in the first group of network training, train against the first loss-function layer and the second loss-function layer,
    in the second group of network training, initialize with the weight file obtained after the first group of network training, and train against the three loss-function layers,
    in the third group of network training, initialize with the weight file obtained after the second group of network training, modify the input layer of the third group of networks to a single data input layer, and train once to obtain the result model of the first round of training,
    modify the loss function of the first loss-function layer to: expecting the per-bit encoded mean to converge to 0, delete, from the definitions of the three groups of networks, the nonlinear activation unit located between the fully connected layer and the loss function, and then perform the first group of network training, the second group of network training, and the third group of network training again to obtain a finalized model, wherein the initialization weights of the first group of network training are the weights obtained after the third group of network training in the first round of training;
    the receiving module is configured to receive planar image data;
    the computing module is configured to compute on the planar image data using the finalized model, thereby obtaining the compressed encoding of the planar image.
  7. The apparatus according to claim 6, characterized in that
    the apparatus further comprises a conversion module for converting a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, each element of a matrix corresponds one-to-one to a pixel of the image, and the value of each element of a matrix is the pixel value, in the channel corresponding to the matrix, of the pixel corresponding to the element;
    the receiving module is further configured to receive the three two-dimensional unsigned-integer matrices as the planar image data.
  8. The apparatus according to claim 6, characterized in that the loss function expression for expecting the per-bit encoded mean to converge to 0.5 is as follows:

    $$\min_W L(W) = \sum_{k=1}^{B} \left\| \bar{b}_k - 0.5 \right\|^2$$

    wherein B is the bit-code string length, $\bar{b}_k$ is the cumulative mean of the feature value at the k-th position over all training data, W is the hyperparameter of the loss function L, representing the network weights, and L(W) denotes the loss function with W as hyperparameter.
  9. The apparatus according to claim 6, characterized in that the loss function expression for expecting the coding quantization loss to be minimal is as follows:

    $$\min_W L(W) = \sum_{k=1}^{M} \left\| (b_k - 0.5) - F(x_k, W) \right\|^2$$

    wherein $b_k = 0.5 \times (\operatorname{sign}(F(x; W)) + 1)$, the sign function takes the value -1 or 1, and F is the nonlinear projection function of the last fully connected layer, outputting the feature value corresponding to position k of that layer according to the layer's weight matrix and the node position $x_k$; x denotes the value corresponding to hidden-layer node k of the network, and M denotes the total number of hidden-layer nodes.
  10. The apparatus according to claim 6, characterized in that the loss function expression for expecting the encoding to be invariant to rotation, scaling, and translation is as follows:

    $$\min_W L(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|$$

    wherein L is the total number of rotated and translated images per picture, M is the total number of training images, $b_{k,i}$ is the feature value of a translated or rotated image, and $b_k$ is the feature value of the original image.
  11. An electronic device, characterized by comprising:
    one or more processors;
    a storage apparatus for storing one or more programs,
    which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
  12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201710960042.3A 2017-10-16 2017-10-16 Method and apparatus for generating compression encoding of flat image Active CN107845116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710960042.3A CN107845116B (en) 2017-10-16 2017-10-16 Method and apparatus for generating compression encoding of flat image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710960042.3A CN107845116B (en) 2017-10-16 2017-10-16 Method and apparatus for generating compression encoding of flat image

Publications (2)

Publication Number Publication Date
CN107845116A true CN107845116A (en) 2018-03-27
CN107845116B CN107845116B (en) 2021-05-25

Family

ID=61662199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710960042.3A Active CN107845116B (en) 2017-10-16 2017-10-16 Method and apparatus for generating compression encoding of flat image

Country Status (1)

Country Link
CN (1) CN107845116B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358043A1 (en) * 2015-06-05 2016-12-08 At&T Intellectual Property I, L.P. Hash codes for images
CN105069400A (en) * 2015-07-16 2015-11-18 北京工业大学 Face image gender recognition system based on stack type sparse self-coding
CN105069173A (en) * 2015-09-10 2015-11-18 天津中科智能识别产业技术研究院有限公司 Rapid image retrieval method based on supervised topology keeping hash
CN107231566A (en) * 2016-03-25 2017-10-03 阿里巴巴集团控股有限公司 A kind of video transcoding method, device and system
CN106250812A (en) * 2016-07-15 2016-12-21 汤平 A kind of model recognizing method based on quick R CNN deep neural network
CN106780512A (en) * 2016-11-30 2017-05-31 厦门美图之家科技有限公司 The method of segmentation figure picture, using and computing device
CN106920243A (en) * 2017-03-09 2017-07-04 桂林电子科技大学 The ceramic material part method for sequence image segmentation of improved full convolutional neural networks
CN107169573A (en) * 2017-05-05 2017-09-15 第四范式(北京)技术有限公司 Using composite machine learning model come the method and system of perform prediction
CN107239793A (en) * 2017-05-17 2017-10-10 清华大学 Many quantisation depth binary feature learning methods and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VOLODYMYR TURCHENKO ET AL: ""A Deep Convolutional Auto-Encoder with Pooling - Unpooling Layers in Caffe"", 《ARXIV:1701.04949》 *
张瑞茂 等: ""融合语义知识的深度表达学习及在视觉理解中的应用"", 《计算机研究与发展》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298689A (en) * 2021-06-22 2021-08-24 河南师范大学 Large-capacity image steganography method
CN113298689B (en) * 2021-06-22 2023-04-18 河南师范大学 Large-capacity image steganography method

Also Published As

Publication number Publication date
CN107845116B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
WO2021179720A1 (en) Federated-learning-based user data classification method and apparatus, and device and medium
CN107633218A (en) Method and apparatus for generating image
CN107526725A (en) The method and apparatus for generating text based on artificial intelligence
CN105354248A (en) Gray based distributed image bottom-layer feature identification method and system
CN109635946A (en) A kind of combined depth neural network and the clustering method constrained in pairs
CN107766492A (en) A kind of method and apparatus of picture search
CN109919202A (en) Disaggregated model training method and device
CN111428727A (en) Natural scene text recognition method based on sequence transformation correction and attention mechanism
CN110929806A (en) Picture processing method and device based on artificial intelligence and electronic equipment
CN116797248A (en) Data traceability management method and system based on block chain
Sarmah et al. Optimization models in steganography using metaheuristics
CN110457325A (en) Method and apparatus for output information
CN107845116A (en) The method and apparatus for generating the compressed encoding of plane picture
CN113537416A (en) Method and related equipment for converting text into image based on generative confrontation network
CN107402878A (en) Method of testing and device
CN104376280B (en) A kind of image code generation method towards Google glass
CN116306780A (en) Dynamic graph link generation method
CN104376523A (en) Method for constituting image code anti-fake system for Google project glass
CN104376314B (en) A kind of constructive method towards Google glass Internet of Things web station system
CN115082800B (en) Image segmentation method
WO2022105117A1 (en) Method and device for image quality assessment, computer device, and storage medium
CN115620342A (en) Cross-modal pedestrian re-identification method, system and computer
Li et al. A new aesthetic QR code algorithm based on salient region detection and SPBVM
CN112966150A (en) Video content extraction method and device, computer equipment and storage medium
CN109474826A (en) Picture compression method, apparatus, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant