CN107845116A - Method and apparatus for generating a compressed encoding of a planar image - Google Patents
- Publication number
- CN107845116A (application CN201710960042.3A)
- Authority
- CN
- China
- Prior art keywords
- group
- training
- network
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention provides a method and apparatus for generating a compressed encoding of a planar image, which help to obtain an image compressed encoding with good feature-expression power while improving processing efficiency. The method of generating a compressed encoding of a planar image according to the present invention includes: defining three groups of networks under the deep-learning framework Caffe; defining the loss functions of three loss-function layers; then performing a first round of training and a second round of training to obtain a finalized result model, wherein the initialization weights for training the first group of networks in the second round are the weights obtained after training the third group of networks in the first round; and computing over the input planar image data with the finalized result model, thereby obtaining the compressed encoding of the planar image.
Description
Technical field
The present invention relates to the technical field of image feature algorithms, and in particular to a method and apparatus for generating a compressed encoding of a planar image.
Background technology
Image features are the basis for analyzing images. Take a clustering scenario as an example: a computer needs to group similar pictures together and separate dissimilar ones, and its criterion is the image feature extracted from the image content, where the feature distance between similar pictures is small and the distance between dissimilar images is large. A good image feature can, on the one hand, clearly delimit image duplication, similarity and dissimilarity by distance; on the other hand, it can also save storage overhead and facilitate efficient distance comparison.
Image compressed encoding is one kind of image feature. A large number of existing image features are expressed as floating-point numbers, whereas the point of a compressed-encoding feature is to significantly reduce feature storage cost and to make feature-distance comparison convenient, without significantly reducing the feature's expressive power.
In the prior art, schemes for generating a compressed encoding of a planar image mainly comprise LSH (Locality Sensitive Hashing) schemes based on Hamming distance, and schemes based on machine learning, the latter mainly referring to supervised learning methods. They are introduced separately below.
LSH, also known as locality-sensitive hashing, is based on the idea that two adjacent data points in the original data space remain adjacent in the new data space with high probability after the same mapping or projection transformation, while non-adjacent data points are mapped into the same bucket with very low probability. Suppose a compressed binary encoding of k bits is expected as output. We design k hash functions {H1, H2, H3, ..., Hk}, where the input of each hash function is a floating-point feature value and the output is 0 or 1, mapping the original K floating-point values onto k bit positions. A common practice is to randomly select k distinct positions out of the K floating-point positions, apply the above k hash functions as a projection transformation, and finally produce a bit string of length k.
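As a rough illustration, the position-sampling variant described above — pick k of the K floating-point positions at random and threshold each against a fixed hyperplane value — can be sketched as follows (the function name, threshold value and seed are illustrative assumptions, not from the patent):

```python
import numpy as np

def lsh_compress(feature, k, threshold=0.0, seed=0):
    """Map a K-dim float feature vector to a k-bit string.

    Each of the k hash functions picks one feature dimension (chosen
    at random, without replacement) and outputs 1 if its value exceeds
    a fixed hyperplane threshold, else 0.
    """
    rng = np.random.default_rng(seed)
    K = len(feature)
    positions = rng.choice(K, size=k, replace=False)  # k distinct positions
    bits = (np.asarray(feature)[positions] > threshold).astype(int)
    return "".join(map(str, bits))

code = lsh_compress([0.3, -1.2, 2.5, 0.0, -0.7, 1.1], k=4)
```

Because the positions and thresholds are fixed or random rather than learned, the resulting code cannot adapt to the data, which is exactly the weakness the background section criticizes below.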
An LSH scheme can be carried out in three steps. The first step is to set the mapping, i.e. to decide which feature dimensions in the feature vector are to be projected by hash functions; the second step is to set the hash functions, fixing or randomly setting the hyperplane thresholds used to decide whether a compressed-encoding bit is 0 or 1; the third step is the mapping itself, i.e. executing the hash functions on the feature dimensions to be mapped, to obtain the compressed encoding of the image in the form of a binary code.
A scheme based on machine learning (supervised learning) can also be divided into three steps. The first step is to label the training data; the second step is to run the training process, learning the hash functions; the third step is the mapping, in which the input image is directly converted by a forward pass of the network to obtain the compressed encoding in the form of a binary code.
As for the LSH schemes above, because the hash functions are manually fixed or random functions, they lack generalization ability; such algorithms involve relatively little learning, and their precision is poor. The supervised learning methods above require a large amount of manually labeled data, and their objective is positioned at classification, so their precision is relatively poor in fields such as duplicate-image identification. The direct consequence of poor precision is that the feature-expression power of the image compressed encoding is insufficient, making it difficult to delimit the similarity between images.
Summary of the invention
In view of this, the present invention provides a method and apparatus for generating a compressed encoding of a planar image, which help to obtain an image compressed encoding with good feature-expression power and improve processing efficiency.
To achieve the above object, according to one aspect of the present invention, a method of generating a compressed encoding of a planar image is provided.
The method of generating a compressed encoding of a planar image according to the present invention includes: defining three groups of networks under the deep-learning framework Caffe; defining the loss functions of three loss-function layers, which are, in order: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling and translation; then performing a first round of training, comprising training the first group of networks, training the second group of networks, and training the third group of networks, wherein in training the first group of networks, training is performed for the first loss-function layer and the second loss-function layer; in training the second group of networks, initialization uses the weight file obtained after training the first group of networks, and training is performed for the three loss-function layers; in training the third group of networks, initialization uses the weight file obtained after training the second group of networks, the input layer of the third group of networks is modified into only one data input layer, and training is performed once to obtain the result model of the first round of training; modifying the loss function of the first loss-function layer into: expecting the per-bit encoded mean to converge to 0; deleting, from the definition of the three groups of networks, the nonlinear activation unit located between the fully connected layer and the loss function; then performing the training of the first, second and third groups of networks again to obtain the finalized result model, wherein the initialization weights for training the first group of networks are the weights obtained after training the third group of networks in the first round; and computing over the input planar image data with the finalized result model, thereby obtaining the compressed encoding of the planar image.
Optionally, before the step of computing over the input planar image data with the finalized result model, the method further includes: converting a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, the elements of a matrix correspond one-to-one with the pixels of the image, and the value of each matrix element is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element; and inputting the three two-dimensional unsigned-integer matrices into the finalized result model as the planar image data.
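A minimal sketch of this conversion, assuming the image file is already decoded into an H x W x 3 array (the function name is illustrative):

```python
import numpy as np

def split_channels(rgb_image):
    """Convert an H x W x 3 RGB array into three 2-D unsigned-integer
    matrices, one per channel; matrix[i, j] is the pixel value of
    pixel (i, j) in that channel."""
    img = np.asarray(rgb_image, dtype=np.uint8)
    assert img.ndim == 3 and img.shape[2] == 3
    r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]
    return r, g, b

img = np.zeros((300, 300, 3), dtype=np.uint8)
img[0, 0] = (10, 20, 30)
r, g, b = split_channels(img)
```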
Optionally, the loss function expression for expecting the per-bit encoded mean to converge to 0.5 is as follows:

L(W) = \sum_{k=1}^{B} \left( \bar{b}_k - 0.5 \right)^2

where B is the length of the bit-code string, \bar{b}_k is the cumulative mean of the feature value at the k-th position over all training data, W is the hyper-parameter of the loss function L, representing the network weights, and L(W) denotes the loss function with W as hyper-parameter.
Optionally, the loss function expression for expecting the coding quantization loss to be minimal is as follows:

L(W) = \sum_{k=1}^{M} \left( b_k - \tfrac{1}{2}\big(F_k(x; W) + 1\big) \right)^2

where b_k = 0.5 × (sign(F_k(x; W)) + 1), the sign function takes the value −1 or 1, F is the nonlinear projection function of the last fully connected layer, which outputs the feature value corresponding to position k of this layer according to the layer's weight matrix and the node position x_k, x denotes the values corresponding to the hidden-layer nodes k of the network, and M denotes the total number of hidden-layer nodes.
Optionally, the loss function expression for expecting the encoding to be invariant to rotation, scaling and translation is as follows:

L(W) = \sum_{m=1}^{M} \sum_{i=1}^{L} \sum_{k=1}^{B} \left( b_{k,i} - b_k \right)^2

where L is the number of rotated and translated images corresponding to each picture, M is the total number of training images, b_{k,i} is the feature value corresponding to a translated or rotated image, and b_k is the feature value of the original picture.
According to another aspect of the present invention, an apparatus for generating a compressed encoding of a planar image is proposed.
The apparatus for generating a compressed encoding of a planar image according to the present invention includes a training module, a receiving module and a computing module, wherein: the training module is configured to define three groups of networks under the deep-learning framework Caffe; define the loss functions of three loss-function layers, which are, in order: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling, translation and so on; then perform a first round of training, comprising training the first, second and third groups of networks, wherein in training the first group of networks, training is performed for the first and second loss-function layers and the initialization weight file is provided by the Caffe framework; in training the second group of networks, initialization uses the weight file obtained after training the first group of networks and training is performed for the three loss-function layers; in training the third group of networks, initialization uses the weight file obtained after training the second group of networks, the input layer of the third group of networks is modified into only one data input layer, and training is performed once to obtain the result model of the first round of training; modify the loss function of the first loss-function layer into: expecting the per-bit encoded mean to converge to 0; delete, from the definition of the three groups of networks, the nonlinear activation unit located between the fully connected layer and the loss function; and then perform the training of the first, second and third groups of networks again to obtain the finalized result model, wherein the initialization weights for training the first group of networks are the weights obtained after training the third group of networks in the first round. The receiving module is configured to receive planar image data. The computing module is configured to compute over the planar image data with the finalized result model, thereby obtaining the compressed encoding of the planar image.
Optionally, the apparatus further includes a converting module, configured to convert a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, the elements of a matrix correspond one-to-one with the pixels of the image, and the value of each matrix element is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element; the receiving module is further configured to receive the three two-dimensional unsigned-integer matrices as the planar image data.
According to yet another aspect of the present invention, an electronic device is provided, comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of the present invention.
According to yet another aspect of the present invention, a computer-readable medium is provided, on which a computer program is stored, the program implementing the method of the present invention when executed by a processor.
According to the technical solution of the present invention, a specific two-round training approach and specific content for each round of training are proposed, producing an end-to-end model; the image compressed encoding computed with this model can represent the image better. In addition, the technical solution requires no data labeling, which saves labeling labor costs and improves processing efficiency; and because the produced model is end to end, it also has a higher processing speed when computing the image feature code.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an improper limitation of the present invention, wherein:
Fig. 1 is a diagram of an exemplary system architecture to which embodiments of the present invention may be applied;
Fig. 2 is a schematic diagram of the basic steps of generating a network model according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of defining three groups of networks according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the basic structure of an apparatus for generating a compressed encoding of a planar image according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are explained below with reference to the accompanying drawings, including various details of the embodiments of the present invention to aid understanding; they should be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
Fig. 1 shows an exemplary system architecture 100 to which the method or apparatus for generating a compressed encoding of a planar image of an embodiment of the present invention can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, to receive or send messages and so on. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web-browser applications, search applications, instant messengers, mailbox clients and social-platform software (merely illustrative).
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop portable computers, desktop computers and so on.
The server 105 may be a server providing various services, for example a background management server (merely illustrative) supporting shopping-type websites browsed by the user with the terminal devices 101, 102, 103. The background management server may process, e.g. analyze, received data such as information query requests, and feed the processing results (such as target push information or product information — merely illustrative) back to the terminal devices.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely schematic. There may be any number of terminal devices, networks and servers, as required by the implementation.
Fig. 2 is a schematic diagram of the basic steps of generating a network model according to an embodiment of the present invention. The model is used to compute the compressed encoding of a planar image, and in terms of content includes the definition files of the deep-learning networks and the network weight files obtained after training. As shown in Fig. 2, the method mainly includes an initialization step and two rounds of training, explained in detail below.
Step S21: defining the original network layers. This is the initialization step, and specifically includes: taking as the base networks the networks defined by three groups of network-definition files under the deep-learning framework CaffeNet; and, for these three groups of network definitions, setting the loss functions of the three loss-function layers, in order, as: expecting the per-bit encoded mean to converge to 0.5, expecting the coding quantization loss to be minimal, and expecting the encoding to be invariant to rotation, scaling, translation and so on.
Step S22: performing the first round of training. This round includes training the first to third groups of networks. Each group of networks is trained by stochastic gradient descent, learning during back-propagation, with the learning rate initialized to 0.001 and dropped to 0.0001 after 10000 iterations. In training the first group of networks, training is performed for the first and second loss-function layers, and the weights are initialized by loading an existing public (ImageNet) pre-trained model; in training the second group of networks, initialization uses the weight file obtained after training the first group of networks, and training is performed for the three loss-function layers above; in training the third group of networks, initialization uses the weight file obtained after training the second group of networks, the input layer of the third group of networks is modified into only one data input layer, and training is performed once to obtain the result model of the first round of training.
Step S23: performing the second round of training. In this round, the loss function of the first loss-function layer is first modified into: expecting the per-bit encoded mean to converge to 0, and the nonlinear activation unit located between the fully connected layer and the loss function is deleted from the definition of the three groups of networks above. Then the training of the first, second and third groups of networks above is performed again to obtain the finalized result model, wherein the initialization weights for training the first group of networks are the weights obtained after training the third group of networks in the first round.
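The chain of weight hand-offs across the two rounds and three groups of steps S21–S23 can be summarized in code; the string labels are illustrative, and only the ordering and initialization hand-off reflect the steps above:

```python
def training_schedule():
    """Enumerate the six training runs of the two-round procedure and
    where each takes its initial weights: round 1 group 1 starts from
    an ImageNet pre-trained model, round 2 group 1 starts from the
    round-1 group-3 weights, and every other run continues from the
    previous group in the same round."""
    runs = []
    for rnd in (1, 2):
        for group in (1, 2, 3):
            if rnd == 1 and group == 1:
                init = "ImageNet pre-trained caffemodel"
            elif group == 1:  # round 2, group 1
                init = "round-1 group-3 weights"
            else:
                init = "round-%d group-%d weights" % (rnd, group - 1)
            runs.append((rnd, group, init))
    return runs
```

Each run would also apply the learning-rate schedule stated above (0.001, dropped to 0.0001 after 10000 iterations), which is omitted here for brevity.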
Computing over the input planar image data with the finalized result model obtained after step S23 yields the compressed encoding of the planar image. The above method is described further below with reference to the accompanying drawings.
In the embodiment of the present invention, based on the CaffeNet network published on the Internet, three groups of network-definition files (numbered N1, N2, N3) are defined, and three loss layers (L1, L2, L3), corresponding to three objective functions, are appended to the end of the networks. The embodiment of the present invention proposes to train only these three layers; the learning rate of the other layers during training is 0.
The network input layer takes JPG- or PNG-format RGB three-channel color image files. After being read into the network, the data of each of the RGB three channels is correspondingly read into a two-dimensional unsigned-integer matrix (each integer ranges over [0, 255], where 0 represents black and 255 white); the number of matrix rows is the height of the image and the number of columns is its width. For example, a three-channel color image of 300 × 300 pixels is finally read into the network as three unsigned-integer matrices, each of 300 rows and 300 columns. The purpose of training is to reduce the high-dimensional 3 × 300 × 300 representation of the image to a low-dimensional compressed-encoding representation, e.g. 1024-dimensional floating-point numbers with a fixed mean (the floating-point features output by the network mentioned in this embodiment have the characteristic that the floating-point values converge to the two extremes, i.e. 1 and −1), which can then be translated into a 1024-dimensional 0-1 bit sequence, i.e. the compressed binary encoding; however, the method of this embodiment is not limited to producing only 1024-dimensional compressed encodings.
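Since the floats are stated to converge to the two extremes 1 and −1, translating them into a 0-1 bit sequence reduces to thresholding at 0; a sketch under that assumption:

```python
import numpy as np

def binarize(features):
    """Turn the network's float output into a 0/1 bit vector using
    b_k = 0.5 * (sign(f_k) + 1), which maps values near -1 to 0 and
    values near 1 to 1."""
    f = np.asarray(features, dtype=float)
    bits = (np.sign(f) + 1) / 2  # -1 -> 0, +1 -> 1
    return bits.astype(int)

bits = binarize([0.9, -0.8, 1.0, -1.0])
```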
Fig. 3 is a schematic diagram of defining three groups of networks according to an embodiment of the present invention, wherein N1, N2, N3 denote the three groups of networks defined, and L1, L2, L3 denote the loss-function layers.
In the first round of training, in the first stage, the first group of networks (N1) is trained for the loss-function layers L1 and L2 (for example, about 570,000 unlabeled pictures can be used for training, with 4 graphics cards, a batch size — i.e. the number of images used in each iteration — of 64, and 50,000 iterations, so that on average each picture is trained more than 22 times: 4 × 64 × 50000 / 570000 ≈ 22). The network initialization weights can use a caffemodel pre-trained on the public dataset from ImageNet (a caffemodel is a network weight file produced under the Caffe framework).
In the second stage, the second group of networks (N2) is initialized with the network weight values in the weight file from training the first group of networks, and is trained for the three loss-function layers (L1, L2 and L3) simultaneously. It should be noted that the second group of networks (N2) needs pairs of images when training the loss-function layer L3: for example, when an object in the picture content has moved, the image features (the compressed-encoding features finally abstracted by the network) of the image before and after the movement should remain unchanged, so the paired images are the image before the movement and the image after the movement. The input of the finally shaped network, however, receives only one image as input at a time, which is why a third group of networks (N3) is needed in the third stage: it is initialized with the weight file trained by the second group of networks, while the network structure is modified to have only one input layer, i.e. only one data input layer, and the final network structure is thereby shaped. Therefore the third group does not need additional iterations of training, and only needs to be executed once.
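The paired inputs that N2 needs for the L3 layer can be produced by simple image transformations; here np.roll and np.rot90 stand in for the translations and rotations, whose exact form the patent does not specify:

```python
import numpy as np

def make_pairs(image, n_shifts=2):
    """Build (original, transformed) training pairs for the invariance
    objective: each pair holds the original image together with a
    translated or rotated copy whose code should stay the same."""
    pairs = []
    for s in range(1, n_shifts + 1):
        pairs.append((image, np.roll(image, shift=s, axis=1)))  # translation
    pairs.append((image, np.rot90(image)))                      # rotation
    return pairs

img = np.arange(16).reshape(4, 4)
pairs = make_pairs(img)
```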
The loss functions of the three groups of training in the first round are as follows:
L1: expecting the per-bit encoded mean to converge to 0.5:

L(W) = \sum_{k=1}^{B} \left( \bar{b}_k - 0.5 \right)^2

where B is the length of the bit-code string, \bar{b}_k is the cumulative mean of the feature value at the k-th position over all training data, W is the hyper-parameter of the loss function L, representing the network weights, and L(W) denotes the loss function with W as hyper-parameter.
L2: expecting the coding quantization loss to be minimal:

L(W) = \sum_{k=1}^{M} \left( b_k - \tfrac{1}{2}\big(F_k(x; W) + 1\big) \right)^2

where b_k = 0.5 × (sign(F_k(x; W)) + 1), the sign function takes the value −1 or 1, F is the nonlinear projection function of the last fully connected layer, which, according to the layer's weight matrix (learned during training) and the node position x_k, outputs the feature value corresponding to position k of this layer; x denotes the values corresponding to the nodes k of the hidden layer of the network (the fully connected layer from which the features are extracted), and M denotes the total number of nodes of that hidden layer.
L3: expecting the encoding to be invariant to operations such as rotation, scaling and translation:

L(W) = \sum_{m=1}^{M} \sum_{i=1}^{L} \sum_{k=1}^{B} \left( b_{k,i} - b_k \right)^2

where L is the number of rotated and translated images corresponding to each picture, M is the total number of training images, b_{k,i} is the feature value corresponding to a translated or rotated image, and b_k is the feature value of the original picture.
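The three objectives can be sketched in NumPy. Since the patent's formula images are not reproduced here, the exact expressions below are reconstructions from the variable definitions in the text and should be read as assumptions:

```python
import numpy as np

def loss_balance(codes, target=0.5):
    """L1: push the mean of each bit over the training set toward
    `target` (0.5 in round one, 0 in round two). codes: (N, B)."""
    mean_per_bit = codes.mean(axis=0)  # \bar{b}_k
    return float(np.sum((mean_per_bit - target) ** 2))

def loss_quantization(features):
    """L2: zero exactly when every float output sits at -1 or 1,
    i.e. when binarisation b_k = 0.5*(sign(f_k)+1) loses nothing."""
    f = np.asarray(features, dtype=float)
    b = 0.5 * (np.sign(f) + 1)
    return float(np.sum((b - 0.5 * (f + 1)) ** 2))

def loss_invariance(codes_orig, codes_transformed):
    """L3: codes of the rotated/translated copies should equal the
    original's. codes_orig: (M, B); codes_transformed: (M, L, B)."""
    diff = codes_transformed - codes_orig[:, None, :]
    return float(np.sum(diff ** 2))
```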
The embodiment of the present invention proposes the use of two rounds of training with specific training content. In the second round of training some adjustments are made: specifically, the loss function L1 is adjusted to expect the per-bit encoded mean to converge to 0 (it was 0.5 in the first round of training), and the ReLU layer (i.e. the nonlinear activation unit) between the fully connected layer and the loss function is removed from the network definitions. The flow of the second round of training is identical to the first round, likewise carrying out the first to third stages of training described above, except that the initialization weights of the first stage are the weights output by the third stage of the first round of training.
The finalized result model obtained after the second round of training is an end-to-end model: inputting a planar image into this model directly yields the compressed binary encoding of that image. The apparatus for generating a compressed encoding of a planar image of the embodiment of the present invention is illustrated next. Fig. 4 is a schematic diagram of the basic structure of the apparatus for generating a compressed encoding of a planar image according to an embodiment of the present invention. As shown in Fig. 4, the apparatus 40 for generating a compressed encoding of a planar image includes a training module, a receiving module and a computing module. The training module is configured to obtain the finalized result model above by the method described above. The receiving module is configured to receive planar image data; the computing module is configured to compute over the planar image data with the finalized result model above, thereby obtaining the compressed encoding of the planar image.
The apparatus 40 above may further include a converting module (shown in the figure), configured to convert a three-channel color image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, the elements of a matrix correspond one-to-one with the pixels of the image, and the value of each matrix element is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element. In that case, the receiving module above is further configured to receive the three two-dimensional unsigned-integer matrices as the planar image data above.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in Fig. 5 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the system 500 are also stored in the RAM 503. The CPU 501, the ROM 502 and the RAM 503 are connected with one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse and the like; an output portion 507 including, for example, a cathode-ray tube (CRT), a liquid crystal display (LCD) and the like, and a loudspeaker and the like; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card, a modem and the like. The communication portion 509 performs communication processing via a network such as the Internet. A driver 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory and the like, is mounted on the driver 510 as needed, so that a computer program read therefrom is installed into the storage portion 508 as needed.
In particular, according to the embodiments disclosed by the present invention, the process described above with reference to the flow chart may be implemented as a computer software program. For example, the embodiments disclosed by the present invention include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication portion 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example — but not limited to — an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact-disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any appropriate combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by, or in connection with, an instruction execution system, apparatus or device. And in the present invention, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any appropriate combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a computer-readable medium can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. The program code contained on the computer-readable medium can be transmitted with any appropriate medium, including but not limited to: wireless, electric wire, optical cable, RF and the like, or any appropriate combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the achievable architecture, functions, and operation of systems, methods, and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Note also that each block of the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be arranged in a processor; for example, a processor may be described as comprising a training module, a receiving module, and a computing module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the receiving module may also be described as a "module for receiving planar image data".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist alone without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, enable the device to perform the method described in the embodiments of the present invention, for example the method shown in Fig. 1.
According to embodiments of the present invention, a specific two-round training method and the specific content of each round are proposed, and an end-to-end model is output. According to embodiments of the present invention, the generated binary codes achieve a better false-recall count at the same precision on a duplicate-image test set (containing 1,443 annotated ground-truth groups). Performance on the test set is measured by false recalls at the same precision: at a bit length of 1024 and a precision above 90%, the false-recall count is 74, whereas the prior art typically yields more than 3,000 false recalls under the same conditions. That is, the image compression codes produced by the embodiments of the present invention reflect the image characteristics comparatively well. The precision above is computed as follows: after the images are clustered by similarity, take, for each group of the test set, the ratio of the group's member count to the member count it should contain, and average these ratios (i.e. the sum of the per-group ratios divided by the total number of groups in the test set). The false-recall count is the total number of non-test-set members across all groups after clustering. It can also be seen from the description of the embodiments that no data labeling is required, which saves labeling labor costs and improves processing efficiency; and because the result is an end-to-end model, computing image feature codes is also faster.
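The precision and false-recall metrics described above can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the function names, the dictionary-based group representation, and the sample data are all hypothetical.

```python
def precision(clusters, expected_sizes):
    """Average, over all test-set groups, of (clustered member count) /
    (member count the group should contain), per the metric described above.

    `clusters` maps a group id to the set of members assigned to it after
    clustering by similarity; `expected_sizes` maps the same ids to the
    ground-truth member count of each group.
    """
    ratios = [len(clusters[g]) / expected_sizes[g] for g in clusters]
    # sum of per-group ratios divided by total number of groups
    return sum(ratios) / len(clusters)


def false_recalls(clusters, test_set_members):
    """Total number of non-test-set members across all groups after clustering."""
    return sum(1 for g in clusters for m in clusters[g] if m not in test_set_members)
```

For example, if a group that should contain two members receives two, its ratio is 1.0; any clustered member that is not in the test set counts toward the false-recall total.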
The above embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions may occur. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (12)
- 1. A method for generating a compressed encoding of a planar image, characterized by comprising: defining three groups of networks under the deep-learning framework Caffe; defining the loss functions of three loss-function layers, in order, as: the mean of every code bit is expected to converge to 0.5, the coding quantization loss is expected to be minimal, and the coding is expected to be invariant to rotation, scaling, and translation; then performing a first round of training, comprising first-group network training, second-group network training, and third-group network training, wherein: in the first-group network training, training is performed against the first and second loss-function layers; in the second-group network training, the weight file obtained after the first-group network training is used for initialization, and training is performed against all three loss-function layers; in the third-group network training, the weight file obtained after the second-group network training is used for initialization, the input layer of the third-group network is modified to a single data input layer, and one training pass is run to obtain the result model of the first round of training; then modifying the loss function of the first loss-function layer so that the mean of every code bit is expected to converge to 0, deleting from the definitions of the three groups of networks the non-linear activation unit located between the fully connected layer and the loss functions, and performing the first-group, second-group, and third-group network training again to obtain the finalized result model, wherein the initialization weights for the first-group network training are the weights obtained after the third-group network training of the first round; and computing the input planar image data with the finalized result model, so as to obtain the compressed encoding of the planar image.
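The two-round, three-group schedule of claim 1 can be sketched as pure orchestration logic. This is an illustrative outline, not part of the claimed method: `train_group` is an injected, hypothetical callable that would wrap one Caffe solver run (loading the group's prototxt, copying the given initialization weights, and returning the trained weights), so the schedule itself stays framework-agnostic and testable.

```python
def two_round_schedule(train_group):
    """Run the two-round, three-group training schedule.

    `train_group(name, round_no, init)` trains one group network, initialized
    from weights `init` (None means default initialization), and returns the
    resulting weights. Between rounds, the caller is assumed to have edited
    the network definitions (loss 1 targets a mean of 0, and the non-linear
    activation between the fully connected layer and the losses is removed).
    """
    weights = None
    for round_no in (1, 2):
        # Round 2's first group is initialized from round 1's group-3 weights.
        for name in ("group1", "group2", "group3"):
            weights = train_group(name, round_no, weights)
    return weights  # weights of the finalized result model
```

In a real Caffe setup, `train_group` might call `caffe.get_solver(...)`, `solver.net.copy_from(init)`, and `solver.solve()`; those details are omitted here.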
- 2. The method according to claim 1, characterized in that, before the step of computing the input planar image data with the finalized result model, the method further comprises: converting a three-channel colour image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, the elements of each matrix correspond one-to-one with the pixels of the image, and the value of each matrix element is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element; and inputting the three two-dimensional unsigned-integer matrices into the finalized result model as the planar image data.
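The conversion in claim 2 amounts to splitting an H x W x 3 image array into per-channel matrices. A minimal sketch with numpy (the function name and array layout are illustrative assumptions, not from the patent):

```python
import numpy as np

def split_channels(image):
    """Convert a three-channel colour image into three 2-D unsigned-integer
    matrices, one per channel; element (i, j) of matrix c is the pixel value
    of pixel (i, j) in channel c.

    `image` is assumed to be an H x W x 3 uint8 array (e.g. as loaded by an
    image library).
    """
    image = np.asarray(image, dtype=np.uint8)
    return [image[:, :, c].copy() for c in range(3)]
```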
- 3. The method according to claim 1, characterized in that the expression for the loss function expecting the mean of every code bit to converge to 0.5 is:

$$\min_W L(W) = \sum_{k=1}^{B} \left\| \bar{b}_k - 0.5 \right\|^2$$

where $B$ is the bit-string length, $\bar{b}_k$ is the cumulative mean over all training data of the feature value at bit position $k$, $W$ is the hyper-parameter of the loss function $L$ and denotes the network weights, and $L(W)$ denotes the loss function with hyper-parameter $W$.
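Numerically, the claim-3 loss penalizes each code bit whose batch mean drifts away from 0.5. An illustrative sketch (the function and array names are hypothetical):

```python
import numpy as np

def mean_convergence_loss(codes, target=0.5):
    """Claim-3 loss: sum over bits k of ||mean_k - target||^2.

    `codes` is an N x B array holding the B-bit code (or pre-binarization
    feature values) of each of N training images.
    """
    bit_means = codes.mean(axis=0)  # \bar{b}_k over all training data
    return float(np.sum((bit_means - target) ** 2))
```

In the second round of training described in claim 1, the same function would be used with `target=0.0`.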
- 4. The method according to claim 1, characterized in that the expression for the loss function expecting the coding quantization loss to be minimal is:

$$\min_W L(W) = \sum_{k=1}^{M} \left\| (b_k - 0.5) - F(x_k, W) \right\|^2$$

where $b_k = 0.5 \times (\operatorname{sign}(F(x; W)) + 1)$, the sign function takes the value −1 or 1, and $F$ is the non-linear projection function of the last fully connected layer, which outputs the feature value at position $k$ of this layer according to the layer's weight matrix and node position $x_k$; $x$ denotes the value corresponding to hidden-layer node $k$ of the network, and $M$ denotes the total number of hidden-layer nodes.
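The binarization and quantization loss of claim 4 can be sketched as follows. Here `f` stands for the vector of values $F(x_k; W)$ output by the last fully connected layer; the function names are illustrative, and `np.sign` returns 0 at exactly 0, so the sketch assumes nonzero activations as the claim does.

```python
import numpy as np

def binarize(f):
    """b_k = 0.5 * (sign(F(x; W)) + 1): 1 where f > 0, 0 where f < 0."""
    return 0.5 * (np.sign(f) + 1)

def quantization_loss(f):
    """Claim-4 loss: sum over k of ||(b_k - 0.5) - F(x_k, W)||^2,
    i.e. the penalty for activations far from the ideal values +/-0.5."""
    b = binarize(f)
    return float(np.sum(((b - 0.5) - f) ** 2))
```

The loss is zero exactly when every activation already sits at +0.5 or −0.5, so minimizing it pushes the network's outputs toward values that binarize without information loss.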
- 5. The method according to claim 1, characterized in that the expression for the loss function expecting the coding to be invariant to rotation, scaling, and translation is:

$$\min_W L(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|$$

where $L$ is the total number of rotated and translated copies per image, $M$ is the total number of training images, $b_{k,i}$ is the feature value of the translated or rotated image, and $b_k$ is the feature value of the original image.
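The invariance loss of claim 5 compares the code of each original image with the codes of its transformed copies. An illustrative sketch (the data layout, with one list of variant codes per original image, is a hypothetical choice):

```python
import numpy as np

def invariance_loss(codes, variants):
    """Claim-5 loss: sum over images k and copies i of ||b_{k,i} - b_k||.

    `codes[k]` is the code vector of original image k; `variants[k]` is the
    list of code vectors of its L rotated/translated/scaled copies.
    """
    total = 0.0
    for b_k, copies in zip(codes, variants):
        for b_ki in copies:
            total += float(np.linalg.norm(b_ki - b_k))
    return total
```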
- 6. An apparatus for generating a compressed encoding of a planar image, characterized by comprising a training module, a receiving module, and a computing module, wherein: the training module is configured to: define three groups of networks under the deep-learning framework Caffe; define the loss functions of three loss-function layers, in order, as: the mean of every code bit is expected to converge to 0.5, the coding quantization loss is expected to be minimal, and the coding is expected to be invariant to rotation, scaling, and translation; then perform a first round of training, comprising first-group network training, second-group network training, and third-group network training, wherein in the first-group network training, training is performed against the first and second loss-function layers; in the second-group network training, the weight file obtained after the first-group network training is used for initialization, and training is performed against all three loss-function layers; in the third-group network training, the weight file obtained after the second-group network training is used for initialization, the input layer of the third-group network is modified to a single data input layer, and one training pass is run to obtain the result model of the first round of training; then modify the loss function of the first loss-function layer so that the mean of every code bit is expected to converge to 0, delete from the definitions of the three groups of networks the non-linear activation unit located between the fully connected layer and the loss functions, and perform the first-group, second-group, and third-group network training again to obtain the finalized result model, wherein the initialization weights for the first-group network training are the weights obtained after the third-group network training of the first round; the receiving module is configured to receive planar image data; and the computing module is configured to compute the planar image data with the finalized result model, so as to obtain the compressed encoding of the planar image.
- 7. The apparatus according to claim 6, characterized by further comprising a conversion module configured to convert a three-channel colour image file into three two-dimensional unsigned-integer matrices, wherein each channel corresponds to one matrix, the elements of each matrix correspond one-to-one with the pixels of the image, and the value of each matrix element is the pixel value, in the channel corresponding to that matrix, of the pixel corresponding to that element; the receiving module being further configured to receive the three two-dimensional unsigned-integer matrices as the planar image data.
- 8. The apparatus according to claim 6, characterized in that the expression for the loss function expecting the mean of every code bit to converge to 0.5 is:

$$\min_W L(W) = \sum_{k=1}^{B} \left\| \bar{b}_k - 0.5 \right\|^2$$

where $B$ is the bit-string length, $\bar{b}_k$ is the cumulative mean over all training data of the feature value at bit position $k$, $W$ is the hyper-parameter of the loss function $L$ and denotes the network weights, and $L(W)$ denotes the loss function with hyper-parameter $W$.
- 9. The apparatus according to claim 6, characterized in that the expression for the loss function expecting the coding quantization loss to be minimal is:

$$\min_W L(W) = \sum_{k=1}^{M} \left\| (b_k - 0.5) - F(x_k, W) \right\|^2$$

where $b_k = 0.5 \times (\operatorname{sign}(F(x; W)) + 1)$, the sign function takes the value −1 or 1, and $F$ is the non-linear projection function of the last fully connected layer, which outputs the feature value at position $k$ of this layer according to the layer's weight matrix and node position $x_k$; $x$ denotes the value corresponding to hidden-layer node $k$ of the network, and $M$ denotes the total number of hidden-layer nodes.
- 10. The apparatus according to claim 6, characterized in that the expression for the loss function expecting the coding to be invariant to rotation, scaling, and translation is:

$$\min_W L(W) = \sum_{k=1}^{M} \sum_{i=1}^{L} \left\| b_{k,i} - b_k \right\|$$

where $L$ is the total number of rotated and translated copies per image, $M$ is the total number of training images, $b_{k,i}$ is the feature value of the translated or rotated image, and $b_k$ is the feature value of the original image.
- 11. An electronic device, characterized by comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
- 12. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710960042.3A CN107845116B (en) | 2017-10-16 | 2017-10-16 | Method and apparatus for generating compression encoding of flat image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107845116A true CN107845116A (en) | 2018-03-27 |
CN107845116B CN107845116B (en) | 2021-05-25 |
Family
ID=61662199
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710960042.3A Active CN107845116B (en) | 2017-10-16 | 2017-10-16 | Method and apparatus for generating compression encoding of flat image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107845116B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298689A (en) * | 2021-06-22 | 2021-08-24 | 河南师范大学 | Large-capacity image steganography method |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069400A (en) * | 2015-07-16 | 2015-11-18 | 北京工业大学 | Face image gender recognition system based on stacked sparse auto-encoders
CN105069173A (en) * | 2015-09-10 | 2015-11-18 | 天津中科智能识别产业技术研究院有限公司 | Fast image retrieval method based on supervised topology-preserving hashing
US20160358043A1 (en) * | 2015-06-05 | 2016-12-08 | At&T Intellectual Property I, L.P. | Hash codes for images
CN106250812A (en) * | 2016-07-15 | 2016-12-21 | 汤平 | Vehicle model recognition method based on a fast R-CNN deep neural network
CN106780512A (en) * | 2016-11-30 | 2017-05-31 | 厦门美图之家科技有限公司 | Image segmentation method, application, and computing device
CN106920243A (en) * | 2017-03-09 | 2017-07-04 | 桂林电子科技大学 | Sequence-image segmentation method for ceramic material parts based on an improved fully convolutional neural network
CN107169573A (en) * | 2017-05-05 | 2017-09-15 | 第四范式(北京)技术有限公司 | Method and system for performing prediction using a composite machine learning model
CN107231566A (en) * | 2016-03-25 | 2017-10-03 | 阿里巴巴集团控股有限公司 | Video transcoding method, apparatus, and system
CN107239793A (en) * | 2017-05-17 | 2017-10-10 | 清华大学 | Multi-quantization deep binary feature learning method and apparatus
Non-Patent Citations (2)
Title |
---|
VOLODYMYR TURCHENKO ET AL.: "A Deep Convolutional Auto-Encoder with Pooling-Unpooling Layers in Caffe", arXiv:1701.04949 *
ZHANG RUIMAO ET AL.: "Deep representation learning with fused semantic knowledge and its application in visual understanding" (in Chinese), Journal of Computer Research and Development (《计算机研究与发展》) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113298689A (en) * | 2021-06-22 | 2021-08-24 | 河南师范大学 | Large-capacity image steganography method |
CN113298689B (en) * | 2021-06-22 | 2023-04-18 | 河南师范大学 | Large-capacity image steganography method |
Also Published As
Publication number | Publication date |
---|---|
CN107845116B (en) | 2021-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021179720A1 | Federated-learning-based user data classification method and apparatus, device, and medium | |
CN107633218A | Method and apparatus for generating images | |
CN107526725A | Method and apparatus for generating text based on artificial intelligence | |
CN105354248A | Grayscale-based distributed image low-level feature recognition method and system | |
CN109635946A | Clustering method combining a deep neural network with pairwise constraints | |
CN107766492A | Image search method and apparatus | |
CN109919202A | Classification model training method and apparatus | |
CN111428727A | Natural scene text recognition method based on sequence transformation correction and an attention mechanism | |
CN110929806A | Artificial-intelligence-based picture processing method and apparatus, and electronic device | |
CN116797248A | Blockchain-based data traceability management method and system | |
Sarmah et al. | Optimization models in steganography using metaheuristics | |
CN110457325A | Method and apparatus for outputting information | |
CN107845116A | Method and apparatus for generating a compressed encoding of a planar image | |
CN113537416A | Method and related device for converting text into images based on a generative adversarial network | |
CN107402878A | Testing method and apparatus | |
CN104376280B | Image code generation method for Google Glass | |
CN116306780A | Dynamic graph link generation method | |
CN104376523A | Method for constructing an image-code anti-counterfeiting system for Google Project Glass | |
CN104376314B | Construction method for a Google Glass Internet-of-Things website system | |
CN115082800B | Image segmentation method | |
WO2022105117A1 | Method and device for image quality assessment, computer device, and storage medium | |
CN115620342A | Cross-modal pedestrian re-identification method, system, and computer | |
Li et al. | A new aesthetic QR code algorithm based on salient region detection and SPBVM | |
CN112966150A | Video content extraction method and apparatus, computer device, and storage medium | |
CN109474826A | Picture compression method and apparatus, electronic device, and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |