CN107330852A - An image processing method based on a real-time zero-shot image manipulation network - Google Patents
An image processing method based on a real-time zero-shot image manipulation network Download PDF Info
- Publication number
- CN107330852A CN107330852A CN201710531090.0A CN201710531090A CN107330852A CN 107330852 A CN107330852 A CN 107330852A CN 201710531090 A CN201710531090 A CN 201710531090A CN 107330852 A CN107330852 A CN 107330852A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/18—Image warping, e.g. rearranging pixels individually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The present invention proposes an image processing method based on a real-time zero-shot image manipulation network. Its main components are: image processing with convolutional neural networks (CNN); the zero-shot manipulation network; and training and testing. The zero-shot manipulation network consists of a parameter network (PNet) and an image-transformation network (TNet): PNet is a general model that produces a hierarchy of key transformation parameters, and TNet combines these generated parameters with its own parameters to transform a content image into a stylized image. The zero-shot manipulation network of the present invention can perform high-quality image processing in real time under guidance signals of different forms, and can accept guidance signals from multiple modalities. It also supports a variety of connected mobile devices, so that users obtain the desired output immediately with greatly reduced latency, and the larger range of selectable styles better meets user needs.
Description
Technical field
The present invention relates to the field of image processing, and in particular to an image processing method based on a real-time zero-shot image manipulation network.
Background art
Artistic stylization of images has in recent years been a major research topic in non-photorealistic rendering within computer graphics. Using the computer as a tool, algorithms simulate the rendering styles of different art forms and enhance the expressiveness of visual information in images; this combination of computing technology and aesthetics is increasingly popular with users. Non-photorealistic rendering can assist artists in creation; serve scientific and medical illustration, engineering, and entertainment industries such as film, games, and animation; and power photo editing on mobile phones, instantly converting any ordinary user photo into one of many artistic styles, such as the style of Van Gogh, Monet, or the impressionists. By simulating different environments, it can also predict environmental conditions in advance, which is convenient for maritime operations such as marine traffic control and fishing; likewise, converting images from day to night, or from sunny to rainy weather, lets people anticipate the environment before traveling or carrying out other work. However, existing methods cannot process multiple signals simultaneously, cannot handle signals from other modalities, and are very inefficient.
The present invention proposes an image processing method based on a real-time zero-shot image manipulation network. The zero-shot manipulation network consists of a parameter network (PNet) and an image-transformation network (TNet): PNet is a general model that produces a hierarchy of key transformation parameters, and TNet combines these generated parameters with its own parameters to transform content images into stylized images. The zero-shot manipulation network of the present invention can perform high-quality image processing in real time under guidance signals of different forms and can accept guidance signals from multiple modalities. It also supports a variety of connected mobile devices, so that users obtain the desired output immediately with greatly reduced latency, and the larger range of selectable styles better meets user needs.
Summary of the invention
To address the problem that existing methods cannot process multiple signals simultaneously or handle signals from other modalities, the object of the present invention is to provide an image processing method based on a real-time zero-shot image manipulation network. The zero-shot manipulation network consists of a parameter network (PNet) and an image-transformation network (TNet); PNet is a general model that produces a hierarchy of key transformation parameters, and TNet combines these generated parameters with its own parameters to transform content images into stylized images.
To solve the above problems, the present invention provides an image processing method based on a real-time zero-shot image manipulation network, whose main components are:
(1) image processing with convolutional neural networks (CNN);
(2) the zero-shot manipulation network;
(3) training and testing.
Regarding said image processing and convolutional neural networks (CNN): given a content image X_c and a guidance signal (for example, a style image) X_s, output a transformed image Y such that Y is similar to X_c in content and similar to X_s in style. Learning effective representations of both content and style is therefore equally important for sound image processing.
Further, regarding said convolutional neural network (CNN): using a fixed deep CNN φ(·), the feature map φ_l(X) at layer l can represent the content of an image X. With φ_l(X) reshaped into a matrix whose rows are the channels of layer l, the style of X can be represented by its Gram matrix, computed as

$$G(\phi_l(X)) = \phi_l(X)\,\phi_l(X)^{\mathsf{T}}. \tag{1}$$

Two images can be judged similar in content or style only when the difference between the corresponding representations (φ_l(X) or G(φ_l(X)), respectively) is small in Frobenius norm. A feed-forward image-transformation network Y = T(X_c), typically itself a deep CNN, can therefore be trained with the loss

$$\mathcal{L} = \lambda_s \sum_{l \in S} \frac{1}{Z_l}\bigl\lVert G(\phi_l(Y)) - G(\phi_l(X_s)) \bigr\rVert_F^2 + \lambda_c \sum_{l \in C} \frac{1}{Z_l}\bigl\lVert \phi_l(Y) - \phi_l(X_c) \bigr\rVert_F^2, \tag{2}$$

where the first term is the style loss of the generated image Y and the second the content loss, λ_s and λ_c are hyperparameters, S is the set of "style layers", C is the set of "content layers", and Z_l is the total number of neurons in layer l. After the transformation network T(·) has been trained, a stylized image Y = T(X'_c) can be generated for a new content image X'_c without using the loss network.
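As an illustrative sketch only (not the patent's implementation; the layer shapes and feature values below are made-up stand-ins for real CNN activations), the Gram-matrix style representation and the weighted style/content loss described above can be written in NumPy as:

```python
import numpy as np

def gram(feat):
    """Gram matrix of a C x H x W feature map: reshape to C x (H*W), then F F^T."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T  # C x C style representation

def perceptual_loss(feats_y, feats_c, feats_s, style_layers, content_layers,
                    lam_s=1.0, lam_c=1.0):
    """Weighted sum of squared-Frobenius style and content differences."""
    loss = 0.0
    for l in style_layers:   # style: compare Gram matrices against X_s
        z = feats_y[l].size  # Z_l: number of neurons in layer l
        loss += lam_s * np.sum((gram(feats_y[l]) - gram(feats_s[l])) ** 2) / z
    for l in content_layers:  # content: compare raw features against X_c
        z = feats_y[l].size
        loss += lam_c * np.sum((feats_y[l] - feats_c[l]) ** 2) / z
    return loss

rng = np.random.default_rng(0)

def fake_feats():  # stand-in for phi_l(X) at three layers of a fixed CNN
    return [rng.standard_normal((8, 4, 4)) for _ in range(3)]

fy, fc, fs = fake_feats(), fake_feats(), fake_feats()
print(perceptual_loss(fy, fc, fs, style_layers=[0, 1], content_layers=[2]))
```

A real system would take the feature lists from a pretrained loss network rather than random tensors; the normalization by Z_l and the λ weights follow the loss description above.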
Regarding said zero-shot manipulation network: the zero-shot manipulation network combines an image-transformation network (TNet) and a parameter network (PNet). In addition to using TNet to transform images, the parameter network (PNet) is trained to produce the key parameters of TNet conditioned on the guidance signal (such as a style image). Because PNet learns to embed guidance signals into a shared space, the zero-shot manipulation network can perform zero-shot image processing on unseen guidance signals.
Further, said TNet uses dynamic instance normalization. To achieve zero-shot image processing, the network parameters of TNet are specified dynamically at test time so that unseen signals can be handled, using feature maps produced by PNet conditioned on the guidance signal X_s. A straightforward approach would be to generate TNet's filters directly; in practice, however, each layer of TNet typically has more than 100,000 parameters (e.g., 128 × 128 × 3 × 3), while each feature map in PNet typically has about 1,000,000 entries (e.g., 128 × 80 × 80), so it is difficult to transform one such high-dimensional vector into the other effectively.
Further, regarding said dynamic instance normalization: dynamically enhanced instance normalization is applied after each convolutional layer in TNet, using scaling and shifting parameters γ(X_s) and β(X_s) produced by PNet. Here the scaling and shifting factors γ(X_s) and β(X_s) are regarded as the key parameters of each TNet layer. Formally, before instance normalization, let x ∈ R^{C×H×W} be the input tensor, with x_{ijk} denoting its ijk-th element, where i indexes feature maps and j, k range over the spatial dimensions. The output y_{ijk} of dynamic instance normalization (DIN), given by equation (3), scales and shifts the spatially normalized input with γ_i(X_s) and β_i(X_s), where μ_i is the mean over feature map i and σ_i² the corresponding variance; γ_i(X_s) is the i-th element of the C_i-dimensional vector γ(X_s) generated by PNet, and similarly for β_i(X_s). If γ_i(X_s) = 1 and β_i(X_s) = 0, DIN reduces to instance normalization; if γ_i(X_s) = γ_i and β_i(X_s) = β_i are learned directly, independently of PNet, DIN reduces to conditional instance normalization. In both cases the model loses its zero-shot learning ability and therefore cannot generalize to unseen signals.
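A minimal NumPy sketch of the DIN operation of formula (3) (an illustration only; here gamma and beta are random stand-ins for the vectors γ(X_s) and β(X_s) that PNet would generate):

```python
import numpy as np

def din(x, gamma, beta, eps=1e-5):
    """Dynamic instance normalization, formula (3): normalize each feature
    map over its spatial dimensions, then scale and shift with per-channel
    parameters gamma and beta (produced by PNet in the full system)."""
    mu = x.mean(axis=(1, 2), keepdims=True)   # mu_i over H, W
    var = x.var(axis=(1, 2), keepdims=True)   # sigma_i^2 over H, W
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[:, None, None] * x_hat + beta[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))            # C x H x W input tensor
gamma = rng.standard_normal(3)                # stand-in for gamma(X_s)
beta = rng.standard_normal(3)                 # stand-in for beta(X_s)
y = din(x, gamma, beta)

# With gamma = 1 and beta = 0, DIN reduces to plain instance normalization:
y_in = din(x, np.ones(3), np.zeros(3))
print(y.shape, float(y_in.mean()))
```

The reduction to instance normalization at gamma = 1, beta = 0 mirrors the special cases discussed above.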
Further, regarding said parameter network (PNet): to drive TNet through dynamic instance normalization, PNet can have either a serial or a parallel architecture.
Further, regarding said serial PNet: in the serial PNet, a deep CNN with a structure similar to TNet can be used, generating γ^(l)(X_s) and β^(l)(X_s) at layer l. The γ^(l)(X_s) and β^(l)(X_s) of formula (3) are computed, as in formulas (4) and (5), from the feature map of layer l of PNet, denoted ψ_l(X_s); ψ_l(X_s) is the output of a convolutional layer in TNet style if the input X_s is an image, or the output of a fully connected layer if X_s is a word embedding (vector); W_γ^(l), b_γ^(l), W_β^(l), and b_β^(l) are the parameters to be learned.
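Formulas (4) and (5) are per-layer affine maps from ψ_l(X_s) to the DIN parameters. A sketch with illustrative dimensions (the sizes below are assumptions, not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(1)

def affine_din_params(psi, W_gamma, b_gamma, W_beta, b_beta):
    """Formulas (4)-(5): gamma^(l) = psi_l(X_s) W_gamma^(l) + b_gamma^(l),
    beta^(l) = psi_l(X_s) W_beta^(l) + b_beta^(l)."""
    return psi @ W_gamma + b_gamma, psi @ W_beta + b_beta

d_psi, c_l = 16, 8                       # illustrative sizes only
psi = rng.standard_normal(d_psi)         # stand-in for psi_l(X_s)
W_g = rng.standard_normal((d_psi, c_l))  # learned W_gamma^(l)
b_g = np.zeros(c_l)                      # learned b_gamma^(l)
W_b = rng.standard_normal((d_psi, c_l))  # learned W_beta^(l)
b_b = np.zeros(c_l)                      # learned b_beta^(l)

gamma_l, beta_l = affine_din_params(psi, W_g, b_g, W_b, b_b)
print(gamma_l.shape, beta_l.shape)       # one scale and one shift per channel
```

Each TNet layer would own its own (W, b) pair, with ψ_l coming from a convolutional layer for image inputs or a fully connected layer for embedding inputs, as described above.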
Further, regarding said parallel PNet: a separate shallow network (fully connected or convolutional) generates ψ_l(X_s) at each layer l, which is then used to compute γ^(l)(X_s) and β^(l)(X_s) according to formulas (4) and (5). Unlike the serial PNet, where higher-level γ^(l)(X_s) and β^(l)(X_s) are generated from higher-level ψ_l(X_s), here the path from X_s to γ^(l)(X_s) and β^(l)(X_s) follows a shallow, parallel structure. This structure limits the effectiveness of PNet and somewhat reduces the quality of the generated TNet, and hence of the produced image Y; the serial PNet is therefore mostly used.
Regarding said training and testing: the zero-shot manipulation network can be trained end-to-end under the supervision of the loss network. At test time, the content image X_c and the guidance signal X_s are fed into TNet and PNet respectively, generating the transformed image Y.
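The test-time flow (X_c into TNet, X_s into PNet) can be sketched end to end with stub networks; everything below (layer counts, shapes, the toy embedding) is an illustrative assumption rather than the patent's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

def pnet(x_s, n_layers=2):
    """Stub PNet: embed the guidance signal into a shared space and emit one
    (gamma, beta) pair per TNet layer (toy embedding, not the patent's)."""
    emb = np.tanh(x_s.mean(axis=(1, 2)))  # one value per channel
    return [(1.0 + 0.1 * emb, 0.1 * emb) for _ in range(n_layers)]

def tnet(x_c, params, eps=1e-5):
    """Stub TNet: convolutions omitted; each layer applies DIN with the
    parameters supplied by PNet."""
    y = x_c
    for gamma, beta in params:
        mu = y.mean(axis=(1, 2), keepdims=True)
        var = y.var(axis=(1, 2), keepdims=True)
        y = gamma[:, None, None] * (y - mu) / np.sqrt(var + eps) \
            + beta[:, None, None]
    return y

x_c = rng.standard_normal((4, 8, 8))  # content image (as a feature tensor)
x_s = rng.standard_normal((4, 8, 8))  # previously unseen guidance signal
y = tnet(x_c, pnet(x_s))              # transformed output Y
print(y.shape)
```

The point of the structure is visible even in stub form: a new X_s changes only the DIN parameters, so no retraining is needed per guidance signal.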
Brief description of the drawings
Fig. 1 is a system flowchart of the image processing method based on a real-time zero-shot image manipulation network of the present invention.
Fig. 2 shows the loss function of the deep convolutional neural network of the image processing method based on a real-time zero-shot image manipulation network of the present invention.
Fig. 3 shows the serial PNet of the image processing method based on a real-time zero-shot image manipulation network of the present invention.
Fig. 4 shows the parallel PNet of the image processing method based on a real-time zero-shot image manipulation network of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features of the embodiments may be combined with one another. The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a system flowchart of the image processing method based on a real-time zero-shot image manipulation network of the present invention. It mainly comprises image processing with convolutional neural networks (CNN), the zero-shot manipulation network, and training and testing.
Image processing and convolutional neural networks (CNN): given a content image X_c and a guidance signal (for example, a style image) X_s, output a transformed image Y such that Y is similar to X_c in content and similar to X_s in style. Learning effective representations of both content and style is therefore equally important for sound image processing.
The zero-shot manipulation network combines an image-transformation network (TNet) and a parameter network (PNet). In addition to using TNet to transform images, the parameter network (PNet) is trained to produce the key parameters of TNet conditioned on the guidance signal (such as a style image). Because PNet learns to embed guidance signals into a shared space, the zero-shot manipulation network can perform zero-shot image processing on unseen guidance signals.
Here, TNet uses dynamic instance normalization: to achieve zero-shot image processing, the network parameters of TNet are specified dynamically at test time so that unseen signals can be handled, using feature maps produced by PNet conditioned on the guidance signal X_s. A straightforward approach would be to generate TNet's filters directly; in practice, however, each layer of TNet typically has more than 100,000 parameters (e.g., 128 × 128 × 3 × 3), while each feature map in PNet typically has about 1,000,000 entries (e.g., 128 × 80 × 80), so it is difficult to transform one such high-dimensional vector into the other effectively.
Dynamically enhanced instance normalization is applied after each convolutional layer in TNet, using the scaling and shifting parameters γ(X_s) and β(X_s) produced by PNet; these scaling and shifting factors are regarded as the key parameters of each TNet layer. Formally, before instance normalization, let x ∈ R^{C×H×W} be the input tensor, with x_{ijk} denoting its ijk-th element, where i indexes feature maps and j, k range over the spatial dimensions. The output y_{ijk} of dynamic instance normalization (DIN) is then given by formula (3), where μ_i is the mean over feature map i and σ_i² the corresponding variance; γ_i(X_s) is the i-th element of the C_i-dimensional vector γ(X_s) generated by PNet, and similarly for β_i(X_s). If γ_i(X_s) = 1 and β_i(X_s) = 0, DIN reduces to instance normalization; if γ_i(X_s) = γ_i and β_i(X_s) = β_i are learned directly, independently of PNet, DIN reduces to conditional instance normalization. In both cases the model loses its zero-shot learning ability and therefore cannot generalize to unseen signals.
Parameter network (PNet): to drive TNet through dynamic instance normalization, PNet can have either a serial or a parallel architecture.
Training and testing: the zero-shot manipulation network can be trained end-to-end under the supervision of the loss network. At test time, the content image X_c and the guidance signal X_s are fed into TNet and PNet respectively, generating the transformed image Y.
Fig. 2 shows the loss function of the deep convolutional neural network of the image processing method based on a real-time zero-shot image manipulation network of the present invention. Using a fixed deep CNN φ(·), the feature map φ_l(X) at layer l can represent the content of an image X, and the style of X can be represented by the Gram matrix of φ_l(X). Two images can be judged similar in content or style only when the difference between the corresponding representations (φ_l(X) or G(φ_l(X)), respectively) is small in Frobenius norm. A feed-forward image-transformation network Y = T(X_c), typically itself a deep CNN, can therefore be trained with a loss combining a style loss on the generated image Y and a content loss, weighted by hyperparameters λ_s and λ_c, where S is the set of "style layers", C is the set of "content layers", and Z_l is the total number of neurons in layer l. After the transformation network T(·) has been trained, a stylized image Y = T(X'_c) can be generated for a new content image X'_c without using the loss network.
Fig. 3 shows the serial PNet of the image processing method based on a real-time zero-shot image manipulation network of the present invention. In the serial PNet, a deep CNN with a structure similar to TNet can be used, generating γ^(l)(X_s) and β^(l)(X_s) at layer l. The γ^(l)(X_s) and β^(l)(X_s) of formula (3) are computed from the feature map of layer l of PNet, denoted ψ_l(X_s): ψ_l(X_s) is the output of a convolutional layer if the input X_s is an image, or the output of a fully connected layer if X_s is a word embedding (vector); the associated weights and biases are the parameters to be learned.
Fig. 4 shows the parallel PNet of the image processing method based on a real-time zero-shot image manipulation network of the present invention. A separate shallow network (fully connected or convolutional) generates ψ_l(X_s) at each layer l, which is then used to compute γ^(l)(X_s) and β^(l)(X_s) according to formulas (4) and (5). Unlike the serial PNet, where higher-level γ^(l)(X_s) and β^(l)(X_s) are generated from higher-level ψ_l(X_s), here the path from X_s to γ^(l)(X_s) and β^(l)(X_s) follows a shallow, parallel structure. This limits the effectiveness of PNet and somewhat reduces the quality of the generated TNet, and hence of the produced image Y; the serial PNet is therefore mostly used.
It will be apparent to those skilled in the art that the present invention is not limited to the details of the above embodiments, and that the invention may be embodied in other specific forms without departing from its spirit and scope. Moreover, those skilled in the art may make various changes and modifications to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention. The appended claims are therefore intended to cover the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Claims (10)
1. An image processing method based on a real-time zero-shot image manipulation network, characterized in that it mainly comprises: image processing with convolutional neural networks (CNN) (part 1); a zero-shot manipulation network (part 2); and training and testing (part 3).
2. The image processing and convolutional neural networks (CNN) according to claim 1, characterized in that, given a content image X_c and a guidance signal (for example, a style image) X_s, a transformed image Y is output such that Y is similar to X_c in content and similar to X_s in style; learning effective representations of both content and style is therefore equally important for sound image processing.
3. The convolutional neural network (CNN) according to claim 1, characterized in that a fixed deep CNN φ(·) is used; the feature map φ_l(X) at layer l can represent the content of an image X, and the style of X can be represented by the Gram matrix of φ_l(X); two images can be judged similar in content or style only when the difference between the corresponding representations (φ_l(X) or G(φ_l(X)), respectively) is small in Frobenius norm; therefore a feed-forward image-transformation network Y = T(X_c), typically itself a deep CNN, can be trained with a loss combining a style loss on the generated image Y and a content loss, weighted by hyperparameters λ_s and λ_c, where S is the set of "style layers", C is the set of "content layers", and Z_l is the total number of neurons in layer l; after the transformation network T(·) has been trained, a stylized image Y = T(X'_c) is generated for a new content image X'_c without using the loss network.
4. The zero-shot manipulation network (part 2) according to claim 1, characterized in that the zero-shot manipulation network combines an image-transformation network (TNet) and a parameter network (PNet); in addition to using TNet to transform images, the parameter network (PNet) is trained to produce the key parameters of TNet conditioned on the guidance signal (such as a style image); because PNet learns to embed guidance signals into a shared space, the zero-shot manipulation network can perform zero-shot image processing on unseen guidance signals.
5. The TNet with dynamic instance normalization according to claim 4, characterized in that, to achieve zero-shot image processing, the network parameters of TNet are specified dynamically at test time so that unseen signals can be handled, using feature maps produced by PNet conditioned on the guidance signal X_s; a straightforward approach would be to generate TNet's filters directly; in practice, however, each layer of TNet typically has more than 100,000 parameters (e.g., 128 × 128 × 3 × 3), while each feature map in PNet typically has about 1,000,000 entries (e.g., 128 × 80 × 80), so it is difficult to transform one such high-dimensional vector into the other effectively.
6. The dynamic instance normalization according to claim 5, characterized in that dynamically enhanced instance normalization is applied (after each convolutional layer in TNet) with the scaling and shifting parameters γ(X_s) and β(X_s) produced by PNet; the scaling and shifting factors γ(X_s) and β(X_s) are regarded as the key parameters of each TNet layer; formally, before instance normalization, let x ∈ R^{C×H×W} be the input tensor, with x_{ijk} denoting its ijk-th element, where i indexes feature maps and j, k range over the spatial dimensions; the output y_{ijk} of dynamic instance normalization (DIN) is then:
$$y_{ijk} = \frac{x_{ijk} - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}\,\gamma_i(X_s) + \beta_i(X_s), \qquad \mu_i = \frac{1}{HW}\sum_{j=1}^{H}\sum_{k=1}^{W} x_{ijk}, \qquad \sigma_i^2 = \frac{1}{HW}\sum_{j=1}^{H}\sum_{k=1}^{W} \left(x_{ijk} - \mu_i\right)^2 \tag{3}$$
where μ_i is the mean over feature map i and σ_i² the corresponding variance; γ_i(X_s) is the i-th element of the C_i-dimensional vector γ(X_s) generated by PNet, and similarly for β_i(X_s); if γ_i(X_s) = 1 and β_i(X_s) = 0, DIN reduces to instance normalization; if γ_i(X_s) = γ_i and β_i(X_s) = β_i are learned directly, independently of PNet, DIN reduces to conditional instance normalization; in both cases the model loses its zero-shot learning ability and therefore cannot generalize to unseen signals.
7. The parameter network (PNet) according to claim 4, characterized in that, to drive TNet through dynamic instance normalization, PNet can have either a serial or a parallel architecture.
8. The serial PNet according to claim 7, characterized in that, in the serial PNet, a deep CNN with a structure similar to TNet can be used, generating γ^(l)(X_s) and β^(l)(X_s) at layer l; in the serial PNet, the γ^(l)(X_s) and β^(l)(X_s) of formula (3) are computed from the feature map of layer l of PNet, denoted ψ_l(X_s):
$$\gamma^{(l)}(X_s) = \psi_l(X_s)\,W_\gamma^{(l)} + b_\gamma^{(l)} \tag{4}$$

$$\beta^{(l)}(X_s) = \psi_l(X_s)\,W_\beta^{(l)} + b_\beta^{(l)} \tag{5}$$
where ψ_l(X_s) is the output of a convolutional layer if the input X_s is an image, or the output of a fully connected layer if X_s is a word embedding (vector); W_γ^(l), b_γ^(l), W_β^(l), and b_β^(l) are the parameters to be learned.
9. The parallel PNet according to claim 7, characterized in that a separate shallow network (fully connected or convolutional) generates ψ_l(X_s) at each layer l, which is then used to compute γ^(l)(X_s) and β^(l)(X_s) according to formulas (4) and (5); unlike the serial PNet, where higher-level γ^(l)(X_s) and β^(l)(X_s) are generated from higher-level ψ_l(X_s), here the path from X_s to γ^(l)(X_s) and β^(l)(X_s) follows a shallow, parallel structure; this structure limits the effectiveness of PNet and somewhat reduces the quality of the generated TNet, and hence of the produced image Y; the serial PNet is therefore mostly used.
10. The training and testing (part 3) according to claim 1, characterized in that the zero-shot manipulation network can be trained end-to-end under the supervision of the loss network; at test time, the content image X_c and the guidance signal X_s are fed into TNet and PNet respectively, generating the transformed image Y.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710531090.0A CN107330852A (en) | 2017-07-03 | 2017-07-03 | An image processing method based on a real-time zero-shot image manipulation network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710531090.0A CN107330852A (en) | 2017-07-03 | 2017-07-03 | An image processing method based on a real-time zero-shot image manipulation network
Publications (1)
Publication Number | Publication Date |
---|---|
CN107330852A true CN107330852A (en) | 2017-11-07 |
Family
ID=60197679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710531090.0A Withdrawn CN107330852A (en) | 2017-07-03 | 2017-07-03 | A kind of image processing method based on real-time zero point image manipulation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107330852A (en) |
2017-07-03: application CN201710531090.0A (CN) published as CN107330852A; status: withdrawn (not in force)
Non-Patent Citations (1)
Title |
---|
HAO WANG et al.: "ZM-Net: Real-time Zero-shot Image Manipulation Network", published online: https://arxiv.org/abs/1703.07255v1 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107730474A (en) * | 2017-11-09 | 2018-02-23 | BOE Technology Group Co., Ltd. | Image processing method, processing unit and processing equipment |
US10706504B2 (en) | 2017-11-09 | 2020-07-07 | Boe Technology Group Co., Ltd. | Image processing methods and image processing devices |
WO2019144608A1 (en) * | 2018-01-26 | 2019-08-01 | BOE Technology Group Co., Ltd. | Image processing method, processing apparatus and processing device |
US11281938B2 (en) | 2018-01-26 | 2022-03-22 | Boe Technology Group Co., Ltd. | Image processing method, processing apparatus and processing device |
WO2020073758A1 (en) * | 2018-10-10 | 2020-04-16 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Method and apparatus for training machine learning model, apparatus for video style transfer |
CN112823379A (en) * | 2018-10-10 | 2021-05-18 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Method and device for training machine learning model and device for video style transfer |
CN111476341A (en) * | 2019-01-23 | 2020-07-31 | StradVision, Inc. | Method and device for converting CNN convolution layer |
CN111476341B (en) * | 2019-01-23 | 2024-04-12 | StradVision, Inc. | Method and device for converting convolutional layer of CNN |
CN113114541A (en) * | 2021-06-15 | 2021-07-13 | Shanghai Xingrong Information Technology Co., Ltd. | Method and system for judging whether network connection can be established between network nodes |
CN113114541B (en) * | 2021-06-15 | 2021-09-14 | Shanghai Xingrong Information Technology Co., Ltd. | Method and system for judging whether network connection can be established between network nodes |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107330852A (en) | A kind of image processing method based on real-time zero point image manipulation network | |
CN108734749A (en) | The visual style of image converts | |
CN105374007A (en) | Generation method and generation device of pencil drawing fusing skeleton strokes and textural features | |
CN105488472A (en) | Digital make-up method based on sample template | |
CN106875361A (en) | A kind of method that poisson noise is removed based on depth convolutional neural networks | |
CN106408595A (en) | Neural network painting style learning-based image rendering method | |
CN101556699A (en) | Face-based facial aging image synthesis method | |
US20200327316A1 (en) | Interactive method for generating strokes with chinese ink painting style and device thereof | |
CN105631807A (en) | Single-frame image super resolution reconstruction method based on sparse domain selection | |
CN107203969A (en) | A kind of high magnification image super-resolution rebuilding method of medium scale constraint | |
CN106023276A (en) | Pencil drawing making method and pencil drawing making device based on image processing | |
CN106971189A (en) | A kind of noisy method for recognising star map of low resolution | |
CN104091364B (en) | Single-image super-resolution reconstruction method | |
CN102831621B (en) | Video significance processing method based on spectral analysis | |
Zhang et al. | CSAST: Content self-supervised and style contrastive learning for arbitrary style transfer | |
Zheng et al. | CFA-GAN: Cross fusion attention and frequency loss for image style transfer | |
CN106683129A (en) | Method for decomposing relatively reduced texture based on bilateral kernel regression and device thereof | |
CN107169948A (en) | A kind of visual characteristic moving method based on depth image analogy | |
CN104077798B (en) | High-reality-sense animation synthesis method for deformable object | |
CN106354449A (en) | Document online demonstration method and client | |
Wang | Evolution of StyleGAN3 | |
CN114037644A (en) | Artistic digital image synthesis system and method based on generation countermeasure network | |
CN107256534A (en) | The diminution of GQIR quantum images based on bilinear interpolation and amplification method | |
CN103729835B (en) | A kind of vegetation parameter approximating method based on middle high-resolution remote sensing | |
CN113111906A (en) | Method for generating confrontation network model based on condition of single pair image training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20171107 |