CN108509961A - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number: CN108509961A
Application number: CN201710109988.9A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 肖琦琦, 张弛
Applicant and assignee: Beijing Megvii Technology Co Ltd; Beijing Maigewei Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Prior art keywords: image, sample, result, network, indicate


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures

Abstract

Embodiments of the present invention provide an image processing method and device. The image processing method includes: obtaining a first image and a second image; inputting the first image into a first convolutional neural network in a trained network model to obtain a first network output result, where the first network output result includes a feature map; inputting the second image into a second convolutional neural network in the trained network model to obtain a second network output result, where the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result; convolving the second network output result using the first network output result as a convolution kernel to obtain a convolution result; and inputting the convolution result into the remaining network structure in the trained network model to obtain a comparison result of the first image and the second image. In this way, the features of the two images can be effectively processed at the same time, and a fairly good comparison result can be obtained.

Description

Image processing method and device
Technical field
The present invention relates to the field of image processing, and more specifically to an image processing method and device.
Background art
Image comparison can be used for the re-identification of pedestrians or objects and plays an important role in the security field. Existing methods sometimes need to process several images simultaneously with convolutional neural networks, generally in one of the following two ways:
1. The two images to be compared are fed into two different convolutional neural networks, and image comparison is subsequently performed based on the feature vectors output by the two networks, or the outputs of the two networks are fed into the same fully connected layer to obtain a comparison result. The disadvantage of this approach is that the information of the two convolutional neural networks does not interact: once the two networks are trained, the training result is often unable to effectively process the features of the two different input images at the same time, and performing image comparison with different networks then loses its meaning.
2. The two images are directly stacked along the channel dimension, or the result computed from them by an optical flow algorithm is fed into a single convolutional neural network, which processes it. The disadvantage of this approach is that convolutional neural networks are not well suited to directly reflecting the difference between two images, and fairly good results are obtained only under specific conditions (e.g., when the scenes are identical).
Summary of the invention
The present invention is proposed in view of the above problems. The present invention provides an image processing method and device.
According to one aspect of the present invention, an image processing method is provided. The method includes: obtaining a first image and a second image; inputting the first image into a first convolutional neural network in a trained network model to obtain a first network output result, where the first network output result includes a feature map; inputting the second image into a second convolutional neural network in the trained network model to obtain a second network output result, where the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result; convolving the second network output result using the first network output result as a convolution kernel to obtain a convolution result; and inputting the convolution result into the remaining network structure in the trained network model to obtain a comparison result of the first image and the second image.
Illustratively, the comparison result of the first image and the second image includes one or more of the following: a first result indicating whether a target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of a shared object contained in both images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in them.
Illustratively, the second result includes an object feature map indicating the position, in the second image, of the target object in the first image, where the size of the object feature map is the same as that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
Illustratively, obtaining the first image and the second image includes: obtaining a first initial image; performing one or more of scaling, cropping, and padding on the first initial image to adjust its size to a first preset size; and determining the adjusted first initial image to be the first image.
Illustratively, obtaining the first image and the second image includes: obtaining a second initial image; performing one or more of scaling, cropping, and padding on the second initial image to adjust its size to a second preset size; and determining the adjusted second initial image to be the second image.
Illustratively, the image processing method further includes: obtaining a first sample image, a second sample image, and annotation data about the comparison result of the first sample image and the second sample image; constructing a loss function using the annotation data as the target value of the comparison result of the first sample image and the second sample image, where the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, and the first sample image and the second sample image are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and training the parameters in the initial network model using the constructed loss function to obtain the trained network model.
Illustratively, the annotation data includes one or more of the following: a first annotation value for a first sample result indicating whether a target object in the first sample image is present in the second sample image; a second annotation value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third annotation value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth annotation value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth annotation value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of a shared object contained in both; a sixth annotation value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh annotation value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in them.
Illustratively, the second annotation value includes an annotated object feature map, where the size of the annotated object feature map is the same as that of the second sample image, and each pixel of the annotated object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
Illustratively, the remaining network structure includes a fully connected layer or an upsampling layer for receiving the convolution result.
According to another aspect of the present invention, an image processing device is provided, including: a first image acquisition module for obtaining a first image and a second image; a first network input module for inputting the first image into a first convolutional neural network in a trained network model to obtain a first network output result, where the first network output result includes a feature map; a second network input module for inputting the second image into a second convolutional neural network in the trained network model to obtain a second network output result, where the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result; a convolution module for convolving the second network output result using the first network output result as a convolution kernel to obtain a convolution result; and a remaining-network input module for inputting the convolution result into the remaining network structure in the trained network model to obtain a comparison result of the first image and the second image.
Illustratively, the comparison result of the first image and the second image includes one or more of the following: a first result indicating whether a target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of a shared object contained in both images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in them.
Illustratively, the second result includes an object feature map indicating the position, in the second image, of the target object in the first image, where the size of the object feature map is the same as that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
Illustratively, the first image acquisition module includes: a first acquisition submodule for obtaining a first initial image; a first adjustment submodule for performing one or more of scaling, cropping, and padding on the first initial image to adjust its size to a first preset size; and a first image determination submodule for determining the adjusted first initial image to be the first image.
Illustratively, the first image acquisition module includes: a second acquisition submodule for obtaining a second initial image; a second adjustment submodule for performing one or more of scaling, cropping, and padding on the second initial image to adjust its size to a second preset size; and a second image determination submodule for determining the adjusted second initial image to be the second image.
Illustratively, the image processing device further includes: a second image acquisition module for obtaining a first sample image, a second sample image, and annotation data about the comparison result of the first sample image and the second sample image; a loss function construction module for constructing a loss function using the annotation data as the target value of the comparison result of the first sample image and the second sample image, where the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, which are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and a training module for training the parameters in the initial network model using the constructed loss function to obtain the trained network model.
Illustratively, the annotation data includes one or more of the following: a first annotation value for a first sample result indicating whether a target object in the first sample image is present in the second sample image; a second annotation value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third annotation value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth annotation value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth annotation value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of a shared object contained in both; a sixth annotation value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh annotation value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in them.
Illustratively, the second annotation value includes an annotated object feature map, where the size of the annotated object feature map is the same as that of the second sample image, and each pixel of the annotated object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
Illustratively, the remaining network structure includes a fully connected layer or an upsampling layer for receiving the convolution result.
The image processing method and device according to the embodiments of the present invention use two convolutional neural networks to separately process the two images to be compared, and convolve the outputs of the two convolutional neural networks with each other. The above method and device can effectively process the features of the two images at the same time. In addition, for some image comparison problems, the above method and device are expected to obtain fairly good results. Moreover, compared with a conventional neural network model, the network model used by the above method and device does not become more complex in structure, which avoids additional problems in training and application.
Description of the drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the following more detailed description of embodiments of the present invention in conjunction with the accompanying drawings. The drawings are provided for a further understanding of the embodiments of the present invention and constitute a part of the specification; together with the embodiments of the present invention, they serve to explain the present invention and do not limit it. In the drawings, identical reference labels generally represent identical components or steps.
Fig. 1 shows a schematic block diagram of an exemplary electronic device for implementing the image processing method and device according to embodiments of the present invention;
Fig. 2 shows a schematic flowchart of an image processing method according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the processing of the first image and the second image according to an embodiment of the present invention;
Fig. 4 shows a schematic block diagram of an image processing device according to an embodiment of the present invention; and
Fig. 5 shows a schematic block diagram of an image processing system according to an embodiment of the present invention.
Detailed description of embodiments
To make the objects, technical solutions, and advantages of the present invention more apparent, example embodiments according to the present invention are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention described herein without creative effort shall fall within the protection scope of the present invention.
To solve the problem described above, embodiments of the present invention provide an image processing method and device, which use two convolutional neural networks to process two images separately and convolve the outputs of the two convolutional neural networks with each other. In this way, the information of the two input images can be comprehensively used during the training and application of the network model. The image processing method provided by the embodiments of the present invention can obtain fairly good comparison results in many scenarios and is applicable to various fields that require image comparison.
First, an exemplary electronic device 100 for implementing the image processing method and device according to embodiments of the present invention is described with reference to Fig. 1.
As shown in Fig. 1, the electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and an image acquisition device 110, which are interconnected by a bus system 112 and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in Fig. 1 are only exemplary, not limiting; the electronic device may have other components and structures as needed.
The processor 102 may be a central processing unit (CPU) or a processing unit of another form having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, or flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the client functionality (implemented by the processor) in the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (such as images and/or sounds) to the outside (such as a user) and may include one or more of a display, a speaker, and the like.
The image acquisition device 110 may acquire images and store the acquired images in the storage device 104 for use by other components. The image acquisition device 110 may be a camera. It should be understood that the image acquisition device 110 is only an example, and the electronic device 100 may not include one; in that case, another image acquisition device may be used to acquire images for image processing and send the acquired images to the electronic device 100.
Illustratively, the exemplary electronic device for implementing the image processing method and device according to embodiments of the present invention may be implemented as a device such as a personal computer or a remote server.
Next, an image processing method according to an embodiment of the present invention is described with reference to Fig. 2. Fig. 2 shows a schematic flowchart of an image processing method 200 according to an embodiment of the present invention. As shown in Fig. 2, the image processing method 200 includes the following steps.
In step S210, a first image and a second image are obtained.
The first image and the second image may be any suitable images to be compared. The first image and/or the second image may be original images acquired by an image acquisition device such as a camera, or may be images obtained after preprocessing the original images.
The first image and/or the second image may be sent by a client device (such as a security device including a surveillance camera) to the electronic device 100 for processing by the processor 102, or may be acquired by the image acquisition device 110 (such as a camera) included in the electronic device 100 and transmitted to the processor 102 for processing.
In step S220, the first image is input into the first convolutional neural network in the trained network model to obtain a first network output result, where the first network output result includes a feature map.
The trained network model includes the first convolutional neural network, the second convolutional neural network, a convolutional layer (the layer in which the first network output result and the second network output result are convolved), and the remaining network structure. The first image is input into the input layer of the first convolutional neural network, and the output layer of the first convolutional neural network outputs the first network output result corresponding to the first image.
In step S230, the second image is input into the second convolutional neural network in the trained network model to obtain a second network output result, where the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result.
The second image is input into the input layer of the second convolutional neural network, and the output layer of the second convolutional neural network outputs the second network output result corresponding to the second image. It should be understood that words such as "first" and "second" herein do not denote order and are used only for purposes of distinction.
When applying the network model described herein, the smaller of the outputs of the two convolutional neural networks is used as the convolution kernel to convolve the output of the other convolutional neural network. For clarity, the output of the first convolutional neural network is taken as the smaller output in the description herein, which is not a limitation of the present invention. The outputs of the first convolutional neural network and the second convolutional neural network are feature maps, and the feature map output by the first convolutional neural network is smaller than that output by the second convolutional neural network. The sizes of the feature maps output by the two convolutional neural networks can be set during the training of the network model; after training, the two convolutional neural networks can output feature maps whose sizes match the preset sizes.
In step S240, the second network output result is convolved using the first network output result as the convolution kernel, to obtain a convolution result.
The layer in which the first network output result and the second network output result are convolved can be regarded as a convolutional layer in the network model. The specific operation of this convolutional layer is briefly described below. In step S240, the output of the first convolutional neural network is used as the convolution kernel, and a convolution operation is applied to the output of the second convolutional neural network. Specifically, suppose the output of the first convolutional neural network is

$$A = a_{i,j,k} \quad (0 \le i < n_1,\ 0 \le j < m_1,\ 0 \le k < K)$$

and suppose the output of the second convolutional neural network is

$$B = b_{i,j,k} \quad (0 \le i < n_2,\ 0 \le j < m_2,\ 0 \le k < K).$$

The result of the convolution operation is

$$C = c_{i,j,k} \quad (0 \le i < n_2 - n_1,\ 0 \le j < m_2 - m_1,\ 0 \le k < K),$$

where the dimension indexed by k is the channel of the convolutional layer (K channels in total), and the dimensions indexed by i and j are the spatial size: A has size $n_1 \times m_1$, B has size $n_2 \times m_2$, and C has size $(n_2 - n_1) \times (m_2 - m_1)$.
Since A is smaller than B, A is certainly contained in B. Then, according to the definition of the convolution operation, C can be expressed as

$$c_{i,j,k} = \sum_{p=0}^{n_1 - 1} \sum_{q=0}^{m_1 - 1} a_{p,q,k} \, b_{i+p,\, j+q,\, k}.$$

In conventional techniques, A would be an ordinary convolution kernel (usually a 3 x 3 kernel), B the image being convolved, and C the convolution result. After convolution, the size of C is usually kept the same as that of B; this can be achieved, for example, by padding pixels around B and then convolving with A. Compared with an ordinary convolution kernel, however, A according to the embodiments of the present invention is relatively large. In this case, there is no need to keep C at the size of B: A and B can be convolved directly, and the resulting C will be smaller than B.
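As a concrete illustration of this cross-convolution, the following is a minimal sketch in PyTorch (an assumption; the patent does not name a framework), applying feature map A as a per-channel "valid" convolution kernel over feature map B, matching the formula above. Note that PyTorch's valid convolution yields an output of size (n2 - n1 + 1) x (m2 - m1 + 1).

```python
import torch
import torch.nn.functional as F

def cross_convolve(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Convolve B of shape (K, n2, m2) with A of shape (K, n1, m1)
    used as a per-channel kernel, as in step S240."""
    k = a.shape[0]
    weight = a.unsqueeze(1)        # (K, 1, n1, m1): one kernel per channel
    inputs = b.unsqueeze(0)        # (1, K, n2, m2)
    # groups=k convolves channel k of B with channel k of A only, matching
    # c_{i,j,k} = sum_{p,q} a_{p,q,k} * b_{i+p,j+q,k}; no padding is added,
    # so the output, (1, K, n2-n1+1, m2-m1+1), is smaller than B
    return F.conv2d(inputs, weight, groups=k).squeeze(0)
```

For example, cross_convolve(torch.rand(8, 4, 4), torch.rand(8, 16, 16)) returns a tensor of shape (8, 13, 13).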
In step S250, the convolution result is input into the remaining network structure in the trained network model to obtain the comparison result of the first image and the second image.
Illustratively, the remaining network structure may include a fully connected layer (full-connected layer) or an upsampling layer (upsampling layer). Illustratively, the remaining network structure may also include an output layer.
For example, the convolution result may be input into a fully connected layer, the fully connected layer is connected to an output layer, and the comparison result of the first image and the second image can be obtained at the output layer.
Fig. 3 shows a schematic diagram of the processing of the first image and the second image according to an embodiment of the present invention. As shown in Fig. 3, the first image and the second image are respectively input into the first convolutional neural network and the second convolutional neural network, the outputs of the two convolutional neural networks are convolved in the convolutional layer, and the convolution result is then input into the fully connected layer. It should be noted that Fig. 3 does not show the complete structure of the network model; an output layer and the like may be connected after the fully connected layer. In addition, the first convolutional neural network and the second convolutional neural network may each include one or more convolutional layers, and the fully connected layer may be a network structure of one or more layers.
It can be appreciated that if the first network output result is very small (e.g., only 3 x 3), the layer in which the first network output result and the second network output result are convolved is similar to a conventional convolutional layer. The structure formed by the second convolutional neural network together with the remaining network structure (including, e.g., the fully connected layer and the output layer) is similar to the structure commonly used in ordinary convolutional neural networks, where the layer convolving the first network output result and the second network output result can be regarded as a part of the second convolutional neural network. The essential difference between this structure and an ordinary convolutional neural network is that this structure uses the first network output result as the convolution kernel: the kernel is related to the input first image and changes as the first image changes, rather than being fixed in a trained convolutional neural network as in conventional techniques.
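To make the overall structure of Fig. 3 concrete, the following is a minimal end-to-end sketch, again in PyTorch, with assumed layer sizes and depths (the patent does not specify the backbones or the head):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComparisonNet(nn.Module):
    """Sketch of the Fig. 3 structure; layer sizes are illustrative."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # first convolutional neural network: produces the smaller feature map
        self.cnn1 = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # second convolutional neural network: produces the larger feature map
        self.cnn2 = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # remaining network structure: here a fully connected layer
        self.head = nn.LazyLinear(1)

    def forward(self, img1: torch.Tensor, img2: torch.Tensor) -> torch.Tensor:
        a = self.cnn1(img1)   # (N, C, n1, m1): the per-sample convolution kernel
        b = self.cnn2(img2)   # (N, C, n2, m2) with n2 > n1 and m2 > m1
        # the kernel depends on the first image, so convolve sample by sample
        outs = [
            F.conv2d(bi.unsqueeze(0), ai.unsqueeze(1), groups=ai.shape[0])
            for ai, bi in zip(a, b)
        ]
        c = torch.cat(outs, dim=0)            # (N, C, n2-n1+1, m2-m1+1)
        return torch.sigmoid(self.head(c.flatten(1)))  # e.g. a confidence in [0, 1]
```

For example, ComparisonNet()(torch.rand(2, 3, 128, 64), torch.rand(2, 3, 256, 128)) returns one confidence per image pair; the kernel a is recomputed for every first image, which is the essential difference noted above.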
The image processing method according to the embodiments of the present invention uses two convolutional neural networks to separately process the two images to be compared and convolves the outputs of the two convolutional neural networks with each other, so that the information of the two images can be comprehensively used. The above method can therefore effectively process the features of the two images at the same time, so that features related to image comparison can be computed directly by a single network model, without using multiple network models. In addition, for some image comparison problems, the above method is expected to obtain fairly good results. Moreover, compared with a conventional neural network model, the network model used by the above method does not become more complex in structure, which avoids additional problems in training and application.
Illustratively, the image processing method according to the embodiments of the present invention may be implemented in a device, apparatus, or system having a memory and a processor.
The image processing method according to the embodiments of the present invention may be deployed at an image acquisition end, for example, the image acquisition end of an access control system of a residential community, or the image acquisition end of a security surveillance system in public places such as stations, shopping malls, and banks. Alternatively, the image processing method according to the embodiments of the present invention may be deployed in a distributed manner at a server end (or cloud) and a client. For example, images may be acquired at the client, and the client sends the acquired images to the server end (or cloud), where the image processing is performed.
According to the embodiments of the present invention, the comparison result of the first image and the second image may include one or more of the following: a first result indicating whether a target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of a shared object contained in both images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in them.
The network model according to the embodiments of the present invention can be designed and trained to implement a variety of image comparison functions, and the comparison result output by the network model is related to the function of the network model. For example, suppose the network model is designed and trained to find, in the image input into the second convolutional neural network, the target object from the image input into the first convolutional neural network; the comparison result may then include an object feature map (corresponding to the second result described herein) indicating the position, in the second image, of the target object in the first image. The target object may be any type of object, such as a pedestrian, a car, a building, or the sky, which are not enumerated one by one. Those skilled in the art can understand other types of comparison results with reference to the description of the second result, which is not repeated here.
It should be understood that the above types of comparison results are only exemplary, not limiting, and the present invention is not limited thereto.
According to the embodiments of the present invention, the second result may include an object feature map indicating the position, in the second image, of the target object in the first image, where the size of the object feature map is the same as that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
For example, suppose the first image contains only pedestrian X, and the network model is designed and trained to find, in the image input into the second convolutional neural network, the pedestrian from the image input into the first convolutional neural network; the object feature map output by the network model is then a pedestrian feature map. The pixel at coordinates (50, 300) in the pedestrian feature map represents the probability that the pixel at coordinates (50, 300) in the second image belongs to pedestrian X.
According to the embodiments of the present invention, step S210 may include: obtaining a first initial image; performing one or more of scaling, cropping, and padding on the first initial image to adjust its size to a first preset size; and determining the adjusted first initial image to be the first image.
Illustratively, the first image input into the first convolutional neural network has a fixed size. If the size of the initially acquired image does not meet the requirement, it can be adjusted.
In one example, the size of the first initial image can simply be scaled to the first preset size (e.g., 128 x 64) to obtain the required first image. In another example, if the size of the first initial image is larger than the first preset size, the first initial image can be cropped to the first preset size to obtain the required first image. In yet another example, if the size of the first initial image is smaller than the first preset size, the first initial image can be padded with black pixels to the first preset size to obtain the required first image.
The first preset size can be set as needed, which is not limited by the present invention.
According to the embodiments of the present invention, step S210 may also include: obtaining a second initial image; performing one or more of scaling, cropping, and padding on the second initial image to adjust its size to a second preset size; and determining the adjusted second initial image to be the second image.
Illustratively, the second image input into the second convolutional neural network likewise has a fixed size. If the size of the initially acquired image does not meet the requirement, it can be adjusted.
In one example, the size of the second initial image can simply be scaled to the second preset size (e.g., 256 x 128) to obtain the required second image. In another example, if the size of the second initial image is larger than the second preset size, the second initial image can be cropped to the second preset size to obtain the required second image. In yet another example, if the size of the second initial image is smaller than the second preset size, the second initial image can be padded with black pixels to the second preset size to obtain the required second image.
The second preset size can be set as needed, which is not limited by the present invention. Illustratively, the second preset size may be larger than the first preset size; that is, the image input into the second convolutional neural network is larger than the image input into the first convolutional neural network, which makes it convenient to ensure that the feature map in the second network output result is larger than the feature map in the first network output result. A sketch of such a size-adjustment step follows.
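The following is a minimal sketch of one possible adjustment policy, using Pillow (an assumption; the patent prescribes neither a library nor a fixed policy): scale the initial image to fit inside the preset size while preserving its aspect ratio, then pad the remainder with black pixels. Cropping, the third allowed operation, is omitted here.

```python
from PIL import Image

def to_preset_size(img: Image.Image, width: int, height: int) -> Image.Image:
    """Adjust an initial image to a preset size (e.g. 128 x 64 for the
    first image, 256 x 128 for the second) by scaling and padding."""
    # scale so the image fits inside the preset size, preserving aspect ratio
    scale = min(width / img.width, height / img.height)
    img = img.resize((max(1, round(img.width * scale)),
                      max(1, round(img.height * scale))))
    # pad the remainder with black pixels, keeping the image centred
    canvas = Image.new("RGB", (width, height))   # black by default
    canvas.paste(img, ((width - img.width) // 2, (height - img.height) // 2))
    return canvas
```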
According to the embodiments of the present invention, the image processing method 200 may further include: obtaining a first sample image, a second sample image, and annotation data about the comparison result of the first sample image and the second sample image; constructing a loss function using the annotation data as the target value of the comparison result of the first sample image and the second sample image, where the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, and the first sample image and the second sample image are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and training the parameters in the initial network model using the constructed loss function to obtain the trained network model.
The training process of the network model is described below by way of example. In one example, the network model is trained so that it can find, in the image input into the second convolutional neural network, the target object from the image input into the first convolutional neural network. For example, suppose the first sample image contains pedestrian Y, and the second sample image contains multiple pedestrians, including pedestrian Y. The purpose of training the network model is to enable it to find the position of pedestrian Y in the second sample image. In this case, the position of pedestrian Y in the second sample image can be annotated in advance, for example by labeling the parts where pedestrian Y appears as 1 and the other parts as 0. The network model can then be designed to output a sample object feature map of the same size as the second sample image; each pixel of the sample object feature map represents the probability that the pixel with the same coordinates in the second sample image belongs to pedestrian Y. For example, the pixel at coordinates (100, 100) in the sample object feature map represents the probability that the pixel at coordinates (100, 100) in the second sample image belongs to pedestrian Y. Thus, during training, the network model outputs a result (the sample object feature map), which is compared with the pre-annotated data by the loss function to judge whether the requirements are met (see the sketch below). By continuously training the parameters in the network model with the loss function, the network model gradually converges, and the trained network model is finally obtained.
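A minimal sketch of such a training step, assuming a model whose remaining network structure outputs a per-pixel logit map the size of the second sample image; the function and tensor names are illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, sample1, sample2, annotation):
    """One training step. annotation: (N, 1, H2, W2) tensor with 1 where
    the target object (e.g. pedestrian Y) appears in the second sample
    image and 0 elsewhere, as described above."""
    optimizer.zero_grad()
    logits = model(sample1, sample2)          # (N, 1, H2, W2) logit map
    # per-pixel cross entropy between the sample object feature map
    # and the pre-annotated data
    loss = F.binary_cross_entropy_with_logits(logits, annotation)
    loss.backward()
    optimizer.step()
    return loss.item()
```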
In the above training process, training can be performed using positive samples only, or using both positive and negative samples. A positive sample is one in which the second sample image contains the target object in the first sample image; a negative sample is one in which the second sample image does not contain the target object in the first sample image, in which case the entire second sample image can be labeled 0 during annotation.
After the above training process, in practical applications, the trained network model can be used to find, in the image input into the second convolutional neural network, the pedestrian contained in the image input into the first convolutional neural network.
In another example, the network model is trained so that it can judge whether the two images input into the first convolutional neural network and the second convolutional neural network belong to the same category. For example, the first sample image and the second sample image may both be images containing pedestrians, and the annotation data may be 1, representing that the first sample image and the second sample image belong to the same category; such a first sample image and second sample image form a positive sample. As another example, the first sample image may be an image containing a pedestrian and the second sample image an image containing an animal, and the annotation data may be 0, representing that the first sample image and the second sample image do not belong to the same category; such a first sample image and second sample image form a negative sample. In this example, the network model can be designed to output a confidence in the value range [0, 1], representing the probability that the first sample image and the second sample image belong to the same category. Thus, during training, the network model outputs a result (the confidence), which is compared with the pre-annotated data by the loss function to judge whether the requirements are met. By continuously training the parameters in the network model with the loss function, the network model gradually converges, and the trained network model is finally obtained.
After the above training process, in practical applications, the trained network model can be used to judge whether the two images input into the first convolutional neural network and the second convolutional neural network belong to the same category.
The above training methods and uses of the network model are only exemplary, not limiting; the network model can be designed and trained for other applications involving the comparison of two images, for example, judging whether two images have a common part and/or the position of the common part, or judging whether two images have a difference and/or the position of the difference.
According to the embodiments of the present invention, the annotation data includes one or more of the following: a first annotation value for a first sample result indicating whether a target object in the first sample image is present in the second sample image; a second annotation value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third annotation value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth annotation value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth annotation value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of a shared object contained in both; a sixth annotation value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh annotation value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in them.
Those skilled in the art can understand the meanings and roles of the first through seventh sample results with reference to the above descriptions of the first result, the second result, the third result, the fourth result, the fifth result, the sixth result, and the seventh result obtained in practical applications, which are not repeated here.
According to the embodiments of the present invention, the second annotation value may include an annotated object feature map, where the size of the annotated object feature map is the same as that of the second sample image, and each pixel of the annotated object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
The meaning and role of the sample object feature map have been described above, and those skilled in the art can understand the annotated object feature map with reference to the sample object feature map, which is not repeated here. In the training process of the network model, the sample object feature map can be regarded as the output of the network model, and the annotated object feature map can be regarded as the target value; by applying the loss function to the sample object feature map and the annotated object feature map, the gap between the two can be known, and the parameters in the network model can then be adjusted according to the gap.
According to the embodiments of the present invention, the above loss function may be a cross-entropy loss function. The cross-entropy loss function imposes a heavier penalty when the computed value differs more from the target value and a lighter penalty when the difference is smaller, and can therefore obtain fairly good results on classification problems. For the problems described by way of example herein, namely finding an object from one image in another image and judging whether two images belong to the same category, the network model can be trained using a cross-entropy loss function.
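For reference, the binary cross-entropy used for such two-class targets can be written as follows, where $y \in \{0, 1\}$ is the annotation value and $\hat{y} \in (0, 1)$ is the network output (a pixel of the sample object feature map, or the confidence):

$$\mathcal{L}(\hat{y}, y) = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]$$

The penalty grows without bound as $\hat{y}$ moves away from $y$, which is the property described above.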
According to the embodiments of the present invention, the remaining network structure may include a fully connected layer or an upsampling layer for receiving the convolution result.
Illustratively, for the problem described by way of example herein of finding the target object of one image in another image, a fully connected layer may be used as the layer connected to the layer (the convolutional layer) in which the first network output result and the second network output result are convolved, to receive the convolution result. The fully connected layer described herein may be implemented as a conventional fully connected layer and may be followed by the output layer of the whole network model. In this case, the output layer outputs the object feature map, whose meaning has been given in the above description and is not repeated here.
Illustratively, for the problem described by way of example herein of judging whether two images belong to the same category, an upsampling layer may be used as the layer connected to the layer (the convolutional layer) in which the first network output result and the second network output result are convolved, to receive the convolution result. The upsampling layer described herein may be implemented as a conventional upsampling layer and may be followed by the output layer of the whole network model. In this case, the output layer outputs the confidence, whose meaning has been described above and is not repeated here. Of course, for the problem of judging whether two images belong to the same category, a fully connected layer may also be used; the feature map output by the fully connected layer only needs to be processed so that the entire feature map is converted into a single value, i.e., the confidence. Sketches of both kinds of remaining network structure follow.
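The following are minimal sketches of the two options for the remaining network structure, with assumed channel counts and output heads (illustrative only; either structure can be paired with either kind of output described above):

```python
import torch.nn as nn

# Fully connected remaining structure: flattens the convolution result and
# maps it to a fixed-size output (here a single confidence value).
fc_structure = nn.Sequential(
    nn.Flatten(),
    nn.LazyLinear(256), nn.ReLU(),
    nn.LazyLinear(1), nn.Sigmoid(),
)

# Upsampling remaining structure: enlarges the convolution result (assumed
# here to have 64 channels) back toward image resolution, then reduces it
# to a one-channel per-pixel probability map.
upsampling_structure = nn.Sequential(
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    nn.Conv2d(64, 1, kernel_size=1),
    nn.Sigmoid(),
)
```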
According to the embodiments of the present invention, either of the first convolutional neural network and the second convolutional neural network may be implemented based on a VGG16 network trained on the ImageNet dataset, or using a residual network (ResNet).
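A sketch of loading such backbones, assuming torchvision is available (the patent names VGG16 trained on ImageNet and residual networks, but not a framework). Either network could serve as the first or the second convolutional neural network, with the second typically receiving the larger input so that its feature map is the larger one:

```python
import torch.nn as nn
from torchvision import models

# VGG16 convolutional part, pretrained on ImageNet, as a candidate backbone
vgg_backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features

# ResNet-18 with the pooling and classification layers removed
resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
resnet_backbone = nn.Sequential(*list(resnet.children())[:-2])
```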
An important function of the network model according to the embodiments of the present invention is that, in practical applications, when the image input into the first convolutional neural network is changed, the output of the network model changes accordingly and can generally give the correct result. This, however, is difficult to achieve with a conventional neural network structure. For example, suppose the second image contains pedestrian M, pedestrian N, and pedestrian L. Suppose the first image input the first time contains pedestrian M, the aim at that point being to find pedestrian M in the second image, and the first image input the second time contains pedestrian N, the aim at that point being to find pedestrian N in the second image. With the network model provided by the embodiments of the present invention, pedestrian M can be found fairly accurately the first time and pedestrian N the second time, whereas a conventional neural network structure might find pedestrian M the first time but fail to find pedestrian N the second time.
The scalability of the network model provided by the embodiments of the present invention is explained below. Just as convolutional neural networks can handle many kinds of problems, the network model described herein can be understood as mainly changing the middle part of a conventional network model, with no major changes to or special requirements on the input and output, so that various functions can be realized with different input images and training methods. In addition, the first convolutional neural network and the second convolutional neural network in the network model can be replaced; even when dealing with a specific problem, existing convolutional neural networks with good performance can be used for training.
According to a further aspect of the invention, a kind of image processing apparatus is provided.Fig. 4 is shown according to an embodiment of the present invention Image processing apparatus 400 schematic block diagram.
As shown in figure 4, image processing apparatus 400 according to the ... of the embodiment of the present invention includes the first image collection module 410, the One network inputs module 420, the second network inputs module 430, convolution module 440 and rest network input module 450.It is described each A module can execute each step/function above in conjunction with Fig. 2-3 image processing methods described respectively.Below only to the figure As the major function of each component of processing unit 400 is described, and omit the detail content having been described above.
First image collection module 410 is for obtaining the first image and the second image.First image collection module 410 can be with The program instruction that is stored in 102 Running storage device 104 of processor in electronic equipment as shown in Figure 1 is realized.
First network input module 420 is used to inputting the first image into the first convolutional Neural in trained network model Network exports result, wherein it includes a characteristic pattern that first network, which exports result, to obtain first network.First network inputs Module 420 can be as shown in Figure 1 electronic equipment in 102 Running storage device 104 of processor in the program instruction that stores come It realizes.
Second network inputs module 430 is used to inputting the second image into the second convolutional Neural in trained network model Network exports result to obtain the second network, wherein it includes a characteristic pattern, the output of the second network that the second network, which exports result, As a result the characteristic pattern in is more than the characteristic pattern in first network output result.Second network inputs module 430 can be as shown in Figure 1 Electronic equipment in 102 Running storage device 104 of processor in the program instruction that stores realize.
Convolution module 440 is used to export result using first network and be rolled up as the second network of convolution kernel pair output result Product, to obtain convolution results.Convolution module 440 can be as shown in Figure 1 electronic equipment in 102 Running storage device of processor The program instruction that is stored in 104 is realized.
Rest network input module 450 is used to inputting convolution results into the rest network knot in trained network model Structure, to obtain the comparing result of the first image and the second image.Rest network input module 450 can be as shown in Figure 1 electronics The program instruction that is stored in 102 Running storage device 104 of processor in equipment is realized.
According to an embodiment of the present invention, the comparison result of the first image and the second image includes one or more of the following: a first result indicating whether the target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of the shared object contained in the two images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in the two images.
According to an embodiment of the present invention, the second result includes an object feature map indicating the position, in the second image, of the target object in the first image, where the size of the object feature map is consistent with that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
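As a purely illustrative note on how such a per-pixel object feature map might be consumed downstream, the thresholding and box extraction below are assumptions for the example, not part of this disclosure:

```python
import torch

def localize(prob_map: torch.Tensor, threshold: float = 0.5):
    """prob_map: (H, W) per-pixel probabilities, the same size as the
    second image. Returns a binary mask and a rough bounding box, or
    (mask, None) when no pixel clears the threshold."""
    mask = prob_map > threshold
    if not mask.any():
        return mask, None
    ys, xs = mask.nonzero(as_tuple=True)
    box = (xs.min().item(), ys.min().item(),
           xs.max().item(), ys.max().item())
    return mask, box

mask, box = localize(torch.sigmoid(torch.randn(256, 256)))
```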
According to an embodiment of the present invention, the first image acquisition module 410 includes: a first acquisition submodule configured to acquire a first initial image; a first adjustment submodule configured to perform one or more of scaling, cropping, and padding on the first initial image, so as to adjust the size of the first initial image to a first preset size; and a first image determination submodule configured to determine the adjusted first initial image as the first image.
According to an embodiment of the present invention, the first image acquisition module 410 further includes: a second acquisition submodule configured to acquire a second initial image; a second adjustment submodule configured to perform one or more of scaling, cropping, and padding on the second initial image, so as to adjust the size of the second initial image to a second preset size; and a second image determination submodule configured to determine the adjusted second initial image as the second image.
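A minimal sketch of such a size adjustment, assuming an aspect-preserving scale followed by zero padding (one possible combination of the scaling, cropping, and padding operations mentioned above; the preset sizes are illustrative):

```python
import torch
import torch.nn.functional as F

def adjust_to_preset(img: torch.Tensor, preset_hw: tuple) -> torch.Tensor:
    """img: (C, H, W). Scales the image to fit inside preset_hw while
    keeping the aspect ratio, then zero-pads to exactly preset_hw."""
    _, h, w = img.shape
    ph, pw = preset_hw
    scale = min(ph / h, pw / w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    img = F.interpolate(img.unsqueeze(0), size=(nh, nw),
                        mode="bilinear", align_corners=False).squeeze(0)
    # Pad on the right and bottom to reach the preset size exactly.
    return F.pad(img, (0, pw - nw, 0, ph - nh))   # (C, ph, pw)

first_image = adjust_to_preset(torch.rand(3, 480, 640), (64, 64))
second_image = adjust_to_preset(torch.rand(3, 720, 1280), (256, 256))
```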
According to an embodiment of the present invention, the image processing apparatus 400 further includes: a second image acquisition module (not shown) configured to acquire a first sample image, a second sample image, and labeled data about the comparison result of the first sample image and the second sample image; a loss function construction module (not shown) configured to construct a loss function using the labeled data as target values of the comparison result of the first sample image and the second sample image, where the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, and the first sample image and the second sample image are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and a training module (not shown) configured to train the parameters in the initial network model using the constructed loss function, to obtain the trained network model.
According to an embodiment of the present invention, the labeled data includes one or more of the following: a first label value for a first sample result indicating whether the target object in the first sample image is present in the second sample image; a second label value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third label value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth label value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth label value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of the shared object contained in the two sample images; a sixth label value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh label value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in the two sample images.
According to an embodiment of the present invention, the second label value includes a labeled object feature map, where the size of the labeled object feature map is consistent with that of the second sample image, and each pixel of the labeled object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
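The following sketches how the loss construction and training step might look when the supervised target is such a labeled object feature map. The tiny stand-in model, the binary cross-entropy loss, and the Adam optimizer are assumptions chosen for illustration, not requirements of this disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for an initial network model whose remaining network structure
# outputs a per-pixel logit map the size of the second sample image
# (an upsampling head; purely illustrative).
class TinyComparator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net1 = nn.Conv2d(3, 8, 3, stride=4, padding=1)
        self.net2 = nn.Conv2d(3, 8, 3, stride=4, padding=1)

    def forward(self, img1, img2):
        f1, f2 = self.net1(img1), self.net2(img2)
        out = torch.cat([F.conv2d(b.unsqueeze(0), a.unsqueeze(0))
                         for a, b in zip(f1, f2)], dim=0)  # (B, 1, h, w)
        # Upsampling head: bring the response map back to img2's size.
        return F.interpolate(out, size=img2.shape[-2:],
                             mode="bilinear", align_corners=False)

model = TinyComparator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(sample1, sample2, labeled_map):
    """labeled_map: (B, 1, H, W) per-pixel target probabilities (the
    second label value above), same size as the second sample image."""
    logits = model(sample1, sample2)
    loss = F.binary_cross_entropy_with_logits(logits, labeled_map)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = training_step(torch.rand(2, 3, 32, 32),
                     torch.rand(2, 3, 128, 128),
                     torch.rand(2, 1, 128, 128))
```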
According to an embodiment of the present invention, the remaining network structure includes a fully connected layer or an upsampling layer for receiving the convolution result.
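The two alternatives can be sketched as interchangeable heads; the layer shapes below are illustrative assumptions:

```python
import torch.nn as nn

# Fully connected head: suited to scalar comparison results (e.g. the
# first, third, fourth, or sixth result); pooling makes it size-agnostic.
fc_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16, 1))

# Upsampling head: suited to positional results expressed as probability
# maps (e.g. the second, fifth, or seventh result).
upsample_head = nn.Sequential(
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    nn.Conv2d(1, 1, 3, padding=1))
```

A fully connected head naturally produces scalar decisions, while an upsampling head restores spatial resolution so the output can be read as a per-pixel probability map of the kind described above.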
FIG. 5 shows a schematic block diagram of an image processing system 500 according to an embodiment of the present invention. The image processing system 500 includes an image acquisition device 510, a storage device 520, and a processor 530.
The image acquisition device 510 is configured to acquire images. The image acquisition device 510 is optional, and the image processing system 500 may omit it. In that case, another image acquisition device may be used to acquire the images to be processed, and the acquired images may be sent to the image processing system 500.
The storage device 520 stores program code for implementing the corresponding steps of the image processing method according to an embodiment of the present invention.
The processor 530 is configured to run the program code stored in the storage device 520, so as to perform the corresponding steps of the image processing method according to an embodiment of the present invention, and to implement the first image acquisition module 410, the first network input module 420, the second network input module 430, the convolution module 440, and the remaining network input module 450 of the image processing apparatus 400 according to an embodiment of the present invention.
In one embodiment, when the program code is run by the processor 530, the image processing system 500 is caused to perform the following steps: acquiring a first image and a second image; inputting the first image into a first convolutional neural network in a trained network model to obtain a first network output result, where the first network output result includes a feature map; inputting the second image into a second convolutional neural network in the trained network model to obtain a second network output result, where the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result; convolving the second network output result using the first network output result as a convolution kernel, to obtain a convolution result; and inputting the convolution result into a remaining network structure in the trained network model, to obtain a comparison result of the first image and the second image.
In one embodiment, the comparison result of the first image and the second image includes one or more of the following: a first result indicating whether the target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of the shared object contained in the two images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in the two images.
In one embodiment, the second result includes an object feature map indicating the position, in the second image, of the target object in the first image, where the size of the object feature map is consistent with that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
In one embodiment, when the program code is run by the processor 530, the step of acquiring the first image and the second image performed by the image processing system 500 includes: acquiring a first initial image; performing one or more of scaling, cropping, and padding on the first initial image to adjust its size to a first preset size; and determining the adjusted first initial image as the first image.
In one embodiment, when the program code is run by the processor 530, the step of acquiring the first image and the second image performed by the image processing system 500 includes: acquiring a second initial image; performing one or more of scaling, cropping, and padding on the second initial image to adjust its size to a second preset size; and determining the adjusted second initial image as the second image.
In one embodiment, when the program code is run by the processor 530, the image processing system 500 is further caused to: acquire a first sample image, a second sample image, and labeled data about the comparison result of the first sample image and the second sample image; construct a loss function using the labeled data as target values of the comparison result of the first sample image and the second sample image, where the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, and the first sample image and the second sample image are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and train the parameters in the initial network model using the constructed loss function, to obtain the trained network model.
In one embodiment, the labeled data includes one or more of the following: a first label value for a first sample result indicating whether the target object in the first sample image is present in the second sample image; a second label value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third label value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth label value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth label value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of the shared object contained in the two sample images; a sixth label value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh label value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in the two sample images.
In one embodiment, the second label value includes a labeled object feature map, where the size of the labeled object feature map is consistent with that of the second sample image, and each pixel of the labeled object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
In one embodiment, the remaining network structure includes a fully connected layer or an upsampling layer for receiving the convolution result.
In addition, according to an embodiment of the present invention, a storage medium is provided, on which program instructions are stored. When the program instructions are run by a computer or a processor, they perform the corresponding steps of the image processing method of the embodiments of the present invention and implement the corresponding modules of the image processing apparatus according to the embodiments of the present invention. The storage medium may include, for example, a memory card of a smartphone, a storage unit of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media.
In one embodiment, when the computer program instructions are run by a computer or a processor, the computer or the processor is caused to implement the functional modules of the image processing apparatus according to the embodiments of the present invention, and/or to perform the image processing method according to the embodiments of the present invention.
In one embodiment, when the computer program instructions are run by a computer, the computer is caused to perform the following steps: acquiring a first image and a second image; inputting the first image into a first convolutional neural network in a trained network model to obtain a first network output result, where the first network output result includes a feature map; inputting the second image into a second convolutional neural network in the trained network model to obtain a second network output result, where the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result; convolving the second network output result using the first network output result as a convolution kernel, to obtain a convolution result; and inputting the convolution result into a remaining network structure in the trained network model, to obtain a comparison result of the first image and the second image.
In one embodiment, the comparison result of the first image and the second image includes one or more of the following: a first result indicating whether the target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of the shared object contained in the two images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in the two images.
In one embodiment, the second result includes an object feature map indicating the position, in the second image, of the target object in the first image, where the size of the object feature map is consistent with that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
In one embodiment, when the computer program instructions are run by a computer, the step of acquiring the first image and the second image performed by the computer includes: acquiring a first initial image; performing one or more of scaling, cropping, and padding on the first initial image to adjust its size to a first preset size; and determining the adjusted first initial image as the first image.
In one embodiment, when the computer program instructions are run by a computer, the step of acquiring the first image and the second image performed by the computer includes: acquiring a second initial image; performing one or more of scaling, cropping, and padding on the second initial image to adjust its size to a second preset size; and determining the adjusted second initial image as the second image.
In one embodiment, when the computer program instructions are run by a computer, the computer is further caused to: acquire a first sample image, a second sample image, and labeled data about the comparison result of the first sample image and the second sample image; construct a loss function using the labeled data as target values of the comparison result of the first sample image and the second sample image, where the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, and the first sample image and the second sample image are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and train the parameters in the initial network model using the constructed loss function, to obtain the trained network model.
In one embodiment, the labeled data includes one or more of the following: a first label value for a first sample result indicating whether the target object in the first sample image is present in the second sample image; a second label value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third label value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth label value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth label value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of the shared object contained in the two sample images; a sixth label value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh label value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in the two sample images.
In one embodiment, the second label value includes a labeled object feature map, where the size of the labeled object feature map is consistent with that of the second sample image, and each pixel of the labeled object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
In one embodiment, the remaining network structure includes a fully connected layer or an upsampling layer for receiving the convolution result.
Each module in the image processing system according to the embodiments of the present invention may be implemented by a processor of an electronic device for image processing according to an embodiment of the present invention running computer program instructions stored in a memory, or may be implemented when computer instructions stored in a computer-readable storage medium of a computer program product according to an embodiment of the present invention are run by a computer.
The image processing method and apparatus according to the embodiments of the present invention use two convolutional neural networks to respectively process the two images to be compared, and convolve the outputs of the two convolutional neural networks with each other. The above method and apparatus can therefore process the features of the two images simultaneously and effectively. In addition, for some image comparison problems, the above method and apparatus are expected to obtain better results. Moreover, compared with a conventional neural network model, the network model used by the above method and apparatus is not more complex in structure, which avoids additional problems during training and application.
Although the example embodiments have been described here with reference to the accompanying drawings, it should be understood that the above example embodiments are merely exemplary and are not intended to limit the scope of the present invention thereto. Those of ordinary skill in the art may make various changes and modifications therein without departing from the scope and spirit of the present invention. All such changes and modifications are intended to be included within the scope of the present invention as claimed in the appended claims.
Those of ordinary skill in the art may realize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. A skilled artisan may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is merely a logical-function division, and there may be other division manners in actual implementation; multiple units or components may be combined with or integrated into another device, or some features may be ignored or not performed.
Numerous specific details are set forth in the specification provided here. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the present disclosure and aid in understanding one or more of the various inventive aspects, in the description of exemplary embodiments of the present invention, the various features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the method of the present disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the inventive aspect lies in that fewer than all features of a single disclosed embodiment may be used to solve the corresponding technical problem. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
It will be understood by those skilled in the art that all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination, except combinations where features are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the present invention and form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be understood by those skilled in the art that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules in the image processing apparatus according to the embodiments of the present invention. The present invention may also be implemented as a program of a device (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the present invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
The above is merely specific embodiments of the present invention or descriptions of specific embodiments, and the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and such changes or substitutions shall all be covered by the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (18)

1. An image processing method, comprising:
acquiring a first image and a second image;
inputting the first image into a first convolutional neural network in a trained network model to obtain a first network output result, wherein the first network output result includes a feature map;
inputting the second image into a second convolutional neural network in the trained network model to obtain a second network output result, wherein the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result;
convolving the second network output result using the first network output result as a convolution kernel, to obtain a convolution result; and
inputting the convolution result into a remaining network structure in the trained network model, to obtain a comparison result of the first image and the second image.
2. The image processing method of claim 1, wherein the comparison result of the first image and the second image includes one or more of the following: a first result indicating whether the target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of the shared object contained in the two images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in the two images.
3. The image processing method of claim 2, wherein the second result includes an object feature map indicating the position, in the second image, of the target object in the first image, wherein the size of the object feature map is consistent with that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
4. The image processing method of claim 1, wherein acquiring the first image and the second image includes:
acquiring a first initial image;
performing one or more of scaling, cropping, and padding on the first initial image to adjust the size of the first initial image to a first preset size; and
determining the adjusted first initial image as the first image.
5. The image processing method of claim 1, wherein acquiring the first image and the second image includes:
acquiring a second initial image;
performing one or more of scaling, cropping, and padding on the second initial image to adjust the size of the second initial image to a second preset size; and
determining the adjusted second initial image as the second image.
6. The image processing method of claim 1, further including:
acquiring a first sample image, a second sample image, and labeled data about the comparison result of the first sample image and the second sample image;
constructing a loss function using the labeled data as target values of the comparison result of the first sample image and the second sample image, wherein the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, and the first sample image and the second sample image are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and
training the parameters in the initial network model using the constructed loss function, to obtain the trained network model.
7. The image processing method of claim 6, wherein the labeled data includes one or more of the following: a first label value for a first sample result indicating whether the target object in the first sample image is present in the second sample image; a second label value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third label value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth label value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth label value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of the shared object contained in the two sample images; a sixth label value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh label value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in the two sample images.
8. The image processing method of claim 7, wherein the second label value includes a labeled object feature map, wherein the size of the labeled object feature map is consistent with that of the second sample image, and each pixel of the labeled object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
9. The image processing method of claim 1, wherein the remaining network structure includes a fully connected layer or an upsampling layer for receiving the convolution result.
10. An image processing apparatus, comprising:
a first image acquisition module configured to acquire a first image and a second image;
a first network input module configured to input the first image into a first convolutional neural network in a trained network model to obtain a first network output result, wherein the first network output result includes a feature map;
a second network input module configured to input the second image into a second convolutional neural network in the trained network model to obtain a second network output result, wherein the second network output result includes a feature map, and the feature map in the second network output result is larger than the feature map in the first network output result;
a convolution module configured to convolve the second network output result using the first network output result as a convolution kernel, to obtain a convolution result; and
a remaining network input module configured to input the convolution result into a remaining network structure in the trained network model, to obtain a comparison result of the first image and the second image.
11. The image processing apparatus of claim 10, wherein the comparison result of the first image and the second image includes one or more of the following: a first result indicating whether the target object in the first image is present in the second image; a second result indicating the position, in the second image, of the target object in the first image; a third result indicating whether the first image and the second image belong to the same category; a fourth result indicating whether the first image and the second image contain a shared object; a fifth result indicating the positions, in the first image and the second image, of the shared object contained in the two images; a sixth result indicating whether the first image and the second image contain different objects; and a seventh result indicating the positions, in the first image and the second image, of the different objects contained in the two images.
12. The image processing apparatus of claim 11, wherein the second result includes an object feature map indicating the position, in the second image, of the target object in the first image, wherein the size of the object feature map is consistent with that of the second image, and each pixel of the object feature map represents the probability that the corresponding pixel of the second image belongs to the target object in the first image.
13. The image processing apparatus of claim 10, wherein the first image acquisition module includes:
a first acquisition submodule configured to acquire a first initial image;
a first adjustment submodule configured to perform one or more of scaling, cropping, and padding on the first initial image to adjust the size of the first initial image to a first preset size; and
a first image determination submodule configured to determine the adjusted first initial image as the first image.
14. The image processing apparatus of claim 10, wherein the first image acquisition module includes:
a second acquisition submodule configured to acquire a second initial image;
a second adjustment submodule configured to perform one or more of scaling, cropping, and padding on the second initial image to adjust the size of the second initial image to a second preset size; and
a second image determination submodule configured to determine the adjusted second initial image as the second image.
15. The image processing apparatus of claim 10, further including:
a second image acquisition module configured to acquire a first sample image, a second sample image, and labeled data about the comparison result of the first sample image and the second sample image;
a loss function construction module configured to construct a loss function using the labeled data as target values of the comparison result of the first sample image and the second sample image, wherein the comparison result of the first sample image and the second sample image is output by an initial network model processing the first sample image and the second sample image, and the first sample image and the second sample image are respectively input into the first convolutional neural network and the second convolutional neural network in the initial network model; and
a training module configured to train the parameters in the initial network model using the constructed loss function, to obtain the trained network model.
16. The image processing apparatus of claim 15, wherein the labeled data includes one or more of the following: a first label value for a first sample result indicating whether the target object in the first sample image is present in the second sample image; a second label value for a second sample result indicating the position, in the second sample image, of the target object in the first sample image; a third label value for a third sample result indicating whether the first sample image and the second sample image belong to the same category; a fourth label value for a fourth sample result indicating whether the first sample image and the second sample image contain a shared object; a fifth label value for a fifth sample result indicating the positions, in the first sample image and the second sample image, of the shared object contained in the two sample images; a sixth label value for a sixth sample result indicating whether the first sample image and the second sample image contain different objects; and a seventh label value for a seventh sample result indicating the positions, in the first sample image and the second sample image, of the different objects contained in the two sample images.
17. The image processing apparatus of claim 16, wherein the second label value includes a labeled object feature map, wherein the size of the labeled object feature map is consistent with that of the second sample image, and each pixel of the labeled object feature map represents the probability that the corresponding pixel of the second sample image belongs to the target object in the first sample image.
18. The image processing apparatus of claim 10, wherein the remaining network structure includes a fully connected layer or an upsampling layer for receiving the convolution result.
CN201710109988.9A 2017-02-27 2017-02-27 Image processing method and device Pending CN108509961A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710109988.9A CN108509961A (en) 2017-02-27 2017-02-27 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710109988.9A CN108509961A (en) 2017-02-27 2017-02-27 Image processing method and device

Publications (1)

Publication Number Publication Date
CN108509961A true CN108509961A (en) 2018-09-07

Family

ID=63373495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710109988.9A Pending CN108509961A (en) 2017-02-27 2017-02-27 Image processing method and device

Country Status (1)

Country Link
CN (1) CN108509961A (en)


Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060215929A1 (en) * 2005-03-23 2006-09-28 David Fresneau Methods and apparatus for image convolution
CN101414912A (en) * 2008-11-28 2009-04-22 中国民生银行股份有限公司 Identification verification method, apparatus and system
CN101702076A (en) * 2009-10-30 2010-05-05 深圳市掌网立体时代视讯技术有限公司 Stereoscopic shooting auto convergence tracking method and system
US20160180195A1 (en) * 2013-09-06 2016-06-23 Toyota Jidosha Kabushiki Kaisha Augmenting Layer-Based Object Detection With Deep Convolutional Neural Networks
CN104809426A (en) * 2014-01-27 2015-07-29 日本电气株式会社 Convolutional neural network training method and target identification method and device
CN105095923A (en) * 2014-05-21 2015-11-25 华为技术有限公司 Image processing method and device
CN105243395A (en) * 2015-11-04 2016-01-13 东方网力科技股份有限公司 Human body image comparison method and device
CN105512681A (en) * 2015-12-07 2016-04-20 北京信息科技大学 Method and system for acquiring target category picture
CN105678232A (en) * 2015-12-30 2016-06-15 中通服公众信息产业股份有限公司 Face image feature extraction and comparison method based on deep learning
CN105678249A (en) * 2015-12-31 2016-06-15 上海科技大学 Face identification method aiming at registered face and to-be-identified face image quality difference
CN105654067A (en) * 2016-02-02 2016-06-08 北京格灵深瞳信息技术有限公司 Vehicle detection method and device
CN106295547A (en) * 2016-08-05 2017-01-04 深圳市商汤科技有限公司 A kind of image comparison method and image comparison device
CN106250866A (en) * 2016-08-12 2016-12-21 广州视源电子科技股份有限公司 Image characteristics extraction modeling based on neutral net, image-recognizing method and device
CN106407891A (en) * 2016-08-26 2017-02-15 东方网力科技股份有限公司 Target matching method based on convolutional neural network and device
CN106447721A (en) * 2016-09-12 2017-02-22 北京旷视科技有限公司 Image shadow detection method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHU A et al.: "Detecting text in natural scene images with conditional clustering and convolutional neural network", Journal of Electronic Imaging *
Yin Rui et al.: "An image moment regularization strategy for convolutional neural networks", CAAI Transactions on Intelligent Systems *
Cai Xiaodong et al.: "Vehicle image comparison method based on multi-branch convolutional neural network", Video Engineering *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109377532A (en) * 2018-10-18 2019-02-22 众安信息技术服务有限公司 Image processing method and device neural network based
CN109445457A (en) * 2018-10-18 2019-03-08 广州极飞科技有限公司 Determination method, the control method and device of unmanned vehicle of distributed intelligence
CN109377532B (en) * 2018-10-18 2023-01-31 众安信息技术服务有限公司 Image processing method and device based on neural network
WO2020093724A1 (en) * 2018-11-06 2020-05-14 北京字节跳动网络技术有限公司 Method and device for generating information
CN111382061A (en) * 2018-12-29 2020-07-07 北京搜狗科技发展有限公司 Test method, test device, test medium and electronic equipment
WO2020177701A1 (en) * 2019-03-07 2020-09-10 腾讯科技(深圳)有限公司 Image processing method and apparatus, and computer device and storage medium
US11900567B2 (en) 2019-03-07 2024-02-13 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus, computer device, and storage medium
CN112396077A (en) * 2019-08-15 2021-02-23 瑞昱半导体股份有限公司 Fully-connected convolutional neural network image processing method and circuit system

Similar Documents

Publication Publication Date Title
CN108509961A (en) Image processing method and device
CN108875732B (en) Model training and instance segmentation method, device and system and storage medium
KR102173610B1 (en) Vehicle license plate classification method, system, electronic device and media based on deep learning
CN109522945B (en) Group emotion recognition method and device, intelligent device and storage medium
Chandrasekhar et al. A practical guide to CNNs and Fisher Vectors for image instance retrieval
US11250485B2 (en) Filtering digital images stored on a blockchain database
CN110956202B (en) Image training method, system, medium and intelligent device based on distributed learning
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN109376596A (en) Face matching process, device, equipment and storage medium
CN108875487B (en) Training of pedestrian re-recognition network and pedestrian re-recognition based on training
CN108573243A (en) A kind of comparison method of the low quality face based on depth convolutional neural networks
CN107578453A (en) Compressed image processing method, apparatus, electronic equipment and computer-readable medium
CN108875492A (en) Face datection and crucial independent positioning method, device, system and storage medium
CN108596090A (en) Facial image critical point detection method, apparatus, computer equipment and storage medium
CN109766925B (en) Feature fusion method and device, electronic equipment and storage medium
CN108197669B (en) Feature training method and device of convolutional neural network
CN108960314B (en) Training method and device based on difficult samples and electronic equipment
CN108875767A (en) Method, apparatus, system and the computer storage medium of image recognition
Kim et al. Label-preserving data augmentation for mobile sensor data
CN105122272A (en) Automatic curation of digital images
CN110489659A (en) Data matching method and device
CN113177538A (en) Video cycle identification method and device, computer equipment and storage medium
CN112906721A (en) Image processing method, device, equipment and computer readable storage medium
CN114241459B (en) Driver identity verification method and device, computer equipment and storage medium
CN112529068A (en) Multi-view image classification method, system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180907)