CN110147796A - Image matching method and device - Google Patents
- Publication number
- CN110147796A (Application No. CN201810144894.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- images
- recognized
- sample
- convolutional layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
Abstract
The invention discloses an image matching method and device, belonging to the field of computer technology. The method includes: obtaining an image to be recognized and a reference image; extracting a first feature image from the reference image based on a first feature extraction model, and extracting, based on a second feature extraction model, the feature images of multiple candidate objects in the image to be recognized and the location information of each candidate object in the image to be recognized; determining, among the feature images of the multiple candidate objects, the second feature image with the highest matching degree to the first feature image; determining the location information of the candidate object corresponding to the second feature image in the image to be recognized as the location information of the region in the image to be recognized that matches the reference image, and determining the matching degree between the second feature image and the first feature image as the matching degree between the reference image and the image in that region. Using the present invention, the efficiency of image matching can be improved.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to an image matching method and device.
Background art
Image matching is a method of identifying the same or similar targets in two or more images by means of a matching algorithm. It is widely used in fields such as computer vision and pattern recognition, for example in face recognition, three-dimensional reconstruction, object recognition, motion tracking, and panorama creation.
Taking face recognition as an example, the general steps of a traditional image matching method are as follows. First, the face image to be recognized and the reference face image are determined. Then, a feature descriptor is selected and used to compute the feature points of the two images. Next, the feature points of the two images are compared, similar feature points between the two images are determined using the nearest-neighbor method, and every two similar feature points are determined as a matching pair. Incorrect matching pairs are then removed using geometric constraints (epipolar geometry, affine transformation, homography) to obtain the correct matching pairs. Finally, the matching degree of each correct matching pair is computed, and these matching degrees are merged into a final matching degree, from which the matching degree between the face image to be recognized and the reference face image can be determined. The location information of the correct matching pairs is then analyzed to determine the location information of the matched regions in the reference face image and the face image to be recognized.
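The traditional descriptor-based pipeline described above can be sketched as follows. This is a minimal NumPy illustration of nearest-neighbor matching with a ratio test over descriptor vectors; the descriptors, arrays, and threshold are hypothetical and not taken from the patent:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbor matching with a ratio test.

    desc_a, desc_b: (n, d) arrays of feature descriptors from two images.
    Returns a list of (index_in_a, index_in_b) matching pairs.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distance to every descriptor in b
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        # accept only if the best match is clearly better than the runner-up
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches

# tiny synthetic example: descriptor 0 of image A matches descriptor 1 of image B
a = np.array([[1.0, 0.0], [0.0, 5.0]])
b = np.array([[8.0, 8.0], [1.0, 0.1], [-4.0, 2.0]])
print(match_descriptors(a, b))
```

As the background notes, computing and comparing descriptors for every feature point in this fashion is what makes the traditional method computationally expensive.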
In the process of implementing the present invention, the inventors found that the related art has at least the following problem:
In the above image matching method, the process of computing feature points with a feature descriptor involves a large amount of computation and is time-consuming, which reduces the efficiency of image matching.
Summary of the invention
In order to solve the problems in the prior art, the embodiments of the present invention provide an image matching method and device. The technical solution is as follows:
According to a first aspect of the embodiments of the present invention, an image matching method is provided, the method comprising:
obtaining an image to be recognized and a reference image;
extracting a first feature image from the reference image based on a first feature extraction model, and extracting, based on a second feature extraction model, the feature images of multiple candidate objects in the image to be recognized and the location information of each candidate object in the image to be recognized;
determining, among the feature images of the multiple candidate objects, the second feature image with the highest matching degree to the first feature image;
determining the location information of the candidate object corresponding to the second feature image in the image to be recognized as the location information of the region in the image to be recognized that matches the reference image, and determining the matching degree between the second feature image and the first feature image as the matching degree between the reference image and the image in that region.
Optionally, the first feature extraction model includes a first convolutional layer;
the extracting a first feature image from the reference image based on the first feature extraction model includes:
inputting the reference image into the first convolutional layer to obtain the first feature image.
Optionally, the second feature extraction model includes a second convolutional layer, a region of interest (ROI) pooling layer, and a region proposal network (RPN);
the extracting, based on the second feature extraction model, the feature images of multiple candidate objects in the image to be recognized and the location information of each candidate object in the image to be recognized includes:
inputting the image to be recognized into the second convolutional layer to obtain a third feature image;
inputting the third feature image into the RPN to obtain the location information of the multiple candidate objects in the image to be recognized;
inputting the location information of the multiple candidate objects in the image to be recognized and the third feature image into the ROI pooling layer to obtain the feature images of the multiple candidate objects.
Optionally, before the obtaining an image to be recognized and a reference image, the method further includes:
obtaining multiple training samples, wherein each training sample includes a sample image to be recognized, a sample reference image, multiple pieces of sample location information, and a sample matching degree, the multiple pieces of sample location information being the location information of multiple sample candidate objects in the sample image to be recognized, and the sample matching degree being the matching degree between the sample reference image and the image in the matched region of the sample image to be recognized;
training an initial first convolutional layer, an initial second convolutional layer, and an initial RPN with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
Optionally, the training of the initial first convolutional layer, the initial second convolutional layer, and the initial RPN with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values includes:
with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, training the initial first convolutional layer and the initial second convolutional layer based on a contrastive loss function, and training the initial RPN based on the detection-box regression function Smooth L1 Loss and the softmax loss, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
According to a second aspect of the embodiments of the present invention, an image matching device is provided, the device comprising:
a first obtaining module, configured to obtain an image to be recognized and a reference image;
an extraction module, configured to extract a first feature image from the reference image based on a first feature extraction model, and to extract, based on a second feature extraction model, the feature images of multiple candidate objects in the image to be recognized and the location information of each candidate object in the image to be recognized;
a first determining module, configured to determine, among the feature images of the multiple candidate objects, the second feature image with the highest matching degree to the first feature image;
a second determining module, configured to determine the location information of the candidate object corresponding to the second feature image in the image to be recognized as the location information of the region in the image to be recognized that matches the reference image, and to determine the matching degree between the second feature image and the first feature image as the matching degree between the reference image and the image in that region.
Optionally, the first feature extraction model includes a first convolutional layer;
the extraction module is configured to:
input the reference image into the first convolutional layer to obtain the first feature image.
Optionally, the second feature extraction model includes a second convolutional layer, a region of interest (ROI) pooling layer, and a region proposal network (RPN);
the extraction module is configured to:
input the image to be recognized into the second convolutional layer to obtain a third feature image;
input the third feature image into the RPN to obtain the location information of the multiple candidate objects in the image to be recognized;
input the location information of the multiple candidate objects in the image to be recognized and the third feature image into the ROI pooling layer to obtain the feature images of the multiple candidate objects and the location information of each candidate object in the image to be recognized.
Optionally, the device further includes:
a second obtaining module, configured to obtain multiple training samples before the image to be recognized and the reference image are obtained, wherein each training sample includes a sample image to be recognized, a sample reference image, multiple pieces of sample location information, and a sample matching degree, the multiple pieces of sample location information being the location information of multiple sample candidate objects in the sample image to be recognized, and the sample matching degree being the matching degree of the image in the region of the sample image to be recognized that matches the sample reference image;
a training module, configured to train an initial first convolutional layer, an initial second convolutional layer, and an initial RPN with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
Optionally, the training module is configured to:
with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, train the initial first convolutional layer and the initial second convolutional layer based on a contrastive loss function, and train the initial RPN based on the detection-box regression function Smooth L1 Loss and the softmax loss, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
According to a third aspect of the embodiments of the present invention, an electronic device is provided. The electronic device includes a processor and a memory; the memory stores at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by the processor to implement the image matching method according to the first aspect.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program segment, a code set, or an instruction set, which is loaded and executed by a processor to implement the image matching method according to the first aspect.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
In the embodiments of the present invention, an image to be recognized and a reference image are obtained; a first feature image is extracted from the reference image based on a first feature extraction model, and the feature images of multiple candidate objects in the image to be recognized and the location information of each candidate object in the image to be recognized are extracted based on a second feature extraction model; among the feature images of the multiple candidate objects, the second feature image with the highest matching degree to the first feature image is determined; the location information of the candidate object corresponding to the second feature image in the image to be recognized is determined as the location information of the region in the image to be recognized that matches the reference image, and the matching degree between the second feature image and the first feature image is determined as the matching degree between the reference image and the image in that region. In this way, image matching is performed using trained models, which requires less computation and less time than computing feature points with a feature descriptor, so the efficiency of image matching can be improved.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and cannot limit the present invention.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative efforts.
Fig. 1 is a kind of flow chart of image matching method provided in an embodiment of the present invention;
Fig. 2 is a kind of flow diagram of image matching method provided in an embodiment of the present invention;
Fig. 3 is a kind of schematic diagram of a scenario of image matching method provided in an embodiment of the present invention;
Fig. 4 is a kind of structural schematic diagram of image matching apparatus provided in an embodiment of the present invention;
Fig. 5 is a kind of structural schematic diagram of image matching apparatus provided in an embodiment of the present invention;
Fig. 6 is a kind of structural block diagram of terminal provided in an embodiment of the present invention;
Fig. 7 is a kind of structural schematic diagram of server provided in an embodiment of the present invention.
The above drawings show specific embodiments of the present invention, which will be described in more detail hereinafter. These drawings and the accompanying text are not intended to limit the scope of the inventive concept in any manner, but to illustrate the concept of the present invention to those skilled in the art by reference to specific embodiments.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The embodiments of the present invention provide an image matching method, which can be implemented by a server or a terminal.
The server may include components such as a processor, a memory, and a transceiver. The processor may be a CPU (Central Processing Unit) or the like, and may be used for extracting feature images, determining the location information of the region in the image to be recognized that matches the reference image, computing matching degrees, comparing multiple matching degrees to determine the maximum among them, and other processing. The memory may be a RAM (Random Access Memory), a Flash (flash memory), or the like, and may be used for storing received data, data required in processing, data generated during processing, and so on, such as the image to be recognized, the reference image, the first feature image, the feature images of the multiple candidate objects, and the location information of each candidate object in the image to be recognized. The transceiver may be used for data transmission with a terminal or other servers, for example, receiving the image to be recognized and the reference image, and may include an antenna, a matching circuit, a modem, and the like.
The terminal may include components such as a processor and a memory. The processor may be a CPU (Central Processing Unit) or the like, and may be used for extracting feature images, determining the location information of the region in the image to be recognized that matches the reference image, computing matching degrees, comparing multiple matching degrees to determine the maximum among them, and other processing. The memory may be a RAM (Random Access Memory), a Flash (flash memory), or the like, and may be used for storing received data, data required in processing, data generated during processing, and so on, such as the image to be recognized, the reference image, the first feature image, the feature images of the multiple candidate objects, and the location information of each candidate object in the image to be recognized. The terminal may further include a transceiver, a screen, an image capture component, an audio output component, an audio input component, and the like. The transceiver may be used for data transmission with other devices, for example, receiving the image to be recognized and the reference image, and may include an antenna, a matching circuit, a modem, and the like. The screen may be used to display the matching degree, the location information, and the like. The image capture component may be a camera or the like. The audio output component may be a speaker, an earphone, or the like. The audio input component may be a microphone or the like.
As shown in Figure 1, the processing flow of the method may include the following steps:
In step 101, an image to be recognized and a reference image are obtained.
In an optional embodiment, taking face recognition as an example, face recognition needs to judge whether the face image to be recognized matches the reference face image; if they match, the verification succeeds, and if they do not match, the verification fails.
When image matching needs to be performed, the electronic device obtains the image to be recognized and the reference image. The image to be recognized and the reference image may be pre-stored in the electronic device or may be temporarily input; the present invention is not limited in this regard.
Optionally, before the image to be recognized and the reference image are obtained, the initial first feature extraction model and the initial second feature extraction model to be used need to be trained. The corresponding processing may be as follows: obtaining multiple training samples, wherein each training sample includes a sample image to be recognized, a sample reference image, multiple pieces of sample location information, and a sample matching degree, the multiple pieces of sample location information being the location information of multiple sample candidate objects in the sample image to be recognized, and the sample matching degree being the matching degree between the sample reference image and the image in the region; training an initial first convolutional layer, an initial second convolutional layer, and an initial RPN (Region Proposal Network) with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
In an optional embodiment, when training the first convolutional layer, the second convolutional layer, the ROI (Region of Interest) pooling layer, and the RPN to be used, the electronic device first obtains multiple training samples, each of which includes a sample image to be recognized, a sample reference image, multiple pieces of sample location information, and a sample matching degree. The value of the sample matching degree may be 0 or 1: when a region identical to the sample reference image exists in the sample image to be recognized, the sample matching degree is 1; when no region identical to the sample reference image exists in the sample image to be recognized, the sample matching degree is 0. Then, with the sample image to be recognized and the sample reference image as training input, the obtained results are compared with the multiple pieces of sample location information and the sample matching degree, and the initial first convolutional layer, the initial second convolutional layer, and the initial RPN are trained to obtain the trained first convolutional layer, second convolutional layer, pooling layer, and RPN. The first convolutional layer and the second convolutional layer share weights.
Optionally, during training, different layers may be trained with different functions to achieve a better training effect. The detailed process of the above step may be as follows: with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, the initial first convolutional layer and the initial second convolutional layer are trained based on the contrastive loss function (Contrastive Loss), and the initial RPN is trained based on the detection-box regression function Smooth L1 Loss, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
In an optional embodiment, the sample reference image is input into the initial first convolutional layer, the sample image to be recognized is input into the initial second convolutional layer, and the results of feature extraction are input into the RPN and the ROI pooling layer respectively, finally obtaining a predicted matching degree and multiple pieces of predicted location information.
The predicted matching degree and the sample matching degree in the training sample are substituted into Contrastive Loss to compute a loss value, and the initial first convolutional layer and the initial second convolutional layer are trained according to the obtained loss value. The optimization objective of Contrastive Loss is to make the feature distance between identical features closer and the feature distance between different features farther. The formula of Contrastive Loss is:

L = (1 / 2N) Σ [y · d² + (1 − y) · max(margin − d, 0)²]

where L denotes the error, N denotes the number of training samples, y denotes the sample matching degree, d denotes the predicted matching degree, and margin denotes a preset threshold.
After the loss value is obtained through Contrastive Loss, the weights in the initial first convolutional layer and the initial second convolutional layer are continuously adjusted so that the obtained loss value keeps decreasing. Training iterates a certain number of times until the model converges, and the weights in the first convolutional layer and the second convolutional layer at that moment are determined as the weights in the trained first convolutional layer and second convolutional layer, respectively.
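The loss computation described above can be sketched as follows. This is a NumPy illustration of the standard contrastive loss with the variables defined above; the distances and labels are hypothetical, and the patent itself provides no code:

```python
import numpy as np

def contrastive_loss(y, d, margin=1.0):
    """Standard contrastive loss.

    y: (N,) array of sample matching degrees (1 = same, 0 = different).
    d: (N,) array of predicted feature distances.
    Matched pairs are pulled together (y * d^2); unmatched pairs are
    pushed apart until their distance exceeds the margin.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    n = len(y)
    pull = y * d ** 2
    push = (1 - y) * np.maximum(margin - d, 0.0) ** 2
    return float(np.sum(pull + push) / (2 * n))

# two pairs: a matched pair at distance 0.2, an unmatched pair at distance 0.5
print(contrastive_loss([1, 0], [0.2, 0.5], margin=1.0))
```

Note how an unmatched pair whose distance already exceeds the margin contributes nothing, which is what drives the weights toward separating different features.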
All regions in the image to be recognized are input into the softmax classifier in the RPN. The softmax classifier can filter out background images from the multiple candidate images, that is, images containing no actual object, remove the filtered-out background images, and output the images containing actual objects obtained after screening as the candidate images. Then, the initial RPN is trained based on Smooth L1 Loss and the softmax loss to obtain the trained RPN. The optimization objective of Smooth L1 Loss is to enable the candidate boxes to locate objects more accurately, and the optimization objective of softmax is to determine whether the features of a region contain an object.
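The detection-box regression function mentioned above can be sketched as follows. This is a NumPy illustration of the standard Smooth L1 form used in Fast/Faster R-CNN-style detectors; the residual values are hypothetical:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors.

    0.5 * x^2   if |x| < 1
    |x| - 0.5   otherwise
    Applied elementwise to box-regression residuals (e.g. predicted
    minus target offsets for x, y, w, h).
    """
    x = np.asarray(x, dtype=float)
    absx = np.abs(x)
    return np.where(absx < 1.0, 0.5 * x ** 2, absx - 0.5)

# residuals for one candidate box (x, y, w, h offsets)
print(smooth_l1([0.5, -0.2, 2.0, -3.0]))
```

The linear tail makes the loss less sensitive to outlier boxes than a plain squared error, which is why it is preferred for box regression.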
In step 102, a first feature image is extracted from the reference image based on the first feature extraction model, and the feature images of multiple candidate objects in the image to be recognized and the location information of each candidate object in the image to be recognized are extracted based on the second feature extraction model.
In an optional embodiment, after the image to be recognized and the reference image are obtained through the above steps, the electronic device inputs the reference image into the first feature extraction model, and through feature extraction obtains the first feature image corresponding to the reference image (which may be called the reference feature image).
Meanwhile, the image to be recognized is input into the second feature extraction model, and through the feature extraction of the second feature extraction model, the feature images of the multiple candidate objects and the location information of each candidate object in the image to be recognized can be obtained. The location information can be expressed in the form of coordinates.
Optionally, the first feature extraction model in step 102 may include a first convolutional layer. Therefore, the processing of step 102 may be: inputting the reference image into the first convolutional layer to obtain the first feature image.
In an optional embodiment, as shown in Figure 2, the first feature extraction model includes at least one convolutional layer (the first convolutional layer). The electronic device inputs the reference image into the first convolutional layer, and through the convolution processing of the first convolutional layer obtains the first feature image corresponding to the reference image (which may be called the reference feature image).
Optionally, the second feature extraction model of step 102 includes a second convolutional layer, an ROI pooling layer, and an RPN. Therefore, the processing of step 102 may be: inputting the image to be recognized into the second convolutional layer to obtain a third feature image; inputting the third feature image into the RPN to obtain the location information of the multiple candidate objects in the image to be recognized; and inputting the location information of the multiple candidate objects in the image to be recognized and the third feature image into the ROI pooling layer to obtain the feature images of the multiple candidate objects.
In an optional embodiment, as shown in Figure 2, the second feature extraction model may include a convolutional layer (the second convolutional layer), an ROI pooling layer, and an RPN.
The electronic device inputs the image to be recognized into the second convolutional layer, and through the convolution processing of the second convolutional layer obtains a third feature image (which may be called the feature image to be identified). The feature image to be identified is input into the RPN, which can recognize, in the feature image to be identified, the regions where candidate objects are located, and thereby obtain the location information, in the image to be recognized, of the rectangular regions where the multiple candidate objects are located. The location information can be expressed in the form of coordinates; for example, (x, y, w, h) can represent the location information of the rectangular region where each candidate object is located, where x denotes the abscissa of the top-left vertex of the rectangular region where the candidate object is located, y denotes the ordinate of the top-left vertex of that rectangular region, w is the width of the rectangular region, and h is the height of the rectangular region.
Then, the location information obtained above and the feature image to be identified are input into the ROI (Region of Interest) pooling layer, and after the pooling processing of the ROI pooling layer, feature images of the multiple candidate objects with identical dimensions can be obtained.
Meanwhile, the location information obtained above is output directly by the model. The order of the multiple pieces of location information is the same as the order of the feature images of the multiple candidate objects, so that the location information corresponding to a feature image can subsequently be determined from the feature image of the candidate object.
It should be noted that the reference feature image obtained in the above steps and the feature images of the multiple candidate objects are all of the same size.
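The ROI pooling step described above can be sketched as follows. This is a simplified NumPy max-pooling of one (x, y, w, h) region of a 2-D feature map into a fixed grid; the feature map, box, and output size are hypothetical, and a real ROI pooling layer operates per channel on multi-channel feature maps:

```python
import numpy as np

def roi_pool(feature_map, box, out_size=2):
    """Max-pool one ROI of a 2-D feature map into an out_size x out_size grid.

    feature_map: (H, W) array from the second convolutional layer.
    box: (x, y, w, h) with (x, y) the top-left corner of the region.
    Every ROI, whatever its w and h, yields a feature image of the same
    fixed dimensions, as required for the later matching step.
    """
    x, y, w, h = box
    roi = feature_map[y:y + h, x:x + w]
    out = np.zeros((out_size, out_size))
    # split the ROI into a roughly even out_size x out_size grid of bins
    row_edges = np.linspace(0, h, out_size + 1).astype(int)
    col_edges = np.linspace(0, w, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            bin_ = roi[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            out[i, j] = bin_.max()
    return out

fmap = np.arange(36).reshape(6, 6)    # toy 6x6 feature map
print(roi_pool(fmap, (1, 1, 4, 4)))   # 4x4 ROI pooled to 2x2
```

The fixed output size is what makes candidate regions of different shapes comparable to the reference feature image of the same dimensions.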
In step 103, in the characteristic image of multiple candidate objects, determination is highest with fisrt feature images match degree
Second feature image.
In an alternative embodiment, after the reference feature image, the feature images of the multiple candidate objects, and the location information of each candidate object in the image to be recognized are obtained through the above steps, the electronic device calculates the matching degree between the reference feature image and the feature image of each candidate object to obtain multiple matching degrees, compares these matching degrees to determine the highest one, and determines the feature image of the candidate object corresponding to that matching degree. This feature image is determined as the second feature image (which may be called the best-matching feature image).
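The patent does not fix a particular matching-degree formula, so as a hedged sketch one common choice, cosine similarity between flattened feature images, can stand in for it; the best-matching (second) feature image is then simply the arg-max over the candidates:

```python
import math

def cosine_similarity(a, b):
    """One possible matching degree between two flattened feature
    images (an assumption; the patent leaves the measure unspecified)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def best_match(reference, candidates):
    """Compare the reference feature image against every candidate and
    return (index, matching degree) of the best-matching one."""
    scores = [cosine_similarity(reference, c) for c in candidates]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]

ref = [1.0, 0.0, 1.0]
cands = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]
idx, score = best_match(ref, cands)
print(idx)  # 1 -- the second candidate matches the reference best
```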
In step 104, the location information of the candidate object corresponding to the second feature image in the image to be recognized is determined as the location information of the region in the image to be recognized that matches the reference image, and the matching degree between the second feature image and the first feature image is determined as the matching degree between the reference image and the image in that region.
In an alternative embodiment, after the best-matching feature image is determined through the above steps, its matching degree is determined as the matching degree between the reference image and the image to be recognized, and according to the position of the best-matching feature image in the arrangement of the feature images of the multiple candidate objects, the location information corresponding to the best-matching feature image is determined. This location information is determined as the location information of the region in the image to be recognized that matches the reference image.
After the matching degree between the reference image and the image in the region is obtained, subsequent processing can be performed, for example, comparing the matching degree with a preset matching-degree threshold and performing different processing according to the comparison result. For example, the police may use face recognition to find a suspect in surveillance images. In this process, the matching degree between the reference face image and the face image to be recognized, obtained through the above steps, is 0.87, together with the location information of the region corresponding to that matching degree; the region can be marked, for example with a bounding box, as shown in Fig. 3. The matching-degree threshold preset by the technician is 0.70. Since the matching degree 0.87 obtained by image matching is greater than the preset threshold 0.70, it can be determined that the boxed person is the suspect.
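The thresholding step from the surveillance example above can be sketched as follows (the numbers are taken from the example; the function name is our own):

```python
def is_match(matching_degree, threshold=0.70):
    """Compare the computed matching degree with the preset threshold;
    a degree above the threshold means the boxed region is treated as
    a positive match (e.g. the suspect in the surveillance example)."""
    return matching_degree > threshold

print(is_match(0.87))  # True: 0.87 > 0.70, so the boxed person is flagged
```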
In the embodiment of the present invention, an image to be recognized and a reference image are obtained; based on a first feature extraction model, a first feature image is extracted from the reference image, and based on a second feature extraction model, feature images of multiple candidate objects and location information of each candidate object in the image to be recognized are extracted from the image to be recognized; among the feature images of the multiple candidate objects, the second feature image with the highest matching degree with the first feature image is determined; the location information of the candidate object corresponding to the second feature image in the image to be recognized is determined as the location information of the region in the image to be recognized that matches the reference image, and the matching degree between the second feature image and the first feature image is determined as the matching degree between the reference image and the image in that region. In this way, image matching is performed using a trained model, which requires less computation than calculating feature points with feature descriptors and takes less time, so the efficiency of image matching can be improved.
Based on the same technical idea, an embodiment of the present invention further provides an image matching apparatus, which may be the electronic device in the above embodiments. As shown in Fig. 4, the apparatus includes: a first acquisition module 410, an extraction module 420, a first determining module 430, and a second determining module 440.
The first acquisition module 410 is configured to obtain an image to be recognized and a reference image;
The extraction module 420 is configured to extract, based on a first feature extraction model, a first feature image from the reference image, and to extract, based on a second feature extraction model, feature images of multiple candidate objects and location information of each candidate object in the image to be recognized from the image to be recognized;
The first determining module 430 is configured to determine, among the feature images of the multiple candidate objects, the second feature image with the highest matching degree with the first feature image;
The second determining module 440 is configured to determine the location information of the candidate object corresponding to the second feature image in the image to be recognized as the location information of the region in the image to be recognized that matches the reference image, and to determine the matching degree between the second feature image and the first feature image as the matching degree between the reference image and the image in that region.
Optionally, the first feature extraction model includes a first convolutional layer;
the extraction module 420 is configured to:
input the reference image into the first convolutional layer to obtain the first feature image.
Optionally, the second feature extraction model includes a second convolutional layer, a region-of-interest (ROI) pooling layer, and a region proposal network (RPN);
the extraction module 420 is configured to:
input the image to be recognized into the second convolutional layer to obtain a third feature image;
input the third feature image into the RPN to obtain the location information of multiple candidate objects in the image to be recognized;
input the location information of the multiple candidate objects in the image to be recognized and the third feature image into the ROI pooling layer to obtain the feature images of the multiple candidate objects.
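The three stages just listed can be wired together as a sketch (the stand-in callables below are toy assumptions of our own; in the patent, the second convolutional layer and the RPN are trained components):

```python
def extract_candidates(image, conv2, rpn, roi_pool):
    """Sketch of the second feature-extraction model: the second
    convolutional layer yields the third feature image, the RPN
    proposes candidate (x, y, w, h) boxes on it, and ROI pooling
    turns each box into a fixed-size candidate feature image.
    Boxes and feature images are returned in the same order."""
    third_feature = conv2(image)
    boxes = rpn(third_feature)
    features = [roi_pool(third_feature, box) for box in boxes]
    return boxes, features

# Toy stand-ins: an identity "convolution", a fixed two-proposal RPN,
# and a pooling that just records the box size it was given.
identity_conv = lambda img: img
toy_rpn = lambda feat: [(0, 0, 2, 2), (1, 1, 3, 3)]
toy_pool = lambda feat, box: (box[2], box[3])

boxes, feats = extract_candidates([[0] * 4] * 4, identity_conv, toy_rpn, toy_pool)
print(boxes[0], feats[0])  # (0, 0, 2, 2) (2, 2)
```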
Optionally, as shown in Fig. 5, the apparatus further includes:
a second acquisition module 510, configured to obtain multiple training samples before the image to be recognized and the reference image are obtained, where each training sample includes a sample image to be recognized, a sample reference image, multiple pieces of sample location information, and a sample matching degree, where the multiple pieces of sample location information are the location information of multiple sample candidate objects in the sample image to be recognized, and the sample matching degree is the matching degree between the sample reference image and the image in the region of the sample image to be recognized that matches the sample reference image;
a training module 520, configured to train an initial first convolutional layer, an initial second convolutional layer, an initial ROI pooling layer, and an initial RPN with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, to obtain the trained first convolutional layer, second convolutional layer, ROI pooling layer, and RPN.
Optionally, the training module 520 is configured to:
with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, train the initial first convolutional layer, the initial second convolutional layer, and the initial ROI pooling layer based on a contrastive loss function, and train the initial RPN based on the detection-box regression function Smooth L1 Loss and a softmax loss, to obtain the trained first convolutional layer, second convolutional layer, ROI pooling layer, and RPN.
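The two loss functions named here have standard closed forms, sketched below (the margin value and exact parameterization are our assumptions; the patent only names the functions):

```python
def contrastive_loss(distance, is_match, margin=1.0):
    """Contrastive loss on the distance between a reference feature and
    a candidate feature: matching pairs are pulled together, while
    non-matching pairs are pushed at least `margin` apart."""
    if is_match:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2

def smooth_l1(x):
    """Smooth L1 (the detection-box regression loss): quadratic near
    zero, linear for |x| >= 1, so large box errors are penalised less
    steeply than with plain L2."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```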
In the embodiment of the present invention, an image to be recognized and a reference image are obtained; based on a first feature extraction model, a first feature image is extracted from the reference image, and based on a second feature extraction model, feature images of multiple candidate objects and location information of each candidate object in the image to be recognized are extracted from the image to be recognized; among the feature images of the multiple candidate objects, the second feature image with the highest matching degree with the first feature image is determined; the location information of the candidate object corresponding to the second feature image in the image to be recognized is determined as the location information of the region in the image to be recognized that matches the reference image, and the matching degree between the second feature image and the first feature image is determined as the matching degree between the reference image and the image in that region. In this way, image matching is performed using a trained model, which requires less computation than calculating feature points with feature descriptors and takes less time, so the efficiency of image matching can be improved.
It should be understood that when the image matching apparatus provided in the above embodiment performs image matching, the division into the above functional modules is merely illustrative. In practical applications, the above functions may be distributed among different functional modules as needed; that is, the internal structure of the terminal may be divided into different functional modules to complete all or part of the functions described above. In addition, the image matching apparatus provided in the above embodiment and the image matching method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
Fig. 6 is a structural block diagram of a terminal provided in an embodiment of the present invention. The terminal 600 may be a portable mobile terminal, such as a smart phone or a tablet computer. The terminal 600 may also be called user equipment, a portable terminal, or other names.
In general, the terminal 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, for example, a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 601 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be tangible and non-transitory. The memory 602 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction, which is to be executed by the processor 601 to implement the image matching method provided in the present application.
In some embodiments, the terminal 600 optionally further includes: a peripheral device interface 603 and at least one peripheral device. Specifically, the peripheral device includes at least one of: a radio frequency circuit 604, a touch display screen 605, a camera 606, an audio circuit 607, a positioning component 608, and a power supply 609.
The peripheral device interface 603 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602, and the peripheral device interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral device interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 604 converts an electric signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 604 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 604 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: the World Wide Web, metropolitan area networks, intranets, the generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may also include an NFC (Near Field Communication)-related circuit, which is not limited in the present application.
The touch display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, videos, and any combination thereof. The touch display screen 605 also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 601 as a control signal for processing. The touch display screen 605 is used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one touch display screen 605, arranged on the front panel of the terminal 600; in other embodiments, there may be at least two touch display screens 605, respectively arranged on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the touch display screen 605 may be a flexible display screen, arranged on a curved surface or a folding surface of the terminal 600. The touch display screen 605 may even be set to a non-rectangular irregular shape, namely a shaped screen. The touch display screen 605 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to collect images or videos. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is used for video calls or selfies, and the rear camera is used for taking photos or videos. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, and a wide-angle camera, so that the main camera and the depth-of-field camera are fused to realize a background-blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions. In some embodiments, the camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 607 is used to provide an audio interface between the user and the terminal 600. The audio circuit 607 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, convert the sound waves into electric signals, and input them to the processor 601 for processing, or input them to the radio frequency circuit 604 to realize voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively arranged at different parts of the terminal 600. The microphone may also be an array microphone or an omnidirectional collection microphone. The speaker is used to convert electric signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electric signals into sound waves audible to humans, but also convert electric signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 607 may also include a headphone jack.
The positioning component 608 is used to locate the current geographic position of the terminal 600 to realize navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
The power supply 609 is used to supply power to the various components in the terminal 600. The power supply 609 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, the terminal 600 further includes one or more sensors 610. The one or more sensors 610 include but are not limited to: an acceleration sensor 611, a gyro sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615, and a proximity sensor 616.
The acceleration sensor 611 can detect the magnitudes of the acceleration on the three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 can be used to detect the components of the gravitational acceleration on the three coordinate axes. The processor 601 can control the touch display screen 605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611. The acceleration sensor 611 can also be used to collect motion data of a game or a user.
The gyro sensor 612 can detect the body direction and the rotation angle of the terminal 600, and can cooperate with the acceleration sensor 611 to collect the user's 3D actions on the terminal 600. According to the data collected by the gyro sensor 612, the processor 601 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 613 may be arranged on the side frame of the terminal 600 and/or the lower layer of the touch display screen 605. When the pressure sensor 613 is arranged on the side frame of the terminal 600, it can detect the user's grip signal on the terminal 600 and perform left-right hand recognition or shortcut operations according to the grip signal. When the pressure sensor 613 is arranged on the lower layer of the touch display screen 605, operability controls on the UI interface can be controlled according to the user's pressure operation on the touch display screen 605. The operability controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 614 is used to collect the user's fingerprint and identify the user according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 614 may be arranged on the front, back, or side of the terminal 600. When a physical button or a manufacturer logo is arranged on the terminal 600, the fingerprint sensor 614 may be integrated with the physical button or the manufacturer logo.
The optical sensor 615 is used to collect the ambient light intensity. In one embodiment, the processor 601 can control the display brightness of the touch display screen 605 according to the ambient light intensity collected by the optical sensor 615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 605 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 605 is turned down. In another embodiment, the processor 601 can also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615.
The proximity sensor 616, also called a distance sensor, is generally arranged on the front of the terminal 600. The proximity sensor 616 is used to collect the distance between the user and the front of the terminal 600. In one embodiment, when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that the distance between the user and the front of the terminal 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in Fig. 6 does not constitute a limitation on the terminal 600; the terminal may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
In the embodiment of the present invention, an image to be recognized and a reference image are obtained; based on a first feature extraction model, a first feature image is extracted from the reference image, and based on a second feature extraction model, feature images of multiple candidate objects and location information of each candidate object in the image to be recognized are extracted from the image to be recognized; among the feature images of the multiple candidate objects, the second feature image with the highest matching degree with the first feature image is determined; the location information of the candidate object corresponding to the second feature image in the image to be recognized is determined as the location information of the region in the image to be recognized that matches the reference image, and the matching degree between the second feature image and the first feature image is determined as the matching degree between the reference image and the image in that region. In this way, image matching is performed using a trained model, which requires less computation than calculating feature points with feature descriptors and takes less time, so the efficiency of image matching can be improved.
Fig. 7 is a structural schematic diagram of a server provided in an embodiment of the present invention. The server 700 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPUs) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (for example, one or more mass storage devices) storing application programs 742 or data 744. The memory 732 and the storage medium 730 may be transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Furthermore, the central processing unit 722 may be configured to communicate with the storage medium 730 and execute, on the server 700, the series of instruction operations in the storage medium 730.
The server 700 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows Server(TM), Mac OS X(TM), Unix(TM), Linux(TM), and FreeBSD(TM).
The server 700 may include a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors to perform the image matching method described in each of the above embodiments.
In the embodiment of the present invention, an image to be recognized and a reference image are obtained; based on a first feature extraction model, a first feature image is extracted from the reference image, and based on a second feature extraction model, feature images of multiple candidate objects and location information of each candidate object in the image to be recognized are extracted from the image to be recognized; among the feature images of the multiple candidate objects, the second feature image with the highest matching degree with the first feature image is determined; the location information of the candidate object corresponding to the second feature image in the image to be recognized is determined as the location information of the region in the image to be recognized that matches the reference image, and the matching degree between the second feature image and the first feature image is determined as the matching degree between the reference image and the image in that region. In this way, image matching is performed using a trained model, which requires less computation than calculating feature points with feature descriptors and takes less time, so the efficiency of image matching can be improved.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (12)
1. An image matching method, characterized in that the method includes:
obtaining an image to be recognized and a reference image;
extracting, based on a first feature extraction model, a first feature image from the reference image, and extracting, based on a second feature extraction model, feature images of multiple candidate objects and location information of each candidate object in the image to be recognized from the image to be recognized;
determining, among the feature images of the multiple candidate objects, the second feature image with the highest matching degree with the first feature image;
determining the location information of the candidate object corresponding to the second feature image in the image to be recognized as the location information of the region in the image to be recognized that matches the reference image, and determining the matching degree between the second feature image and the first feature image as the matching degree between the reference image and the image in the region.
2. The method according to claim 1, characterized in that the first feature extraction model includes a first convolutional layer;
the extracting, based on a first feature extraction model, a first feature image from the reference image includes:
inputting the reference image into the first convolutional layer to obtain the first feature image.
3. The method according to claim 1 or 2, characterized in that the second feature extraction model includes a second convolutional layer, a region-of-interest (ROI) pooling layer, and a region proposal network (RPN);
the extracting, based on a second feature extraction model, feature images of multiple candidate objects and location information of each candidate object in the image to be recognized includes:
inputting the image to be recognized into the second convolutional layer to obtain a third feature image;
inputting the third feature image into the RPN to obtain location information of multiple candidate objects in the image to be recognized;
inputting the location information of the multiple candidate objects in the image to be recognized and the third feature image into the ROI pooling layer to obtain the feature images of the multiple candidate objects.
4. The method according to claim 3, characterized in that before the inputting the image to be recognized into the second convolutional layer to obtain a third feature image, the method further includes:
obtaining multiple training samples, where each training sample includes a sample image to be recognized, a sample reference image, multiple pieces of sample location information, and a sample matching degree, where the multiple pieces of sample location information are the location information of multiple sample candidate objects in the sample image to be recognized, and the sample matching degree is the matching degree between the sample reference image and the image in the region of the sample image to be recognized that matches the sample reference image;
training an initial first convolutional layer, an initial second convolutional layer, and an initial RPN with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
5. The method according to claim 4, characterized in that the training an initial first convolutional layer, an initial second convolutional layer, and an initial RPN with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, to obtain the trained first convolutional layer, second convolutional layer, and RPN, includes:
with the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, training the initial first convolutional layer and the initial second convolutional layer based on a contrastive loss function, and training the initial RPN based on the detection-box regression function Smooth L1 Loss, to obtain the trained first convolutional layer, second convolutional layer, and RPN.
6. An image matching apparatus, wherein the apparatus comprises:
a first obtaining module, configured to obtain an image to be recognized and a reference image;
an extraction module, configured to extract a first feature image from the reference image based on a first feature extraction model, and to extract, from the image to be recognized and based on a second feature extraction model, feature images of multiple candidate objects and the location information of each candidate object in the image to be recognized;
a first determining module, configured to determine, among the feature images of the multiple candidate objects, a second feature image having the highest matching degree with the first feature image; and
a second determining module, configured to determine the location information, in the image to be recognized, of the candidate object corresponding to the second feature image as the location information of the region in the image to be recognized that matches the reference image, and to determine the matching degree between the second feature image and the first feature image as the matching degree between the reference image and the image of that region.
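The two determining modules reduce to an argmax over matching degrees. A minimal sketch, assuming cosine similarity as the matching degree and flattened feature vectors (the claim fixes neither choice):

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def best_match(ref_feature: Sequence[float],
               candidates: List[Tuple[Sequence[float], Tuple[int, int, int, int]]]):
    """Return (location, matching degree) of the candidate closest to the reference.

    candidates: one (feature vector, bounding box) pair per candidate object.
    """
    scores = [cosine(ref_feature, feat) for feat, _ in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best][1], scores[best]
```

The returned box plays the role of the matched region's location information, and the returned score the role of the matching degree.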
7. The apparatus according to claim 6, wherein the first feature extraction model comprises a first convolutional layer, and the extraction module is configured to:
input the reference image into the first convolutional layer to obtain the first feature image.
8. The apparatus according to claim 6 or 7, wherein the second feature extraction model comprises a second convolutional layer, a region-of-interest (ROI) pooling layer and a region proposal network (RPN), and the extraction module is configured to:
input the image to be recognized into the second convolutional layer to obtain a third feature image;
input the third feature image into the RPN to obtain the location information of the multiple candidate objects in the image to be recognized; and
input the location information of the multiple candidate objects in the image to be recognized, together with the third feature image, into the ROI pooling layer to obtain the feature images of the multiple candidate objects.
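The ROI pooling step crops each candidate's box out of the shared feature map and pools it to a fixed size. A toy single-channel max-pooling version (integer bin splitting and the 2x2 output size are illustrative assumptions; real implementations work on multi-channel tensors):

```python
from typing import List, Tuple

def roi_max_pool(feature_map: List[List[float]],
                 roi: Tuple[int, int, int, int],
                 out_size: int = 2) -> List[List[float]]:
    """Max-pool the ROI (x0, y0, x1, y1) of a 2-D feature map to out_size x out_size."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0  # ROI must be at least out_size in each dimension
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Integer split of the ROI into roughly equal bins, max within each bin.
            ys = range(y0 + i * h // out_size, y0 + (i + 1) * h // out_size)
            xs = range(x0 + j * w // out_size, x0 + (j + 1) * w // out_size)
            row.append(max(feature_map[y][x] for y in ys for x in xs))
        pooled.append(row)
    return pooled
```

Because every ROI is pooled to the same output size, candidate regions of different sizes yield fixed-size feature images that can be compared against the reference feature image.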
9. The apparatus according to claim 8, wherein the apparatus further comprises:
a second obtaining module, configured to obtain multiple training samples before the image to be recognized and the reference image are obtained, wherein each training sample comprises a sample image to be recognized, a sample reference image, multiple pieces of sample location information and a sample matching degree, the multiple pieces of sample location information being the location information of multiple sample candidate objects in the sample image to be recognized, and the sample matching degree being the matching degree between the sample reference image and the image of the region in the sample image to be recognized that matches the sample reference image; and
a training module, configured to take the sample image to be recognized and the sample reference image as training input and the multiple pieces of sample location information and the sample matching degree as output reference values, and to train the initial first convolutional layer, the initial second convolutional layer and the initial RPN, to obtain the trained first convolutional layer, second convolutional layer and RPN.
10. The apparatus according to claim 9, wherein the training module is configured to:
take the sample image to be recognized and the sample reference image as training input, and the multiple pieces of sample location information and the sample matching degree as output reference values; train the initial first convolutional layer and the initial second convolutional layer based on a contrastive loss function; and train the initial RPN based on a detection-box regression loss function (Smooth L1 loss) and a softmax loss, to obtain the trained first convolutional layer, second convolutional layer and RPN.
11. An electronic device, wherein the electronic device comprises a processor and a memory, the memory storing at least one instruction, at least one program segment, a code set or an instruction set, which is loaded and executed by the processor to implement the image matching method according to any one of claims 1 to 5.
12. A computer-readable storage medium, wherein the storage medium stores at least one instruction, at least one program segment, a code set or an instruction set, which is loaded and executed by a processor to implement the image matching method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810144894.XA CN110147796A (en) | 2018-02-12 | 2018-02-12 | Image matching method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110147796A true CN110147796A (en) | 2019-08-20 |
Family
ID=67589257
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810144894.XA Pending CN110147796A (en) | 2018-02-12 | 2018-02-12 | Image matching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110147796A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102214291A (en) * | 2010-04-12 | 2011-10-12 | 云南清眸科技有限公司 | Method for quickly and accurately detecting and tracking human face based on video sequence |
CN103902960A (en) * | 2012-12-28 | 2014-07-02 | 北京计算机技术及应用研究所 | Real-time face recognition system and method thereof |
CN104217225A (en) * | 2014-09-02 | 2014-12-17 | 中国科学院自动化研究所 | A visual target detection and labeling method |
CN105631039A (en) * | 2016-01-15 | 2016-06-01 | 北京邮电大学 | Picture browsing method |
CN105809089A (en) * | 2014-12-29 | 2016-07-27 | 中国科学院深圳先进技术研究院 | Multi-face detection method and device under complex background |
CN105989368A (en) * | 2015-02-13 | 2016-10-05 | 展讯通信(天津)有限公司 | Target detection method and apparatus, and mobile terminal |
CN106803057A (en) * | 2015-11-25 | 2017-06-06 | 腾讯科技(深圳)有限公司 | Image information processing method and device |
CN106815566A (en) * | 2016-12-29 | 2017-06-09 | 天津中科智能识别产业技术研究院有限公司 | A kind of face retrieval method based on multitask convolutional neural networks |
US20170206431A1 (en) * | 2016-01-20 | 2017-07-20 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
CN107145842A (en) * | 2017-04-19 | 2017-09-08 | 西安电子科技大学 | With reference to LBP characteristic patterns and the face identification method of convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
Shaoqing Ren et al., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", arXiv *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461235A (en) * | 2020-03-31 | 2020-07-28 | 合肥工业大学 | Audio and video data processing method and system, electronic equipment and storage medium |
US11335096B2 (en) | 2020-03-31 | 2022-05-17 | Hefei University Of Technology | Method, system and electronic device for processing audio-visual data |
CN111950520A (en) * | 2020-08-27 | 2020-11-17 | 重庆紫光华山智安科技有限公司 | Image recognition method and device, electronic equipment and storage medium |
CN111950520B (en) * | 2020-08-27 | 2022-12-02 | 重庆紫光华山智安科技有限公司 | Image recognition method and device, electronic equipment and storage medium |
CN113435530A (en) * | 2021-07-07 | 2021-09-24 | 腾讯科技(深圳)有限公司 | Image recognition method and device, computer equipment and computer readable storage medium |
CN113435530B (en) * | 2021-07-07 | 2023-10-10 | 腾讯科技(深圳)有限公司 | Image recognition method, device, computer equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11710351B2 (en) | Action recognition method and apparatus, and human-machine interaction method and apparatus | |
CN110502954A | Video analysis method and apparatus | |
CN108594997B (en) | Gesture skeleton construction method, device, equipment and storage medium | |
CN110992493B (en) | Image processing method, device, electronic equipment and storage medium | |
CN110189340A | Image segmentation method, device, electronic equipment and storage medium | |
CN110083791B (en) | Target group detection method and device, computer equipment and storage medium | |
CN109829456A | Image recognition method, device and terminal | |
CN110222551A | Method, apparatus, electronic equipment and storage medium for recognizing action categories | |
CN107844781A | Face attribute recognition method and device, electronic equipment and storage medium | |
CN108734736A | Camera pose tracking method, device, equipment and storage medium | |
CN110148178B (en) | Camera positioning method, device, terminal and storage medium | |
CN108305236A (en) | Image enhancement processing method and device | |
CN110222789A | Image recognition method and storage medium | |
CN109947886A (en) | Image processing method, device, electronic equipment and storage medium | |
CN109285178A | Image segmentation method, device and storage medium | |
CN109558837A | Face key point detection method, apparatus and storage medium | |
CN110059652A | Face image processing method, device and storage medium | |
CN109522863A | Ear key point detection method, apparatus and storage medium | |
CN110210573A | Adversarial image generation method, device, terminal and storage medium | |
CN110134744A | Method, device and system for updating geomagnetic information | |
CN110135336A | Training method, device and storage medium for a pedestrian generation model | |
CN110163160A | Face recognition method, device, equipment and storage medium | |
CN108320756A | Method and apparatus for detecting whether audio is absolute-music audio | |
CN109360222A | Image segmentation method, device and storage medium | |
CN109886208A | Object detection method, apparatus, computer equipment and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-08-20