CN109697464A - Method and system for precise target identification based on object detection and feature search - Google Patents

Method and system for precise target identification based on object detection and feature search

Info

Publication number
CN109697464A
Authority
CN
China
Prior art keywords
feature
group
prediction box
input image
original input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811540379.XA
Other languages
Chinese (zh)
Inventor
柯恒忠
董志忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Universal Wisdom Technology Beijing Co Ltd
Original Assignee
Universal Wisdom Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Universal Wisdom Technology Beijing Co Ltd
Priority to CN201811540379.XA
Publication of CN109697464A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The present invention provides a method and system for precise target identification based on object detection and feature search. The method comprises: obtaining an original input image and deriving a first feature map group through a CNN (convolutional neural network); successively performing resampling and concatenation on the first feature map group to obtain a second feature map group; mapping the second feature map group onto the original input image and performing prediction-box position detection on the original input image to obtain the prediction boxes of the original input image; judging whether a prediction box is a target region and, if so, obtaining the class information of the target region; extracting the visual features of the target region; and matching the visual features against the detailed class information of the original input image. By combining object detection with feature search, the embodiments of the present invention achieve precise target identification, allow new targets to be added quickly, and provide good scalability.

Description

Method and system for precise target identification based on object detection and feature search
Technical field
The present invention relates to the technical field of target detection and recognition, and in particular to a method and system for precise target identification based on object detection and feature search.
Background technique
In the prior art, detection techniques based purely on deep learning can only identify the class of a target; they cannot identify a target as one specific instance. To recognize a specific target, the only option is to make the target classes ever finer, but computational accuracy and speed then drop significantly, and fine-grained classification requires a huge amount of data at high cost. Adding a new target means adding a new class and training a new detection and recognition model on new data, so the expansion cycle is long and scalability is poor.
Summary of the invention
In view of this, the purpose of the present invention is to provide a method and system for precise target identification based on object detection and feature search, so as to achieve precise target identification, allow new targets to be added quickly, and provide good scalability.
In a first aspect, an embodiment of the present invention provides a method for precise target identification based on object detection and feature search, which comprises:
obtaining an original input image and deriving a first feature map group through a CNN (convolutional neural network);
successively performing resampling and concatenation on the first feature map group to obtain a second feature map group;
mapping the second feature map group onto the original input image and performing prediction-box position detection on the original input image to obtain the prediction boxes of the original input image;
judging whether a prediction box is a target region and, if so, obtaining the class information of the target region;
extracting the visual features of the target region;
matching the visual features against the detailed class information of the original input image.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the resampling includes up-sampling and down-sampling, and successively performing resampling and concatenation on the first feature map group to obtain the second feature map group comprises:
down-sampling the first feature map group to obtain a third feature map group;
up-sampling the feature maps in the third feature map group other than the largest-scale feature map to obtain a fourth feature map group;
concatenating the feature maps of the same scale in the third feature map group and the fourth feature map group to obtain the second feature map group.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein mapping the second feature map group onto the original input image and performing prediction-box position detection on the original input image to obtain the prediction boxes of the original input image comprises:
dividing the original input image into grids based on the scales of the second feature map group;
mapping each feature map in the second feature map group onto the grid of the original input image;
generating at least one prediction box on each grid cell;
computing the corrected position of the prediction box based on a bounding-box regression model to obtain the correction values of the prediction box;
obtaining the center position of the prediction box of the original input image based on the prediction box and its correction values.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein judging whether the prediction box is a target region comprises:
generating a target probability for the prediction box, and judging whether the target probability is less than a first preset threshold; if so, the prediction box is not a target region;
if not, the prediction box is a target region.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein obtaining the class information of the target region comprises:
generating the class probabilities of the target;
multiplying the target probability by the class probabilities to form conditional class probabilities;
selecting the maximum conditional class probability and entering the class information corresponding to it into a database.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein extracting the visual features of the target region comprises:
inputting the original input image, after its target region has been determined, into a visual feature extraction model as a query image;
extracting the visual features of the query image.
In a second aspect, an embodiment of the present invention further provides a system for precise target identification based on object detection and feature search, which comprises:
a target detection module, which obtains an original input image and derives a first feature map group through a CNN (convolutional neural network); successively performs resampling and concatenation on the first feature map group to obtain a second feature map group; maps the second feature map group onto the original input image and performs prediction-box position detection on the original input image to obtain the prediction boxes of the original input image; and judges whether a prediction box is a target region and, if so, obtains the class information of the target region;
a target feature extraction module, for extracting the visual features of the target region; and
a target feature search module, for matching the visual features against the detailed class information of the original input image.
In conjunction with the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the resampling includes up-sampling and down-sampling, and the target detection module is specifically configured to:
down-sample the first feature map group to obtain a third feature map group;
up-sample the feature maps in the third feature map group other than the largest-scale feature map to obtain a fourth feature map group;
concatenate the feature maps of the same scale in the third feature map group and the fourth feature map group to obtain the second feature map group.
In conjunction with the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the target detection module is further specifically configured to:
divide the original input image into grids based on the scales of the second feature map group;
map each feature map in the second feature map group onto the grid of the original input image;
generate at least one prediction box on each grid cell;
compute the corrected position of the prediction box based on a bounding-box regression model to obtain the correction values of the prediction box;
obtain the center position of the prediction box of the original input image based on the prediction box and its correction values.
In conjunction with the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the target feature extraction module is specifically configured to:
input the original input image, after its target region has been determined, into a visual feature extraction model as a query image;
extract the visual features of the query image.
The method and system for precise target identification based on object detection and feature search provided by the embodiments of the present invention combine object detection with feature search. Compared with prior-art detection techniques based purely on deep learning, they can not only identify the class of a target but also identify the target precisely, allow new targets to be added quickly, and provide good scalability.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are set forth below and described in detail with reference to the accompanying drawings.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and should therefore not be construed as limiting its scope. Those of ordinary skill in the art may derive other relevant drawings from these drawings without creative effort.
Fig. 1 shows a flowchart of the method for precise target identification based on object detection and feature search provided by an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the C.ReLU structure provided by an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the fireModule+Inception structure provided by an embodiment of the present invention;
Fig. 4 shows a flowchart of the resampling provided by an embodiment of the present invention;
Fig. 5 shows a flowchart of the prediction-box position detection provided by an embodiment of the present invention;
Fig. 6 shows a flowchart of judging whether a prediction box is a target region, provided by an embodiment of the present invention;
Fig. 7 shows a flowchart of obtaining the class information of a target region, provided by an embodiment of the present invention;
Fig. 8 shows a flowchart of extracting the visual features of a target region, provided by an embodiment of the present invention;
Fig. 9 shows a schematic structural diagram of the system based on object detection and feature search provided by an embodiment of the present invention.
Description of main element symbols: 1 - target detection module; 2 - target feature extraction module; 3 - target feature search module.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed invention but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Considering that prior-art detection techniques based purely on deep learning can only recognize the class of a target, an embodiment of the present invention provides a method and system for precise target identification based on object detection and feature search, which are described below through embodiments.
An embodiment of the present invention provides a method for precise target identification based on object detection and feature search which, as shown in Fig. 1, specifically comprises the following steps S101~S106:
S101: obtain an original input image and derive a first feature map group through a CNN (convolutional neural network).
The convolutional neural network is formed by connecting in series multiple sub-networks that combine C.ReLU and fireModule+Inception structures; it performs feature extraction on the input original image and provides an accurate target feature description for subsequent target detection.
As shown in Fig. 2, the computation of a neural network rises appreciably as the number of detection classes grows, and detection speed drops correspondingly. To speed up the network while preserving detection accuracy, the C.ReLU structure exploits the observation that low-level convolution kernels tend to occur in pairs whose parameters are mutual negations: by simply concatenating the output with its negation, the number of outputs doubles, so the original output count is still reached with only half the output channels. This roughly doubles the computation speed while preserving the accuracy of the detection network. Moreover, to provide the nonlinear modeling capability of a neural network, the ReLU activation function is included in the C.ReLU structure.
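The channel-doubling trick described above can be sketched in a few lines of numpy (the patent contains no code; the channel-first array layout and the example shapes are illustrative assumptions):

```python
import numpy as np

def crelu(x: np.ndarray) -> np.ndarray:
    """C.ReLU: concatenate the pre-activation with its negation along the
    channel axis, then apply ReLU. A conv layer followed by this activation
    needs only half the output channels to produce the same number of
    feature maps."""
    return np.maximum(np.concatenate([x, -x], axis=0), 0.0)

# Pre-activations of a conv layer with 2 channels on a 2x2 map (assumed sizes).
pre = np.array([[[1.0, -2.0], [3.0, -4.0]],
                [[-1.0, 2.0], [-3.0, 4.0]]])
out = crelu(pre)
print(out.shape)  # (4, 2, 2): channel count doubled "for free"
```

Channels 0..1 of `out` hold ReLU of the pre-activations; channels 2..3 hold ReLU of their negations, which is exactly the paired-kernel behavior the structure exploits.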
As shown in Fig. 3, the fireModule structure contains a convolutional layer with a 1 × 1 kernel that achieves dimensionality reduction: it greatly reduces the number of feature maps in the input feature map group, so that as few feature maps as possible are fed into the Inception structure, reducing the computation inside it. The fireModule structure therefore acts as model compression, reducing model parameters and improving speed. The Inception structure mainly contains convolutional layers with kernels of three different sizes: a 1 × 1 kernel, a 3 × 3 kernel, and two stacked 3 × 3 kernels. In general, a network could use 1 × 1, 3 × 3, and 5 × 5 kernels, but to deepen the network while keeping the same receptive field and reducing the parameter count, the 5 × 5 kernel is replaced in the embodiments of the present invention by two 3 × 3 kernels. The last layer of the Inception structure is a 1 × 1 kernel, which preserves the receptive field of the preceding layer while adding nonlinearity to the input pattern; it slows the growth of the receptive fields of some output features, so small-sized targets can be captured accurately.
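The claim that two stacked 3 × 3 kernels match the receptive field of one 5 × 5 kernel can be checked with the standard receptive-field recurrence (a sketch assuming stride-1, undilated convolutions):

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of a stack of conv layers (no dilation assumed)."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= s
    return rf

print(receptive_field([5]))     # 5
print(receptive_field([3, 3]))  # 5 -- same field as a single 5x5 kernel
```

The parameter saving follows directly: per output channel with C input channels, one 5 × 5 kernel costs 25·C weights while two 3 × 3 kernels cost 18·C, and the extra layer adds one more nonlinearity.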
The original input image passes through multiple serially connected C.ReLU structures to obtain an output feature map group. The output feature map group of the previous layer serves as the input of the fireModule layer; the input feature map group obtained after dimensionality reduction by the 1 × 1 kernel is convolved with the three kernel configurations of different sizes, producing three output feature map groups with different receptive fields. The output feature map group obtained with the 1 × 1 kernel corresponds to a sufficiently small receptive field and can accurately detect small targets; the groups obtained with the 3 × 3 kernel and with the two stacked 3 × 3 kernels correspond to larger receptive fields and detect larger targets. These three feature map groups are then concatenated, and the concatenated output feature map group is passed through a 1 × 1 kernel, achieving fusion of features at different scales. After multiple serially connected fireModule+Inception sub-networks perform multi-layer feature extraction, a good feature is obtained as the first feature map group.
S102: successively perform resampling and concatenation on the first feature map group to obtain a second feature map group.
Optionally, the resampling includes up-sampling and down-sampling. As shown in Fig. 4, step S102 specifically comprises the following steps S201~S203:
S201: down-sample the first feature map group to obtain a third feature map group.
The first feature map group is down-sampled to obtain a feature map group of 40 × 40 scale; the 40 × 40 group is down-sampled again to obtain a 20 × 20 group; and the 20 × 20 group is down-sampled to obtain a 10 × 10 group. The feature map groups of these three scales together constitute the third feature map group.
S202: up-sample the feature maps in the third feature map group other than the largest-scale feature map to obtain a fourth feature map group.
The 10 × 10 feature map group in the third feature map group is up-sampled to obtain a 20 × 20 feature map group; the 20 × 20 feature map group is up-sampled to obtain a 40 × 40 feature map group. The 20 × 20 and 40 × 40 feature map groups obtained by up-sampling together constitute the fourth feature map group.
S203: concatenate the feature maps of the same scale in the third feature map group and the fourth feature map group to obtain the second feature map group.
The 20 × 20 feature map group in the third feature map group is concatenated with the 20 × 20 group in the fourth feature map group, yielding a 20 × 20 group with more feature maps; likewise, the 40 × 40 group in the third feature map group is concatenated with the 40 × 40 group in the fourth feature map group, yielding a 40 × 40 group with more feature maps. The 10 × 10 feature map group, the concatenated 20 × 20 group, and the concatenated 40 × 40 group together constitute the multi-scale second feature map group.
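Steps S201~S203 can be sketched as follows; the 80 × 80 input scale, the 64-channel depth, max pooling for down-sampling, and nearest-neighbour up-sampling are all assumptions, since the patent only fixes the 40/20/10 output scales:

```python
import numpy as np

def downsample(x):  # 2x2 max pooling, stride 2 (pooling type is assumed)
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):    # nearest-neighbour up-sampling, factor 2 (assumed)
    return x.repeat(2, axis=1).repeat(2, axis=2)

first = np.random.rand(64, 80, 80)        # first feature map group (assumed shape)
d40 = downsample(first)                   # 40x40
d20 = downsample(d40)                     # 20x20
d10 = downsample(d20)                     # 10x10 -> third group: {d40, d20, d10}
u20 = upsample(d10)                       # 10x10 -> 20x20
u40 = upsample(d20)                       # 20x20 -> 40x40 -> fourth group
s20 = np.concatenate([d20, u20], axis=0)  # same-scale concatenation
s40 = np.concatenate([d40, u40], axis=0)
second = {10: d10, 20: s20, 40: s40}      # multi-scale second feature map group
print(s20.shape, s40.shape)               # (128, 20, 20) (128, 40, 40)
```

The concatenation along the channel axis is what gives the 20 × 20 and 40 × 40 groups "more feature maps" than the corresponding groups in the third feature map group.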
S103: map the second feature map group onto the original input image, perform prediction-box position detection on the original input image, and obtain the prediction boxes of the original input image.
Optionally, as shown in Fig. 5, step S103 specifically comprises the following steps S301~S305:
S301: divide the original input image into grids based on the scales of the second feature map group.
The convolutional layers extract image features from the original input image. As the network deepens, the extracted image features carry more and more abstract semantic meaning while their positional meaning becomes increasingly blurred. Mapping each pixel of the feature map group back onto the original input image restores its positional meaning.
Based on the different scales of the second feature map group, grids are divided on the original input image. That is, the 10 × 10 feature map group divides the original input image into a 10 × 10 grid, the 20 × 20 feature map group into a 20 × 20 grid, and the 40 × 40 feature map group into a 40 × 40 grid. Large targets are detected in the 10 × 10 feature map group, while the 20 × 20 and 40 × 40 feature map groups detect progressively smaller targets, matching the realistic situation that large targets are few and small targets are many.
S302: map each feature map in the second feature map group onto the grid of the original input image.
Taking the 10 × 10 feature map as an example, the original input image is divided into a 10 × 10 grid. Each feature in the feature map corresponds one-to-one with the center point of one grid cell, so each feature describes a region of the original input image that is centered on the corresponding grid cell and larger than that cell.
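The one-to-one correspondence between features and grid-cell centers can be illustrated as follows (the 320 × 320 input size is an assumption; the patent does not fix one):

```python
def cell_center(i, j, grid, image_size):
    """Center, in image pixels, of grid cell (i, j) when the image is
    divided into a grid x grid lattice."""
    stride = image_size / grid
    return ((j + 0.5) * stride, (i + 0.5) * stride)

# A 10x10 feature map mapped onto a 320x320 input image:
print(cell_center(0, 0, 10, 320))  # (16.0, 16.0)
print(cell_center(9, 9, 10, 320))  # (304.0, 304.0)
```

Feature (i, j) of the 10 × 10 map thus anchors its description at the corresponding 32 × 32-pixel cell's center, while its receptive field extends beyond that cell.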
S303: generate at least one prediction box on each grid cell.
The trained target detection model generates K prediction boxes on each grid cell. Each prediction box is a preset 4-dimensional vector (x_i, y_j, w_k, h_k), where (x_i, y_j) is the coordinate of the center of grid cell (i, j), (w_k, h_k) are the width and height of the prediction box, and k = [0, K) indexes the K preset prediction boxes.
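Generating the K preset boxes per grid cell, as described above, might look like the sketch below; the image size and the preset (width, height) pairs are illustrative assumptions:

```python
from itertools import product

def preset_boxes(grid, image_size, sizes):
    """K preset prediction boxes (x, y, w, h) per grid cell; `sizes` holds
    the K preset (width, height) pairs."""
    stride = image_size / grid
    boxes = {}
    for i, j in product(range(grid), range(grid)):
        cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # cell-center anchor
        boxes[(i, j)] = [(cx, cy, w, h) for (w, h) in sizes]
    return boxes

b = preset_boxes(grid=10, image_size=320, sizes=[(32, 32), (64, 32), (32, 64)])
print(len(b), len(b[(0, 0)]))  # 100 cells, K=3 boxes each
print(b[(0, 0)][0])            # (16.0, 16.0, 32, 32)
```

Every box shares its cell's center coordinates and differs only in the preset (w_k, h_k), exactly the 4-vector form given in the text.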
S304: compute the corrected position of the prediction box based on a bounding-box regression model to obtain the correction values of the prediction box.
Since the center of a prediction box is not necessarily the center of the ground-truth box, the correction values (δx_ijk, δy_ijk, δw_ijk, δh_ijk) of the prediction box are calculated according to formulas (1)~(4); they are, respectively, the correction values of the center coordinates (x_i, y_j) of the preset box and of its width and height (w_k, h_k):
S305: obtain the center position of the ground-truth box of the original input image based on the prediction box and its correction values.
By training the calculation parameters between the correction values and the preset values, the target center position of the prediction box is pushed toward the center of the ground-truth box until the center of the ground-truth box is obtained; at that point the center of the prediction box is essentially identical to the center of the ground-truth box.
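Formulas (1)~(4) are not reproduced in this text, so the sketch below uses the common offset/log-scale bounding-box regression parameterization as a stand-in: it illustrates how correction values adjust a preset box, not the patent's exact formulas:

```python
import math

def decode_box(preset, deltas):
    """Apply correction values (dx, dy, dw, dh) to a preset box
    (cx, cy, w, h). The offset/log-scale form below is the common
    bounding-box regression parameterization, shown only as an
    illustration of the correction step."""
    cx, cy, w, h = preset
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

corrected = decode_box((16.0, 16.0, 32.0, 32.0), (0.25, -0.5, 0.0, 0.0))
print(corrected)  # (24.0, 0.0, 32.0, 32.0)
```

The regression model is trained to emit the deltas; applying them moves the preset box's center toward the ground-truth center, as step S305 describes.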
S104: judge whether the prediction box is a target region and, if so, obtain the class information of the target region.
Optionally, as shown in Fig. 6, step S104 specifically comprises the following steps S401~S403:
S401: generate a target probability for the prediction box.
A value in the range [0, 1] is generated for the prediction box, indicating the probability that the prediction box contains a target.
S402: judge whether the target probability is less than a first preset threshold; if so, the prediction box is not a target region.
A first preset threshold is set. If the target probability of the prediction box is below the threshold, the prediction box is an invalid target box and is not a target region of the original input image.
S403: if not, the prediction box is a target region.
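The thresholding of steps S401~S403 reduces to a simple filter (the 0.5 value is an assumption; the patent leaves the first preset threshold unspecified):

```python
def filter_boxes(boxes, target_probs, threshold=0.5):
    """Keep only prediction boxes whose target probability reaches the
    first preset threshold; the rest are invalid target boxes."""
    return [b for b, p in zip(boxes, target_probs) if p >= threshold]

boxes = [(16, 16, 32, 32), (48, 16, 32, 32), (80, 16, 32, 32)]
kept = filter_boxes(boxes, target_probs=[0.9, 0.2, 0.6])
print(kept)  # [(16, 16, 32, 32), (80, 16, 32, 32)]
```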
Optionally, as shown in Fig. 7, step S403 specifically comprises the following steps S501~S503:
S501: generate the class probabilities of the target.
Given that the prediction box is a valid prediction box, the deep-learning neural network in the target detection and recognition model calculates the probability that the target belongs to each class. The embodiment of the present invention uses 1000 major classes, so 1000 class probabilities are calculated.
S502: multiply the target probability by the class probabilities to form conditional class probabilities.
The target probability is multiplied by each of the 1000 class probabilities, giving 1000 conditional class probabilities.
S503: select the maximum conditional class probability and enter the class information corresponding to it into the database.
The 1000 conditional class probabilities are sorted from high to low, the maximum conditional class probability is selected, and the class information corresponding to it is entered into the database.
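Steps S501~S503 amount to an argmax over conditional class probabilities, sketched here with three hypothetical classes standing in for the embodiment's 1000 major classes:

```python
def best_class(target_prob, class_probs, class_names):
    """Conditional class probability = target probability x class
    probability; return the class with the maximum value."""
    cond = [target_prob * p for p in class_probs]
    best = max(range(len(cond)), key=cond.__getitem__)
    return class_names[best], cond[best]

# Three classes stand in for the 1000 major classes of the embodiment.
name, score = best_class(0.9, [0.1, 0.7, 0.2], ["bag", "shoe", "watch"])
print(name)  # shoe
```

Because the target probability multiplies every class probability by the same factor, it does not change which class wins, but the resulting conditional probability reflects both the presence of a target and its class.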
S105: extract the visual features of the target region.
Optionally, as shown in Fig. 8, step S105 specifically comprises the following steps S601~S602:
S601: input the original input image, after its target region has been determined, into the visual feature extraction model as a query image.
The visual feature extraction model is built on CDVS (Compact Descriptors for Visual Search) technology. It can extract the visual features of a target on Android and iOS, with a feature size of less than 4 KB.
S602: extract the visual features of the query image.
The global features and local features of the target region in the original input image are extracted simultaneously. Visual features comprise global features and local features, which are two different ways of describing an image: global features are a description of the image as a whole, and are short but less descriptive; local features describe the image's details, and are longer but describe the image well.
S106 matches the detailed classification information of visual signature and original input picture.
The detailed classification information of global characteristics, local feature and corresponding original input picture is matched, is input to In database.In order to cover more as far as possible video scene and commodity, the embodiment of the present invention is directed to all target and scene Clustering has been carried out, the commodity major class list database of 1000 major class is summarized, has included the overall situation spy of commodity in the database Sign, local feature, corresponding classification information and detailed classification information.
Original image is input in the target detection identification model built, obtains the classification information of original image, so The global characteristics and local feature for extracting target area in original image afterwards, first with global characteristics in the commodity major class pre-established It is retrieved on a large scale in list database, global characteristics and good global characteristics extracted in database is compared, Global characteristics matching probability is obtained, if matching probability is higher than preset threshold, illustrates that original image is similar to the image in database; It is general to obtain local feature matching for the comparison for carrying out the local feature and local feature corresponding with global characteristics of original image again Rate is arranged by sequence from high to low, selects the highest local feature of local feature matching probability, obtains correspond in the database Detailed classification information, i.e. the detailed classification information of original image.Due to storing 1000 class merchandise newss, quantity in database It is huge, therefore construct one in server end and supported based on CDVS characteristics of image coding and extensive visual signature index technology The extensive visual search engine of million scale image data amounts, engine are supported to carry out visual signature sieve first, in accordance with classification information Choosing, view-based access control model feature carry out Secondary Match to the result after screening, obtain most close with target visual feature in affiliated major class Promotion item information.
Since global features are shorter than local features, using global features for the large-scale stage improves retrieval efficiency. In theory, two or more local features could share the same maximum matching probability, but in practice this does not occur unless the images are the same original image. To handle the theoretical case, the local features sharing the maximum matching probability are sorted by the order in which they were entered into the database, and the detailed classification information corresponding to the first-ranked local feature is selected.
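The two-stage retrieval described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names, the use of cosine similarity as the "matching probability", and the 0.8 threshold are all assumptions; the tie-break by database insertion order follows the rule stated in the paragraph above.

```python
import numpy as np

def two_stage_search(query_global, query_locals, db, global_threshold=0.8):
    """Coarse filter with the short global feature, then re-rank by local
    features.  A tie in the maximum local matching probability is broken by
    database insertion order (the earlier entry wins)."""
    def match_prob(a, b):
        # cosine similarity as a stand-in for the matching probability
        return float(np.dot(a, b) /
                     (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    # Stage 1: large-scale filtering on the global feature.
    candidates = [(i, e) for i, e in enumerate(db)
                  if match_prob(query_global, e["global"]) > global_threshold]
    if not candidates:
        return None  # no similar image in the database

    # Stage 2: compare local features of the surviving candidates.
    best = None
    for i, e in candidates:
        p = max(match_prob(q, l) for q in query_locals for l in e["locals"])
        # strict '>' keeps the earlier database entry on a tie
        if best is None or p > best[0]:
            best = (p, i, e["detailed_class"])
    return best[2]
```

Because the global stage discards most of the database before any local feature is touched, the expensive local comparison runs only on a handful of candidates.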
In the embodiment of the present invention, adding a new target only requires storing its extracted visual features in the database; the target detection and recognition model need not be retrained. The expansion cycle is therefore short and scalability is strong.
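Enrolling a new target is then just a database write. The sketch below assumes the same illustrative record layout as above (field names are not from the patent); note how the insertion index doubles as the tie-break rank used during retrieval.

```python
def enroll_new_target(db, global_feat, local_feats, class_info, detailed_class):
    """Add a new target by storing its extracted features; the detection
    model itself is untouched, so no retraining is needed."""
    db.append({
        "global": global_feat,
        "locals": local_feats,
        "class": class_info,
        "detailed_class": detailed_class,
    })
    return len(db) - 1  # insertion order, used to break retrieval ties
```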
The embodiment of the present invention provides a method and system for precise target recognition based on object detection and feature search. As shown in Fig. 9, the system includes: a target detection module 1, which obtains the original input image and obtains a first feature map group through a CNN convolutional neural network; successively performs resampling and concatenation on the first feature map group to obtain a second feature map group; maps the second feature map group onto the original input image and performs prediction box position detection on the original input image to obtain the prediction boxes of the original input image; and judges whether a prediction box is a target region, and if so, obtains the category information of the target region; a target feature extraction module 2, for extracting the visual features of the target region; and a target feature search module 3, for matching the visual features with the detailed classification information of the original input image.
The resampling includes up-sampling and down-sampling. The target detection module 1 is specifically configured to down-sample the first feature map group to obtain a third feature map group; up-sample the feature maps in the third feature map group other than the largest-scale feature map to obtain a fourth feature map group; and concatenate the feature maps of the same scale in the third feature map group and the fourth feature map group, respectively, to obtain the second feature map group.
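The resample-and-concatenate step can be sketched as below. This is an assumption-laden illustration: the patent does not specify the sampling operators or pyramid depth, so 2x2 max pooling, nearest-neighbour up-sampling, and a two-level pyramid are stand-ins.

```python
import numpy as np

def downsample2x(fm):
    # 2x2 max pooling; fm has shape (C, H, W) with H and W even
    C, H, W = fm.shape
    return fm.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def upsample2x(fm):
    # nearest-neighbour up-sampling by a factor of 2
    return fm.repeat(2, axis=1).repeat(2, axis=2)

def build_second_group(first_group):
    """Sketch of the resampling/concatenation in the text above: down-sample
    the first group, up-sample every map but the largest-scale one, then
    concatenate same-scale maps along the channel axis."""
    # down-sample each map of the first group -> third feature map group
    third = [downsample2x(fm) for fm in first_group]
    # up-sample all maps except the largest-scale one -> fourth group
    largest = max(third, key=lambda fm: fm.shape[1])
    fourth = [upsample2x(fm) for fm in third if fm.shape[1] < largest.shape[1]]
    # concatenate same-scale maps along the channel axis -> second group
    second = []
    for t in third:
        mates = [f for f in fourth if f.shape[1:] == t.shape[1:]]
        second.append(np.concatenate([t] + mates, axis=0) if mates else t)
    return second
```

The effect is that each scale of the second group mixes coarse and fine information, which is what makes multi-scale prediction boxes possible in the next step.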
The target detection module 1 is further configured to divide the original input image into grids based on the scales of the second feature map group; map each feature map in the second feature map group onto the grids of the original input image; generate at least one prediction box on each grid cell; compute the corrected position of each prediction box based on a bounding box regression model to obtain the correction values of the prediction box; and obtain the center position of each prediction box on the original input image from the prediction box and its correction values.
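One common way to obtain a box center from a grid cell and regression correction values is a YOLO-style sigmoid offset; the patent only states that the center is obtained from the box and its correction values, so this parameterisation is an assumption for illustration.

```python
import math

def decode_center(cell_x, cell_y, tx, ty, grid_size, image_size):
    """Illustrative decoding of a prediction box centre: the regression
    correction values (tx, ty) are squashed into [0, 1] so the centre stays
    inside its grid cell, then scaled to image coordinates."""
    sigmoid = lambda t: 1.0 / (1.0 + math.exp(-t))
    stride = image_size / grid_size          # pixels per grid cell
    cx = (cell_x + sigmoid(tx)) * stride     # centre x in image coordinates
    cy = (cell_y + sigmoid(ty)) * stride
    return cx, cy
```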
The target feature extraction module 2 is specifically configured to input the original input image, after the target region has been determined, into a visual feature extraction model as the query image, and to extract the visual features of the query image.
Based on the above analysis, compared with detection techniques in the related art that rely on deep learning alone, the method and system for precise target recognition based on object detection and feature search provided by the embodiment of the present invention combine object detection with feature search, achieve accurate target recognition, allow new targets to be added quickly, and offer good scalability.
The computer program product for the method and system for precise target recognition based on object detection and feature search provided by the embodiment of the present invention includes a computer-readable storage medium storing program code; the instructions contained in the program code can be used to execute the method described in the foregoing method embodiments. For specific implementation, refer to the method embodiments, which will not be repeated here.
The device for precise target recognition based on object detection and feature search provided by the embodiment of the present invention may be specific hardware on a device, or software or firmware installed on a device. The implementation principle and technical effects of the device provided by the embodiment of the present invention are the same as those of the foregoing method embodiments; for brevity, where the device embodiment is silent, refer to the corresponding content of the foregoing method embodiments. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment.
In addition, the functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings. In addition, the terms "first", "second", "third", etc. are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, intended to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A precise target recognition method based on object detection and feature search, characterized by comprising:
obtaining an original input image, and obtaining a first feature map group through a CNN convolutional neural network;
successively performing resampling and concatenation on the first feature map group to obtain a second feature map group;
mapping the second feature map group onto the original input image, and performing prediction box position detection on the original input image to obtain prediction boxes of the original input image;
judging whether a prediction box is a target region, and if so, obtaining category information of the target region;
extracting visual features of the target region; and
matching the visual features with detailed classification information of the original input image.
2. The method according to claim 1, wherein the resampling comprises up-sampling and down-sampling, and successively performing resampling and concatenation on the first feature map group to obtain the second feature map group comprises:
down-sampling the first feature map group to obtain a third feature map group;
up-sampling the feature maps in the third feature map group other than the largest-scale feature map to obtain a fourth feature map group; and
concatenating the feature maps of the same scale in the third feature map group and the fourth feature map group, respectively, to obtain the second feature map group.
3. The method according to claim 1, wherein mapping the second feature map group onto the original input image and performing prediction box position detection on the original input image to obtain the prediction boxes of the original input image comprises:
dividing the original input image into grids based on the scales of the second feature map group;
mapping each feature map in the second feature map group onto the grids of the original input image;
generating at least one prediction box on each grid cell;
computing a corrected position of the prediction box based on a bounding box regression model to obtain correction values of the prediction box; and
obtaining a center position of the prediction box on the original input image based on the prediction box and its correction values.
4. The method according to claim 1, wherein judging whether the prediction box is a target region comprises:
generating a target probability in the prediction box, and judging whether the target probability is less than a first preset threshold: if so, the prediction box is not the target region;
if not, the prediction box is the target region.
5. The method according to claim 1, wherein obtaining the category information of the target region comprises:
generating a class probability of the target;
multiplying the target probability by the class probability to form a conditional class probability; and
selecting the maximum conditional class probability, and inputting the category information corresponding to the maximum conditional class probability into a database.
6. The method according to claim 1, wherein extracting the visual features of the target region comprises:
inputting the original input image, after the target region has been determined, into a visual feature extraction model as a query image; and
extracting the visual features of the query image.
7. A precise target recognition system based on object detection and feature search, characterized by comprising:
a target detection module, which obtains an original input image and obtains a first feature map group through a CNN convolutional neural network; successively performs resampling and concatenation on the first feature map group to obtain a second feature map group; maps the second feature map group onto the original input image and performs prediction box position detection on the original input image to obtain prediction boxes of the original input image; and judges whether a prediction box is a target region, and if so, obtains category information of the target region;
a target feature extraction module, for extracting visual features of the target region; and
a target feature search module, for matching the visual features with detailed classification information of the original input image.
8. The system according to claim 7, wherein the resampling comprises up-sampling and down-sampling, and the target detection module is specifically configured to:
down-sample the first feature map group to obtain a third feature map group;
up-sample the feature maps in the third feature map group other than the largest-scale feature map to obtain a fourth feature map group; and
concatenate the feature maps of the same scale in the third feature map group and the fourth feature map group, respectively, to obtain the second feature map group.
9. The system according to claim 7, wherein the target detection module is further configured to:
divide the original input image into grids based on the scales of the second feature map group;
map each feature map in the second feature map group onto the grids of the original input image;
generate at least one prediction box on each grid cell;
compute a corrected position of the prediction box based on a bounding box regression model to obtain correction values of the prediction box; and
obtain a center position of the prediction box on the original input image based on the prediction box and its correction values.
10. The system according to claim 7, wherein the target feature extraction module is specifically configured to:
input the original input image, after the target region has been determined, into a visual feature extraction model as a query image; and
extract the visual features of the query image.
CN201811540379.XA 2018-12-17 2018-12-17 Method and system based on the identification of the precision target of object detection and signature search Pending CN109697464A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811540379.XA CN109697464A (en) 2018-12-17 2018-12-17 Method and system based on the identification of the precision target of object detection and signature search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811540379.XA CN109697464A (en) 2018-12-17 2018-12-17 Method and system based on the identification of the precision target of object detection and signature search

Publications (1)

Publication Number Publication Date
CN109697464A true CN109697464A (en) 2019-04-30

Family

ID=66231792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811540379.XA Pending CN109697464A (en) 2018-12-17 2018-12-17 Method and system based on the identification of the precision target of object detection and signature search

Country Status (1)

Country Link
CN (1) CN109697464A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159150A (en) * 2019-12-19 2020-05-15 北京文安智能技术股份有限公司 Data expansion method and device
CN111461145A (en) * 2020-03-31 2020-07-28 中国科学院计算技术研究所 Method for detecting target based on convolutional neural network

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101840518A (en) * 2010-04-02 2010-09-22 中国科学院自动化研究所 Biological vision mechanism-based object training and identifying method
US20130250050A1 (en) * 2012-03-23 2013-09-26 Objectvideo, Inc. Video surveillance systems, devices and methods with improved 3d human pose and shape modeling
CN105574527A (en) * 2015-12-14 2016-05-11 北京工业大学 Quick object detection method based on local feature learning
US20160203614A1 (en) * 2015-01-14 2016-07-14 Samsung Electronics Co., Ltd. Method and apparatus of detecting object using event-based sensor
CN106570522A (en) * 2016-10-24 2017-04-19 中国科学院自动化研究所 Object recognition model establishment method and object recognition method
CN107038448A (en) * 2017-03-01 2017-08-11 中国科学院自动化研究所 Target detection model building method
CN107122713A (en) * 2017-03-27 2017-09-01 华南理工大学 It is a kind of based on deep learning like physical property detection method
CN107316058A (en) * 2017-06-15 2017-11-03 国家新闻出版广电总局广播科学研究院 Improve the method for target detection performance by improving target classification and positional accuracy
CN107909093A (en) * 2017-10-27 2018-04-13 浙江大华技术股份有限公司 A kind of method and apparatus of Articles detecting
WO2018132961A1 (en) * 2017-01-18 2018-07-26 Nokia Technologies Oy Apparatus, method and computer program product for object detection
CN108764247A (en) * 2018-04-13 2018-11-06 中国科学院自动化研究所 Deep learning object detecting method and device based on dense connection



Similar Documents

Publication Publication Date Title
CN110147743A (en) Real-time online pedestrian analysis and number system and method under a kind of complex scene
CN110378222A (en) A kind of vibration damper on power transmission line target detection and defect identification method and device
KR101930940B1 (en) Apparatus and method for analyzing image
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN110569814B (en) Video category identification method, device, computer equipment and computer storage medium
CN112884033B (en) Household garbage classification detection method based on convolutional neural network
CN109829065B (en) Image retrieval method, device, equipment and computer readable storage medium
CN113034506B (en) Remote sensing image semantic segmentation method and device, computer equipment and storage medium
CN112613356B (en) Action detection method and device based on deep attention fusion network
CN110751195A (en) Fine-grained image classification method based on improved YOLOv3
CN112991364A (en) Road scene semantic segmentation method based on convolution neural network cross-modal fusion
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN114332889A (en) Text box ordering method and text box ordering device for text image
CN109657082A (en) Remote sensing images multi-tag search method and system based on full convolutional neural networks
CN109697464A (en) Method and system based on the identification of the precision target of object detection and signature search
CN115761627A (en) Fire smoke flame image identification method
CN115455171A (en) Method, device, equipment and medium for mutual retrieval and model training of text videos
CN108388904A (en) A kind of dimension reduction method based on convolutional neural networks and covariance tensor matrix
CN111507359A (en) Self-adaptive weighting fusion method of image feature pyramid
CN111368775A (en) Complex scene dense target detection method based on local context sensing
CN113094533B (en) Image-text cross-modal retrieval method based on mixed granularity matching
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN114168768A (en) Image retrieval method and related equipment
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20210910