CN110458107A - Method and apparatus for image recognition - Google Patents
- Publication number
- CN110458107A
- Authority
- CN
- China
- Prior art keywords
- prediction result
- image
- retrieval
- classification
- confidence level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
Embodiments of the disclosure disclose a method and apparatus for image recognition. One specific embodiment of the method includes: acquiring an image of an object to be identified; inputting the image into a pre-trained classification model to obtain a classification prediction result and features of the object; calculating a confidence level of the classification prediction result based on the classification prediction result; if the confidence level of the classification prediction result is less than or equal to a first confidence threshold, inputting the features into a pre-built retrieval model to obtain a retrieval prediction result; calculating a confidence level of the retrieval prediction result based on the retrieval prediction result; and if the confidence level of the retrieval prediction result is greater than a second confidence threshold, outputting the retrieval prediction result. This embodiment combines multiple recognition prediction results and improves the accuracy of the final prediction.
Description
Technical field
Embodiments of the disclosure relate to the field of computer technology, and in particular to a method and apparatus for image recognition.
Background art
Image recognition technology is an important technology of the information age; its purpose is to let computers process large amounts of physical information in place of humans. With the development of computer technology, human understanding of image recognition technology has become deeper and deeper.
Image recognition technology can be applied to the identification of many kinds of objects, for example, faces, animals, and plants. It is currently estimated that more than 500,000 plant species exist on Earth; their types are numerous and their forms vary widely, and the plant taxa recorded in the Flora of China alone exceed 37,000 species. Each plant further comprises six parts, namely root, stem, leaf, flower, fruit, and seed, and each part shows different forms under the influence of environment, region, seasonal climate, and the like, so that plant identification has become a problem that can only be solved by people with extensive professional knowledge. Environmental monitoring, detection of invasive plant species, and similar tasks all require large investments of manpower and money, and when they rely on manual identification, timeliness cannot be guaranteed. The demand for large-scale, high-quality plant species identification has thus emerged. At present, plants are mainly classified and identified at the family-genus-species granularity, in order to better grasp the distribution of plant populations and to provide a basis for work such as identifying invasive species at import and export, environmental monitoring, and basic research on geo-ecological change. Automatic identification can also significantly reduce identification time and human input.
Image recognition is generally carried out in the following ways:
1. Empirical, manual identification: each object is identified manually, which relies heavily on professional knowledge and manpower. Identification efficiency is low, the requirements on professional skill are high, the result depends entirely on the specialized ability and experience of the personnel, and scalability and timeliness are poor.
2. Image-feature identification: organisms, especially plants, are affected by the ambient environment and show large differences in growth form and appearance. Image features (such as feature moments, area ratios, edges, etc.) cannot fully represent the characteristics of a plant, nor can they capture the relationships among its various parts. Moreover, the characteristic feature regions of different plants differ, distinguishing individual plants is highly complex, and once the number of types increases substantially, recognition performance drops sharply.
3. Machine-learning identification: in the hand-designed-features-plus-classifier approach, the manually designed features have inherent drawbacks and are sensitive to parameters and image quality; furthermore, the classifier depends strongly on the accuracy of the data annotation, parameter tuning is complicated, and the classifier is severely challenged in multi-class tasks.
Summary of the invention
Embodiments of the disclosure propose a method and apparatus for image recognition.
In a first aspect, embodiments of the disclosure provide a method for image recognition, including: acquiring an image of an object to be identified; inputting the image into a pre-trained classification model to obtain a classification prediction result and features of the object; calculating a confidence level of the classification prediction result based on the classification prediction result; if the confidence level of the classification prediction result is less than or equal to a first confidence threshold, inputting the features into a pre-built retrieval model to obtain a retrieval prediction result; calculating a confidence level of the retrieval prediction result based on the retrieval prediction result; and if the confidence level of the retrieval prediction result is greater than a second confidence threshold, outputting the retrieval prediction result.
In some embodiments, the method further includes: if the confidence level of the classification prediction result is greater than the first confidence threshold, outputting the classification prediction result.
In some embodiments, the method further includes: if the confidence level of the retrieval prediction result is less than or equal to the second confidence threshold, calculating confidence levels of various prediction results for the image based on the classification prediction result and the retrieval prediction result, and outputting a predetermined number of prediction results with the highest confidence levels among the various prediction results as the final prediction result.
In some embodiments, acquiring the image of the object to be identified includes: inputting an image containing the object to be identified and background into a pre-trained subject detection model to obtain the image of the object to be identified.
In some embodiments, inputting the image into the pre-trained classification model includes: inputting the image of the object to be identified into a pre-trained binary classification model to judge whether the object is of the target category; and if it is of the target category, inputting the image into a pre-trained classification model for identifying images of the target category.
In some embodiments, the classification model is an Inception-ResNetv2 model; sample selection is performed by class-uniform sampling, label smoothing and mixup strategies are added, training uses a cosine learning-rate decay strategy, and the training loss uses a cross-entropy loss function.
In some embodiments, the retrieval model is built as follows: extracting features of each image in a predetermined image library, where each image in the library corresponds to a category; performing dimensionality reduction on the features of each image; and building an index on the reduced features of each image, where the index includes a forward index and/or an inverted index.
In some embodiments, the subject detection model uses Faster-RCNN.
In some embodiments, the binary classification model uses a ResNet-34 model.
In a second aspect, embodiments of the disclosure provide an apparatus for image recognition, including: an acquiring unit configured to acquire an image of an object to be identified; a classification unit configured to input the image into a pre-trained classification model to obtain a classification prediction result and features of the object; a first calculating unit configured to calculate a confidence level of the classification prediction result based on the classification prediction result; a retrieval unit configured to, if the confidence level of the classification prediction result is less than or equal to a first confidence threshold, input the features into a pre-built retrieval model to obtain a retrieval prediction result; a second calculating unit configured to calculate a confidence level of the retrieval prediction result based on the retrieval prediction result; and an output unit configured to output the retrieval prediction result if the confidence level of the retrieval prediction result is greater than a second confidence threshold.
In some embodiments, the output unit is further configured to: if the confidence level of the classification prediction result is greater than the first confidence threshold, output the classification prediction result.
In some embodiments, the apparatus further includes a fusion unit configured to: if the confidence level of the retrieval prediction result is less than or equal to the second confidence threshold, calculate confidence levels of various prediction results for the image based on the classification prediction result and the retrieval prediction result, and output a predetermined number of prediction results with the highest confidence levels among the various prediction results as the final prediction result.
In some embodiments, the acquiring unit is further configured to: input an image containing the object to be identified and background into a pre-trained subject detection model to obtain the image of the object to be identified.
In some embodiments, the classification unit is further configured to: input the image of the object to be identified into a pre-trained binary classification model to judge whether the object is of the target category; and if it is of the target category, input the image into a pre-trained classification model for identifying images of the target category.
In some embodiments, the classification model is an Inception-ResNetv2 model; sample selection is performed by class-uniform sampling, label smoothing and mixup strategies are added, training uses a cosine learning-rate decay strategy, and the training loss uses a cross-entropy loss function.
In some embodiments, the apparatus further includes a building unit configured to: extract features of each image in a predetermined image library, where each image in the library corresponds to a category; perform dimensionality reduction on the features of each image; and build an index on the reduced features of each image, where the index includes a forward index and/or an inverted index.
In some embodiments, the subject detection model uses Faster-RCNN.
In some embodiments, the binary classification model uses a ResNet-34 model.
In a third aspect, embodiments of the disclosure provide an electronic device for image recognition, including: one or more processors; and a storage device on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement any of the methods of the first aspect.
In a fourth aspect, embodiments of the disclosure provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements any of the methods of the first aspect.
The method and apparatus for image recognition provided by embodiments of the disclosure require no special acquisition equipment and adapt well to images. Multiple recognition prediction results are combined, improving the accuracy of the final prediction. No hand-designed features are needed; a deep network model improves the image recognition features, which are applied to classification and retrieval respectively. Using the retrieval-based prediction mode, identification of more than 24,000 plant species can be supported, with high accuracy and strong scalability. The overall result is robust to thresholds and image categories, and when new categories or new images are added, the time needed to adjust the recognition effect is short and timeliness is strong.
Brief description of the drawings
By reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings, other features, objects, and advantages of the disclosure will become more apparent:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the method for image recognition according to the disclosure;
Fig. 3 is a schematic diagram of an application scenario of the method for image recognition according to the disclosure;
Fig. 4 is a flowchart of another embodiment of the method for image recognition according to the disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the apparatus for image recognition according to the disclosure;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing the electronic device of embodiments of the disclosure.
Detailed description of embodiments
The disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention and are not a limitation of the invention. It should also be noted that, for convenience of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments in the disclosure and the features in the embodiments may be combined with each other. The disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for image recognition or the apparatus for image recognition of the disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as image recognition applications, web browser applications, shopping applications, search applications, instant messaging tools, mail clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a camera and support picture browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and so on. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above; they may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example, a backend image recognition server that supports the images displayed on the terminal devices 101, 102, 103. The backend image recognition server may analyze and otherwise process data such as received image recognition requests, and feed the processing result (for example, the category of a plant) back to the terminal device.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, multiple pieces of software or software modules providing distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should be noted that the method for image recognition provided by embodiments of the disclosure is generally performed by the server 105; correspondingly, the apparatus for image recognition is generally disposed in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely schematic. There may be any number of terminal devices, networks, and servers, as needed for implementation.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for image recognition according to the disclosure is shown. The method for image recognition includes the following steps:
Step 201, acquiring the image of the object to be identified.
In this embodiment, the executing body of the method for image recognition (for example, the server shown in Fig. 1) may receive, through a wired or wireless connection, the image of the object to be identified from the terminal with which a user performs image recognition. The image may be a photograph of a physical object taken by the camera of a mobile terminal, or a remote-sensing image taken by a satellite. The object may be a human, an animal, a plant, and so on. In the following, the image recognition process is described mainly using a plant as an example.
In some optional implementations of this embodiment, acquiring the image of the object to be identified includes: inputting an image containing the object to be identified and background into a pre-trained subject detection model to obtain the image of the object to be identified.
The subject detection model implements a subject detection method; implementations of the subject detection method include but are not limited to the following.
The subject detection model can automatically detect the subject region in a picture and remove the interference of the background region, which is of great significance for object classification. Subject detection relies mainly on object detection methods. Over the past decade or so, object detection algorithms for natural images can be roughly divided into a period based on traditional hand-crafted features (before 2013) and a period based on deep learning (2013 to the present). Early object detection algorithms were mostly built on hand-crafted features. Because effective image feature representation methods were lacking before the birth of deep learning, people had to design increasingly elaborate detection algorithms to make up for the deficiencies of hand-crafted features in representation ability. At the same time, due to the shortage of computing resources, people also had to find more refined calculation methods to accelerate the models. With the rise of deep learning and the continuous deepening of convolutional neural networks, the abstraction ability, translation invariance, and scale invariance of networks have become stronger and stronger, producing a batch of representative deep learning object detection methods. These methods can be divided into two major classes: the first is the single-stage (one-stage) object detection methods, including the YOLO series (YOLOv2, YOLO9000, YOLOv3, etc.), G-CNN, and the SSD series (R-SSD, DSSD, DSOD, FSSD, etc.); the second is the two-stage object detection methods, including R-CNN, SPPNet, Fast-RCNN, Faster-RCNN, FPN, etc. Two-stage object detection methods can achieve higher detection accuracy than single-stage methods, and Faster-RCNN is a relatively stable model among the two-stage methods, so this method mainly uses Faster-RCNN as the subject detection method.
Faster-RCNN was the first end-to-end deep learning detection algorithm and the first truly near-real-time (17 frames per second at 640 × 480 pixels) deep learning object detection algorithm. The detection process of Faster-RCNN is mainly divided into three parts. The first part carries out basic feature extraction using a VGG network structure. The second part is the RPN (region proposal network), which is responsible for computing the coordinates of regions where targets may exist and for judging foreground/background: the input feature map first passes through a 3×3 convolution to obtain the feature map required by the proposal layers, after which two 1×1 convolutions separately compute the classification score of each anchor and the bounding-box regressor. From the bounding-box regressor and the anchor, the predicted proposal coordinates in the image coordinate system can be calculated; the calculation formulas are as follows:

Ĝ_x = P_w · d_x(P) + P_x
Ĝ_y = P_h · d_y(P) + P_y
Ĝ_w = P_w · exp(d_w(P))
Ĝ_h = P_h · exp(d_h(P))

where d_x(P), d_y(P), d_w(P), and d_h(P) respectively denote the x-axis translation, y-axis translation, width scaling, and height scaling learned by the model; P_x and P_y denote the x and y coordinates of the original anchor center; P_w and P_h denote the width and height of the original anchor; and Ĝ denotes the prediction. The target-region proposal coordinates obtained by the RPN then pass through an ROI (region of interest) pooling layer to obtain feature vectors of equal length. In the third part, two fully connected layers followed by a softmax finally realize the specific classification and a more accurate coordinate regression.
If no subject region is detected in this step, "image has no subject" is returned; otherwise, the detected single or multiple subject regions enter the subsequent steps for identification.
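The proposal decoding described above can be sketched in a few lines of Python; the function name and tuple layout are illustrative and not part of the disclosure:

```python
import math

def decode_proposal(anchor, deltas):
    """Decode an RPN bounding-box regressor into image coordinates.

    anchor: (Px, Py, Pw, Ph) -- center x/y, width, height of the anchor.
    deltas: (dx, dy, dw, dh) -- translations and log-scale factors
            predicted by the model for this anchor.
    Returns the predicted proposal (Gx, Gy, Gw, Gh) in the same layout.
    """
    Px, Py, Pw, Ph = anchor
    dx, dy, dw, dh = deltas
    Gx = Pw * dx + Px        # x translation, scaled by anchor width
    Gy = Ph * dy + Py        # y translation, scaled by anchor height
    Gw = Pw * math.exp(dw)   # width scaling (regressor works in log space)
    Gh = Ph * math.exp(dh)   # height scaling (regressor works in log space)
    return (Gx, Gy, Gw, Gh)
```

With all-zero deltas the anchor is returned unchanged, which is a quick sanity check of the decoding.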
Step 202, inputting the image into a pre-trained classification model to obtain a classification prediction result and features of the object.
In this embodiment, the image obtained after subject detection in step 201 is input into a pre-trained classification model for classification prediction. Optionally, before fine-grained plant classification, a relatively simple binary classification model (for coarse classification) needs to be built first to judge whether the input image is a plant. If it is not a plant, the prediction result "image is not a plant" is output directly; if it is a plant, the input image enters the plant fine-classification model for fine-grained classification and identification. This model may use a common convolutional neural network, including VGG, ResNet, GoogLeNet, and the like, for feature extraction, with a classification layer at the end of the model mapping from the feature dimension to the classification dimension (two classes) for classification prediction. Using this model can improve the efficiency of large-scale plant identification, reduce the amount of computation, and quickly filter out non-plant input images. Similarly, a binary classification model of the corresponding category may be used before identifying images of other objects, for example, a binary classification model for animals or vehicles.
In this embodiment, the above binary classification model may be a deep neural network model, such as a ResNet-34 model; of course, it may also be another deep neural network model. This embodiment does not limit the concrete form of the deep neural network model used, and is described taking the ResNet-34 model as an example.
The ResNet-34 classification network used in this embodiment performs binary classification on the subject region. It contains one layer of 7×7 convolution kernels, max pooling, and four stages of convolutional layers composed of 3×3 convolution kernels, with 3, 4, 6, and 3 layers respectively, followed by average pooling and a fully connected (fc) layer operation. The classification model is initialized with ImageNet pre-trained weights, and the last classification layer of the ResNet-34 model is removed and replaced with a two-class classification layer suitable for this example, which is randomly initialized.
In some optional implementations of this embodiment, the learning rate used by the feature layers is 1/10 of the learning rate of the classification layer.
In some optional implementations of this embodiment, an exponentially decaying learning-rate strategy is used to fine-tune the pre-trained model:

α = 0.95^(epoch − num) · α0

where epoch is the number of training epochs, num is set to 0, and α0 is the initial learning rate, 0.1.
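As a sketch, the exponential decay schedule above can be written as a one-line function; the name is illustrative, and the defaults mirror the values given here (α0 = 0.1, num = 0):

```python
def exp_decay_lr(epoch, alpha0=0.1, num=0, gamma=0.95):
    """Exponentially decayed learning rate: alpha = gamma ** (epoch - num) * alpha0."""
    return gamma ** (epoch - num) * alpha0
```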
In some optional implementations of this embodiment, the training loss uses a cross-entropy loss function:

L = −(1/m) · Σ_{i=1..m} [ y(i) · log p(i) + (1 − y(i)) · log(1 − p(i)) ]

where m is the number of samples, y(i) is the sample label, and p(i) is the predicted probability.
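A minimal sketch of the binary cross-entropy loss used for the two-class model, in plain Python; the clamping epsilon is an implementation detail added here to keep the logarithm finite:

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over m samples.

    labels: ground-truth labels y(i) in {0, 1}.
    probs:  predicted probabilities p(i) for the positive class.
    """
    m = len(labels)
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m
```

For a perfectly confident correct prediction the loss approaches 0; for p = 0.5 on every sample it equals log 2.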
For pictures judged to be plants, a fine-grained classification model is designed to carry out classification prediction. Suppose the input picture is a plant: because plant categories are numerous, a classification model capable of identifying fine feature differences is needed. For the more than 24,000 plant categories in this example, classification using hand-designed features is not feasible in practice. In recent years, with the further development of deep learning, image classification methods represented by convolutional neural networks have surpassed the traditional methods based on manually annotated features in many image classification tasks, and academia has verified that the features extracted by convolutional neural networks are better than manually annotated features. Therefore, the large-scale plant classification scenario in this example needs a deep neural network model that can automatically generate classification features.
In this embodiment, the above classification model for fine-grained classification may be a deep neural network model, such as an Inception-ResNetv2 model; of course, it may also be another deep neural network model. This embodiment does not limit the concrete form of the deep neural network model used, and is described taking the Inception-ResNetv2 model as an example.
The Inception-ResNetv2 model is an improvement on the Inception-V4 model; the key improvement is replacing the traditional Inception modules with Inception-ResNet modules. It achieved the best results at the time on the ILSVRC image classification benchmark.
In some optional implementations of this embodiment, the data set of objects to be identified suffers from easily confused categories and an unbalanced distribution. Sample selection may be carried out by class-uniform sampling, and label smoothing and mixup strategies may be added, which can improve the accuracy of identification and classification.
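Label smoothing here refers to the standard formulation, which can be sketched as follows; the ε value is illustrative, as the disclosure does not specify it:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: (1 - epsilon) * one_hot + epsilon / K,
    where K is the number of classes. Softens the one-hot target so the
    model is penalized less for confusing very similar categories."""
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]
```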
The Inception-ResNetv2 model used in this embodiment is initialized with ImageNet pre-trained weights, and the last classification layer is removed and replaced with a classification layer over the 24,000+ categories suitable for this example, which is randomly initialized. The usable initial learning rate of the model of this embodiment is 0.05, and max-epoch is set to 70.
In some optional implementations of this embodiment, a cosine learning-rate decay strategy may be used, by which the learning rate is gradually lowered.
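A common formulation of cosine learning-rate decay is sketched below, using the initial rate 0.05 and max-epoch 70 given above; the exact schedule used in the disclosure is not specified, so this is an assumption:

```python
import math

def cosine_decay_lr(epoch, max_epoch=70, lr0=0.05):
    """Cosine decay: smoothly lowers the rate from lr0 at epoch 0
    toward 0 at max_epoch."""
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / max_epoch))
```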
In some optional implementations of this embodiment, L2 weight decay may be used for regularization, with a decay coefficient of 1e-4.
In some optional implementations of this embodiment, the training loss may use a cross-entropy loss function, analogous to the one given above for the binary classification model but taken over all categories.
The classification prediction result may include the probabilities that the object belongs to each category, i.e., scores; the predetermined number of categories with the highest scores may be taken as the final classification prediction result, for example, the top-5 results. During neural network computation, the feature map of the last layer is also output, as the input of the retrieval prediction model.
Step 203, calculating the confidence level of the classification prediction result based on the classification prediction result.
In this embodiment, the classification prediction result consists of the predetermined number of categories with the highest scores, output together with the scores and the convolutional features. It is first judged whether the recognition result is credible; if credible, the classification prediction result is output directly; otherwise the features output by the model are used for the subsequent feature retrieval. The judgment is as follows:

S_th = Σ_{i=1..n} α_i · S_i,    T = 1 if S_th > Th, otherwise T = 0

where n is the number of categories output by default; α_i is the weight of each classification prediction result, which may be set according to the score ranking (for example, the classification prediction result with the highest score has the largest weight), with Σ α_i = 1; S_i is the score returned for each classification result; S_th is the composite score of the returned results; and Th is the first confidence threshold.
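The credibility judgment described above can be sketched as follows; the function name and the example weights are illustrative:

```python
def composite_confidence(scores, weights, threshold):
    """Weighted composite score over the top-n classification results.

    scores:    top-n class scores S_i, highest first.
    weights:   weights alpha_i for each rank (summing to 1).
    threshold: the first confidence threshold Th.
    Returns (S_th, T), where T = 1 means the result is credible.
    """
    s_th = sum(a * s for a, s in zip(weights, scores))
    return s_th, (1 if s_th > threshold else 0)
```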
Step 204, if the confidence level of the classification prediction result is greater than the first confidence threshold, outputting the classification prediction result.
In this embodiment, when S_th meets the threshold requirement, T = 1 indicates that the classification prediction result is credible, and the classification result is output; otherwise the flow proceeds to step 205.
Step 205, if the confidence level of the classification prediction result is less than or equal to the first believability threshold, the feature is input into a pre-built retrieval model to obtain a retrieval prediction result.
In the present embodiment, when the classification prediction result of step 203 is not credible, a retrieval model is added; the recalled categories and their confidence levels are counted, and a fused result is output.
In practice, the number of picture categories that the present embodiment can predict exceeds 24,000. Because the data set contains many similar and easily confused categories, and the data are severely imbalanced, a retrieval model may also be used as a supplementary scheme for predictions that the classification model deems not credible, further improving the predictive ability of this example. Here the retrieval model is an important supplement to the classification model and can greatly improve the accuracy of the object-category prediction result.
The retrieval model may be constructed as follows. First, the backbone convolutional network of the Inception-ResNetv2 model described above may serve as the feature extractor to extract features from the images in the image library; the retrieval model returns the retrieved images together with their similarities, i.e., the retrieved categories and similarity scores. The implementation of the feature extractor is not limited to the backbone convolutional network of Inception-ResNetv2. The retrieval model stores the correspondence between object features and categories. During retrieval, the similarity between the feature of the image of the object to be identified and the features of the images in the image library is calculated; the categories corresponding to the predetermined number of images in the image library with the highest similarity are the retrieved categories, and the calculated similarities are the scores of the retrieval prediction result.
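The retrieval step described above, comparing the query feature against every library feature and keeping the most similar entries, can be sketched with cosine similarity; the toy three-dimensional features and category names are illustrative (real features would be high-dimensional backbone outputs).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_feat, library, top_k=2):
    """library: list of (category, feature). Returns the top_k categories
    with the highest similarity, each paired with its similarity score."""
    scored = [(cat, cosine_similarity(query_feat, feat)) for cat, feat in library]
    scored.sort(key=lambda cs: cs[1], reverse=True)
    return scored[:top_k]

library = [("cat", [1.0, 0.1, 0.0]),
           ("dog", [0.0, 1.0, 0.2]),
           ("fox", [0.9, 0.2, 0.1])]
# "cat" ranks first, "fox" second; their similarities are the retrieval scores.
print(retrieve([1.0, 0.0, 0.0], library, top_k=2))
```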
In some optional implementations of the present embodiment, when the search library is actually implemented, dimension reduction may be applied to the features as needed in order to speed up retrieval; optional dimension-reduction methods include PCA, LDA, LLE, etc. Compared with the original features, the reduced features remain strongly expressive with little information loss, and they greatly speed up retrieval, improving the fluency of the whole recognition system.
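As a sketch of the dimension-reduction step, the following pure-Python PCA projects 2-D features onto their first principal component. It is a minimal illustration only; a production system would apply a library implementation (for example scikit-learn's PCA, or LDA/LLE as mentioned above) to high-dimensional features.

```python
import math

def pca_1d(points):
    """Project 2-D feature vectors onto their first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [x * ux + y * uy for x, y in centered]

# Points lying along the line y = x collapse to one informative dimension.
reduced = pca_1d([(0, 0), (1, 1), (2, 2), (3, 3)])
```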
An index may then be built from the features extracted from the image library to obtain the search library required by the retrieval module. Retrieval methods include forward-index methods such as tree search, whose retrieval speed is relatively slow, and inverted-index methods, represented by hash-based retrieval, which can greatly increase the retrieval speed. A hash table converts a feature into an integer through a fixed algorithmic function, the so-called hash function; the integer is then taken modulo the array length, the remainder serves as the subscript of the index array, and the value is stored in the array slot with that subscript. When the hash table is used for retrieval, the hash function is applied again to convert the feature into the corresponding array subscript, and the value is obtained from that slot; the random-access performance of the array can thus be fully utilized for data retrieval.
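The hash-table index described above can be sketched as follows; the bucket count and the rounding used as the "fixed hash function" are illustrative assumptions of this sketch.

```python
class HashIndex:
    """Minimal hash-table index: a fixed hash function maps a feature to an
    integer, the integer is taken modulo the array length, and the remainder
    is the subscript of the bucket storing the (feature, category) pair."""

    def __init__(self, size=64):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _slot(self, feature):
        # Fixed hash function: coarsely quantize the feature, then hash it.
        quantized = tuple(round(x, 1) for x in feature)
        return hash(quantized) % self.size

    def insert(self, feature, category):
        self.buckets[self._slot(feature)].append((tuple(feature), category))

    def lookup(self, feature):
        # Re-apply the hash function to jump straight to the right bucket.
        return [cat for feat, cat in self.buckets[self._slot(feature)]
                if feat == tuple(feature)]

index = HashIndex()
index.insert([0.5, 0.2], "cat")
index.insert([0.9, 0.1], "dog")
print(index.lookup([0.5, 0.2]))  # ['cat']
```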
Step 206, the confidence level of the retrieval prediction result is calculated based on the retrieval prediction result.
In the present embodiment, for the case where the prediction given by the classification model is not credible, the retrieval model may extract the feature of the picture to be retrieved and search in the search library. In each retrieval, the TOP N predictions are retrieved first, and the number of occurrences and the similarity of each category are counted; the returned results are re-ranked by weighting the number of occurrences and the scores, and finally the predetermined number of highest-ranked prediction results (for example, TOP 10) and their scores are given, each prediction including a predicted category and a confidence level. The confidence computing formula is as follows, where k and ki denote, among the TOP N predictions, the total number of images and the number of images of the i-th class respectively, αi is the weight of each returned result (the same as the weights in the classification model), Si is the similarity of a result returned by the retrieval model, Pi denotes the confidence that the picture to be retrieved is predicted to be of the i-th class, Scorei is the similarity score, Li is the confidence level of the final result, and Lth1 is the composite score of the results returned by the retrieval model:
Li = α*Pi + (1-α)*Scorei
Whether this confidence level meets the requirement is then judged in the same manner as the classification prediction credibility judgment of step 203, where th1 is the second believability threshold; when selecting th1, a range different from the threshold th used for classification prediction should be chosen.
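A sketch of the retrieval-confidence computation of step 206. Since the exact formula for Pi is not reproduced in the text, this sketch assumes Pi = ki / k (the share of the TOP N occupied by class i); the TOP N list, the weight α and the threshold th1 are illustrative.

```python
from collections import Counter

def retrieval_confidence(top_n, alpha=0.5, th1=0.6):
    """top_n: list of (category, similarity) pairs returned for a query.
    Assumed Pi = ki / k; the final confidence is
    Li = alpha * Pi + (1 - alpha) * Scorei, compared against th1."""
    k = len(top_n)
    counts = Counter(cat for cat, _ in top_n)
    best_sim = {}
    for cat, sim in top_n:
        best_sim[cat] = max(best_sim.get(cat, 0.0), sim)
    results = []
    for cat, k_i in counts.items():
        p_i = k_i / k
        l_i = alpha * p_i + (1 - alpha) * best_sim[cat]
        results.append((cat, l_i, l_i > th1))
    results.sort(key=lambda r: r[1], reverse=True)
    return results

top_n = [("cat", 0.95), ("cat", 0.90), ("dog", 0.40), ("cat", 0.85), ("dog", 0.35)]
# "cat" fills 3/5 of the TOP N with high similarity, so it passes th1; "dog" does not.
print(retrieval_confidence(top_n))
```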
Step 207, if the confidence level of the retrieval prediction result is greater than the second believability threshold, the retrieval prediction result is output.
In the present embodiment, if the confidence level of the retrieval prediction result meets the requirement, T = 1, i.e., the retrieval prediction result is credible, and the retrieval prediction result is output; if not, the fusion result of the classification prediction result and the retrieval prediction result is output.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for image recognition according to the present embodiment. In the application scenario of Fig. 3, after acquiring an image including a background, the server detects the main body of the image, i.e., the image of the object to be identified, and inputs it into the classification model to obtain a classification prediction result. If the classification prediction result is computed to be not credible, the image is input into the retrieval model. The obtained retrieval prediction result is the predetermined number of images of known categories with the highest similarity to the image, together with their similarities (scores). The confidence level of the retrieval prediction result is calculated from the retrieval prediction result; if credible, the retrieval prediction result is taken as the final result, and otherwise the retrieval prediction result and the classification prediction result are fused.
The method provided by the above embodiment of the present disclosure improves the accuracy of the final prediction by combining the classification prediction result with the retrieval prediction result.
With further reference to Fig. 4, it illustrates a process 400 of another embodiment of the method for image recognition. The process 400 of the method for image recognition includes the following steps:
Step 401, obtain the image of the object to be identified.
Step 402, input the image into a pre-trained classification model to obtain a classification prediction result and a feature of the object.
Step 403, calculate the confidence level of the classification prediction result based on the classification prediction result.
Step 404, if the confidence level of the classification prediction result is greater than the first believability threshold, output the classification prediction result.
Step 405, if the confidence level of the classification prediction result is less than or equal to the first believability threshold, input the feature into the pre-built retrieval model to obtain a retrieval prediction result.
Step 406, calculate the confidence level of the retrieval prediction result based on the retrieval prediction result.
Step 407, if the confidence level of the retrieval prediction result is greater than the second believability threshold, output the retrieval prediction result.
Steps 401-407 are essentially identical to steps 201-207 and are therefore not described again.
Step 408, calculate the confidence levels of the various prediction results of the image based on the classification prediction result and the retrieval prediction result.
In the present embodiment, during prediction, if the prediction obtained by the classification model is credible, the prediction result of the classification model is used; if the prediction result of the classification model is not credible and the retrieval result of the retrieval model is credible, the retrieval result of the retrieval model is used; if neither is credible, the prediction results of the two need to be fused. The fusion scheme is prediction-result voting: if the prediction results of the two have an intersection, the confidence levels of the intersecting part are weighted and summed, the prediction results are then sorted by confidence level, and the highest-ranked predetermined number (for example, TOP 3) of categories in the sorted results are given as the prediction result. In the following, Ci denotes the label of a classification result, Lci its confidence level, α the weight of the classification prediction result, Ri a retrieval label, and Lri its confidence level; the final confidence level of a label appearing in both results is α*Lci + (1-α)*Lri.
Step 409, output the predetermined number of prediction results with the highest confidence levels among the various prediction results as the final prediction result.
In the present embodiment, the labels of the results are re-sorted according to their final confidence levels, and the highest-ranked predetermined number (for example, TOP 3) of categories in the sorted results are output as the final prediction result.
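The voting fusion of steps 408 and 409 can be sketched as follows. Intersecting labels receive the weighted sum α*Lc + (1-α)*Lr of both confidences; how non-intersecting labels are scored is not spelled out in the text, so keeping their weighted share is an assumption of this sketch, as are the example labels and confidences.

```python
def fuse(cls_preds, ret_preds, alpha=0.5, top=3):
    """Prediction-result voting: labels in the intersection get a weighted sum
    of both confidences; labels seen by only one model keep their weighted
    share (an assumption). Results are sorted by fused confidence and the
    TOP `top` are returned."""
    fused = {}
    for label, conf in cls_preds.items():
        fused[label] = alpha * conf
    for label, conf in ret_preds.items():
        fused[label] = fused.get(label, 0.0) + (1 - alpha) * conf
    ranked = sorted(fused.items(), key=lambda lc: lc[1], reverse=True)
    return ranked[:top]

cls_preds = {"cat": 0.45, "fox": 0.30, "dog": 0.10}
ret_preds = {"cat": 0.60, "wolf": 0.50}
# "cat" appears in both result sets, so its fused confidence puts it on top.
print(fuse(cls_preds, ret_preds))
```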
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the process 400 in the present embodiment embodies the step of fusing the classification prediction result with the retrieval prediction result, thereby further improving the accuracy and scalability of image recognition. The overall result is robust to the thresholds and the image categories; when new categories and newly added images appear, the time needed to adjust the recognition effect is short and the timeliness is strong.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a device for image recognition. The device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may specifically be applied to various electronic devices.
As shown in Fig. 5, the device 500 for image recognition of the present embodiment includes: an acquiring unit 501, a classification unit 502, a first computing unit 503, a retrieval unit 504, a second computing unit 505 and an output unit 506. The acquiring unit 501 is configured to acquire the image of the object to be identified; the classification unit 502 is configured to input the image into a pre-trained classification model to obtain a classification prediction result and a feature of the object; the first computing unit 503 is configured to calculate the confidence level of the classification prediction result based on the classification prediction result; the retrieval unit 504 is configured to, if the confidence level of the classification prediction result is less than or equal to the first believability threshold, input the feature into a pre-built retrieval model to obtain a retrieval prediction result; the second computing unit 505 is configured to calculate the confidence level of the retrieval prediction result based on the retrieval prediction result; the output unit 506 is configured to, if the confidence level of the retrieval prediction result is greater than the second believability threshold, output the retrieval prediction result.
In the present embodiment, for the specific processing of the acquiring unit 501, the classification unit 502, the first computing unit 503, the retrieval unit 504, the second computing unit 505 and the output unit 506 of the device 500 for image recognition, reference may be made to steps 201-207 in the embodiment corresponding to Fig. 2.
In some optional implementations of the present embodiment, the output unit 506 is further configured to: if the confidence level of the classification prediction result is greater than the first believability threshold, output the classification prediction result.
In some optional implementations of the present embodiment, the device 500 further includes a fusion unit (not shown in the figure), configured to: if the confidence level of the retrieval prediction result is less than or equal to the second believability threshold, calculate the confidence levels of the various prediction results of the image based on the classification prediction result and the retrieval prediction result; and output the predetermined number of prediction results with the highest confidence levels among the various prediction results as the final prediction result.
In some optional implementations of the present embodiment, the acquiring unit 501 is further configured to: input an image including the object to be identified and a background into a pre-trained subject detection model to obtain the image of the object to be identified.
In some optional implementations of the present embodiment, the classification unit 502 is further configured to: input the image of the object to be identified into a pre-trained binary classification model to judge whether the object is of a target category; and, if it is of a target category, input the image into a pre-trained classification model for identifying images of the target category.
In some optional implementations of the present embodiment, the classification model is an Inception-ResNetv2 model; sample selection is performed by class-uniform sampling, label smoothing and a mixing strategy are added, a cosine learning rate decay strategy is adopted, and the training loss uses the cross-entropy loss function.
In some optional implementations of the present embodiment, the device 500 further includes a construction unit (not shown in the figure), configured to: extract the feature of each image in a predetermined image library, where each image in the image library corresponds to a category; perform dimension reduction on the feature of each image; and build an index on the reduced features of the images, where the index includes a forward index and/or an inverted index.
In some optional implementations of the present embodiment, the subject detection model uses Faster-RCNN.
In some optional implementations of the present embodiment, the binary classification model uses a ResNet-34 model.
Referring now to Fig. 6, it illustrates a structural schematic diagram of an electronic device 600 (for example, the server shown in Fig. 1) suitable for implementing embodiments of the present disclosure. The server shown in Fig. 6 is only an example and should not impose any restriction on the function and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing unit 601 (such as a central processing unit or a graphics processor), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing unit 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output device 607 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage device 608 including, for example, a magnetic tape and a hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows the electronic device 600 with various devices, it should be understood that it is not required to implement or provide all the devices shown; more or fewer devices may alternatively be implemented or provided. Each box shown in Fig. 6 may represent one device, or may represent multiple devices as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which comprises a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 609, installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing unit 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. The propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable signal medium can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs, and the one or more programs, when executed by the electronic device, cause the electronic device to: acquire the image of an object to be identified; input the image into a pre-trained classification model to obtain a classification prediction result and a feature of the object; calculate the confidence level of the classification prediction result based on the classification prediction result; if the confidence level of the classification prediction result is less than or equal to a first believability threshold, input the feature into a pre-built retrieval model to obtain a retrieval prediction result; calculate the confidence level of the retrieval prediction result based on the retrieval prediction result; and, if the confidence level of the retrieval prediction result is greater than a second believability threshold, output the retrieval prediction result.
The computer program code for executing the operations of the embodiments of the present disclosure may be written in one or more programming languages or a combination thereof; the programming languages include object-oriented programming languages such as Java, Smalltalk and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of the systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram may represent a module, program segment or part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that noted in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an acquiring unit, a classification unit, a first computing unit, a retrieval unit, a second computing unit and an output unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for acquiring the image of an object to be identified".
The above description is only a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Claims (20)
1. A method for image recognition, comprising:
acquiring an image of an object to be identified;
inputting the image into a pre-trained classification model to obtain a classification prediction result and a feature of the object;
calculating a confidence level of the classification prediction result based on the classification prediction result;
if the confidence level of the classification prediction result is less than or equal to a first believability threshold, inputting the feature into a pre-built retrieval model to obtain a retrieval prediction result;
calculating a confidence level of the retrieval prediction result based on the retrieval prediction result; and
if the confidence level of the retrieval prediction result is greater than a second believability threshold, outputting the retrieval prediction result.
2. The method according to claim 1, wherein the method further comprises:
if the confidence level of the classification prediction result is greater than the first believability threshold, outputting the classification prediction result.
3. The method according to claim 1, wherein the method further comprises:
if the confidence level of the retrieval prediction result is less than or equal to the second believability threshold, calculating confidence levels of various prediction results of the image based on the classification prediction result and the retrieval prediction result; and
outputting a predetermined number of prediction results with the highest confidence levels among the various prediction results as a final prediction result.
4. The method according to claim 1, wherein the acquiring an image of an object to be identified comprises:
inputting an image including the object to be identified and a background into a pre-trained subject detection model to obtain the image of the object to be identified.
5. The method according to claim 4, wherein the inputting the image into a pre-trained classification model comprises:
inputting the image of the object to be identified into a pre-trained binary classification model to judge whether the object is of a target category; and
if the object is of a target category, inputting the image into a pre-trained classification model for identifying images of the target category.
6. The method according to claim 1, wherein the classification model is an Inception-ResNetv2 model; sample selection is performed by class-uniform sampling, label smoothing and a mixing strategy are added, a cosine learning rate decay strategy is adopted, and the training loss uses a cross-entropy loss function.
7. The method according to claim 1, wherein the retrieval model is built as follows:
extracting a feature of each image in a predetermined image library, wherein each image in the image library corresponds to a category;
performing dimension reduction on the feature of each image; and
building an index on the reduced features of the images, wherein the index comprises a forward index and/or an inverted index.
8. The method according to claim 4, wherein the subject detection model uses Faster-RCNN.
9. The method according to claim 5, wherein the binary classification model uses a ResNet-34 model.
10. A device for image recognition, comprising:
an acquiring unit, configured to acquire an image of an object to be identified;
a classification unit, configured to input the image into a pre-trained classification model to obtain a classification prediction result and a feature of the object;
a first computing unit, configured to calculate a confidence level of the classification prediction result based on the classification prediction result;
a retrieval unit, configured to, if the confidence level of the classification prediction result is less than or equal to a first believability threshold, input the feature into a pre-built retrieval model to obtain a retrieval prediction result;
a second computing unit, configured to calculate a confidence level of the retrieval prediction result based on the retrieval prediction result; and
an output unit, configured to, if the confidence level of the retrieval prediction result is greater than a second believability threshold, output the retrieval prediction result.
11. The device according to claim 10, wherein the output unit is further configured to:
if the confidence level of the classification prediction result is greater than the first believability threshold, output the classification prediction result.
12. The device according to claim 10, wherein the device further comprises a fusion unit, configured to:
if the confidence level of the retrieval prediction result is less than or equal to the second believability threshold, calculate confidence levels of various prediction results of the image based on the classification prediction result and the retrieval prediction result; and
output a predetermined number of prediction results with the highest confidence levels among the various prediction results as a final prediction result.
13. The device according to claim 10, wherein the acquiring unit is further configured to:
input an image including the object to be identified and a background into a pre-trained subject detection model to obtain the image of the object to be identified.
14. The device according to claim 13, wherein the classification unit is further configured to:
input the image of the object to be identified into a pre-trained binary classification model to judge whether the object is of a target category; and
if the object is of a target category, input the image into a pre-trained classification model for identifying images of the target category.
15. The device according to claim 10, wherein the classification model is an Inception-ResNetv2 model; sample selection is performed by class-uniform sampling, label smoothing and a mixing strategy are added, a cosine learning rate decay strategy is adopted, and the training loss uses a cross-entropy loss function.
16. The device according to claim 10, wherein the device further comprises a construction unit, configured to:
extract a feature of each image in a predetermined image library, wherein each image in the image library corresponds to a category;
perform dimension reduction on the feature of each image; and
build an index on the reduced features of the images, wherein the index comprises a forward index and/or an inverted index.
17. The device according to claim 13, wherein the subject detection model uses Faster-RCNN.
18. The device according to claim 14, wherein the binary classification model uses a ResNet-34 model.
19. An electronic device for image recognition, comprising:
one or more processors; and
a storage device on which one or more programs are stored,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
20. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910744806.4A CN110458107B (en) | 2019-08-13 | 2019-08-13 | Method and device for image recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110458107A true CN110458107A (en) | 2019-11-15 |
CN110458107B CN110458107B (en) | 2023-06-16 |
Family
ID=68486256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910744806.4A Active CN110458107B (en) | 2019-08-13 | 2019-08-13 | Method and device for image recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458107B (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111353549A (en) * | 2020-03-10 | 2020-06-30 | AInnovation (Chongqing) Technology Co., Ltd. | Image tag verification method and device, electronic device and storage medium |
CN111414946A (en) * | 2020-03-12 | 2020-07-14 | Tencent Technology (Shenzhen) Co., Ltd. | Artificial intelligence-based medical image noise data identification method and related device |
CN111428649A (en) * | 2020-03-26 | 2020-07-17 | Land Satellite Remote Sensing Application Center, Ministry of Natural Resources | Remote sensing intelligent extraction method for wind power generation facilities |
CN111539438A (en) * | 2020-04-28 | 2020-08-14 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Text content identification method and device and electronic equipment |
CN111639750A (en) * | 2020-05-26 | 2020-09-08 | Gree Electric Appliances, Inc. of Zhuhai | Control method and device for an intelligent flowerpot, intelligent flowerpot and storage medium |
CN111678503A (en) * | 2020-06-15 | 2020-09-18 | Xi'an Aeronautical Polytechnic Institute | Unmanned aerial vehicle aerial survey control point arrangement and identification method and system |
CN111833298A (en) * | 2020-06-04 | 2020-10-27 | Shijiazhuang Xigao Technology Co., Ltd. | Skeletal development grade detection method and terminal equipment |
CN112163110A (en) * | 2020-09-27 | 2021-01-01 | OPPO (Chongqing) Intelligent Technology Co., Ltd. | Image classification method and device, electronic equipment and computer-readable storage medium |
CN112380372A (en) * | 2020-11-13 | 2021-02-19 | Shanghai Bilibili Technology Co., Ltd. | Image search method and computing device |
CN112396552A (en) * | 2020-12-28 | 2021-02-23 | Jilin University | Rapid computer digital image processing system |
CN112507158A (en) * | 2020-12-18 | 2021-03-16 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Image processing method and device |
CN112836744A (en) * | 2021-02-02 | 2021-05-25 | Beijing Xiaobai Shiji Network Technology Co., Ltd. | Multi-model false-positive-attenuation disease classification method and device based on CT slices |
CN113127667A (en) * | 2019-12-30 | 2021-07-16 | Alibaba Group Holding Ltd. | Image processing method and device, and image classification method and device |
CN113177525A (en) * | 2021-05-27 | 2021-07-27 | Hangzhou Youzan Technology Co., Ltd. | AI electronic scale system and weighing method |
CN113283396A (en) * | 2021-06-29 | 2021-08-20 | Aleph Electronics (Shenzhen) Co., Ltd. | Target object class detection method and device, computer equipment and storage medium |
CN113313193A (en) * | 2021-06-15 | 2021-08-27 | Hangzhou Ruisheng Software Co., Ltd. | Plant picture identification method, readable storage medium and electronic device |
CN113326742A (en) * | 2021-05-08 | 2021-08-31 | Shanghai Stratosphere Intelligent Technology Co., Ltd. | Personal belongings identification and comparison method and system |
CN113688264A (en) * | 2021-09-07 | 2021-11-23 | Shenyan Technology (Beijing) Co., Ltd. | Biological weight recognition method and device, electronic equipment and storage medium |
CN113850283A (en) * | 2021-06-16 | 2021-12-28 | China United Network Communications Group Co., Ltd. | Method and device for identifying violations in RCS (Rich Communication Services) messages |
CN117115468A (en) * | 2023-10-19 | 2023-11-24 | Qilu University of Technology (Shandong Academy of Sciences) | Image recognition method and system based on artificial intelligence |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1806501A (en) * | 2005-01-17 | 2006-07-26 | Xiamen Huiyang Technology Co., Ltd. | Automatic marine phytoplankton identification method and apparatus |
CN103455542A (en) * | 2012-05-31 | 2013-12-18 | Casio Computer Co., Ltd. | Multi-class identifier, method, and computer-readable recording medium |
CN105912611A (en) * | 2016-04-05 | 2016-08-31 | University of Science and Technology of China | CNN-based fast image retrieval method |
WO2017024963A1 (en) * | 2015-08-11 | 2017-02-16 | Alibaba Group Holding Ltd. | Image recognition method, metric learning method, image source recognition method and device |
WO2018119684A1 (en) * | 2016-12-27 | 2018-07-05 | CloudMinds (Shenzhen) Holdings Co., Ltd. | Image recognition system and image recognition method |
US20180260621A1 (en) * | 2017-03-10 | 2018-09-13 | Baidu Online Network Technology (Beijing) Co., Ltd. | Picture recognition method and apparatus, computer device and computer-readable medium |
CN108846047A (en) * | 2018-05-30 | 2018-11-20 | Baizhuo Network Technology Co., Ltd. | Image retrieval method and system based on convolutional features |
CN109191453A (en) * | 2018-09-14 | 2019-01-11 | Beijing ByteDance Network Technology Co., Ltd. | Method and apparatus for generating an image category detection model |
US20190180146A1 (en) * | 2017-12-13 | 2019-06-13 | Microsoft Technology Licensing, Llc | Ensemble model for image recognition processing |
2019-08-13: Application CN201910744806.4A filed in China; granted as patent CN110458107B (legal status: Active).
Also Published As
Publication number | Publication date |
---|---|
CN110458107B (en) | 2023-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458107A (en) | Method and apparatus for image recognition | |
CN109214343B (en) | Method and device for generating face key point detection model | |
CN109191453A (en) | Method and apparatus for generating image category detection model | |
CN109508681A (en) | Method and apparatus for generating a human body key point detection model | |
US20210319363A1 (en) | Method and system for generating annotated training data | |
CN108229478A (en) | Image, semantic segmentation and training method and device, electronic equipment, storage medium and program | |
CN112989085B (en) | Image processing method, device, computer equipment and storage medium | |
CN106980867A (en) | Modeling semantic concepts in an embedding space as distributions | |
Mann et al. | Automatic flower detection and phenology monitoring using time‐lapse cameras and deep learning | |
CN109308490A (en) | Method and apparatus for generating information | |
CN109447156A (en) | Method and apparatus for generating model | |
CN115131698B (en) | Video attribute determining method, device, equipment and storage medium | |
CN108062416B (en) | Method and apparatus for generating label on map | |
CN113408570A (en) | Image category identification method and device based on model distillation, storage medium and terminal | |
CN109978870A (en) | Method and apparatus for outputting information | |
CN110457476A (en) | Method and apparatus for generating disaggregated model | |
CN109272543A (en) | Method and apparatus for generating model | |
CN110457677A (en) | Entity-relationship recognition method and device, storage medium, computer equipment | |
CN113821296A (en) | Visual interface generation method, electronic device and storage medium | |
CN116204709A (en) | Data processing method and related device | |
CN108257081A (en) | Method and apparatus for generating a picture | |
Qayyum et al. | iOS mobile application for food and location image prediction using convolutional neural networks | |
Kan et al. | Real-Time domestic garbage detection method based on improved YOLOv5 | |
CN117011568A (en) | Image classification method, device, electronic equipment and storage medium | |
CN115424153A (en) | Target detection method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||