CN102609715A - Object type identification method combining plurality of interest point testers - Google Patents

Object type identification method combining plurality of interest point testers

Info

Publication number
CN102609715A
CN102609715A CN2012100045450A CN201210004545A
Authority
CN
China
Prior art keywords
interest
different
point
collective
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012100045450A
Other languages
Chinese (zh)
Other versions
CN102609715B (en)
Inventor
罗会兰
井福荣
张彩霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Science and Technology
Original Assignee
Jiangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Science and Technology filed Critical Jiangxi University of Science and Technology
Priority to CN201210004545.0A priority Critical patent/CN102609715B/en
Publication of CN102609715A publication Critical patent/CN102609715A/en
Application granted granted Critical
Publication of CN102609715B publication Critical patent/CN102609715B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical fields of pattern recognition, computer vision and image understanding, and discloses an object category recognition method that combines multiple interest point detectors. The disclosed method comprises the following steps: first, interest points carrying various kinds of shape, edge-contour and grayscale information are extracted with different interest point detectors, forming different representation vectors of an image. A visual dictionary ensemble is obtained from the different interest point sets, each member exploiting a different image characteristic. A classifier ensemble is then obtained from the generated visual dictionary ensemble, yielding an object category recognition model and a model learning method that adapts its choice of features to the current recognition task. Experiments show that the method can fuse the information found by different interest point detectors and capture different characteristics of an image, effectively improving on the performance of traditional object category recognition methods based on a single visual dictionary.

Description

An object category recognition method combining multiple interest point detectors
Technical field
The invention belongs to the technical fields of pattern recognition, computer vision and image understanding, and specifically relates to an object category recognition method.
Background technology
Object category recognition is a key problem in computer vision. An object category model must strike a good balance between intra-class variation and inter-class similarity. Humans recognize many object categories with ease, but for computers and robots the task remains extremely challenging. In object category recognition, variations in illumination conditions, geometric deformations, occlusion, background clutter and the like all pose difficulties for effective learning and robust recognition. In addition, object category recognition must also cope with the large differences between instances of the same class.
An image contains a great deal of information, and how to characterize an image so that it can be used for effective and efficient recognition is a difficult problem that depends on the recognition task. The bag-of-words model has recently become very popular because it is simple and effective. Its basic idea is to regard an image as a sparse set of interest points (regions of interest, also called salient regions). The model derives from the bag-of-words method in text analysis: an image is viewed as a sparse set of independent patches; representative patches are sampled from the image, each patch is described separately by a feature descriptor, and the distribution of descriptors is used to represent the image.
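The bag-of-words representation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: real local descriptors (e.g. 128-D SIFT vectors) are stood in for by random arrays, and `bow_histogram` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for local descriptors (e.g. 128-D SIFT) pooled from training images.
train_descriptors = rng.normal(size=(600, 128))

# "Visual dictionary": cluster the descriptors; cluster centers are the visual words.
dictionary = KMeans(n_clusters=10, n_init=10, random_state=0).fit(train_descriptors)

def bow_histogram(descriptors, dictionary):
    """Quantize each descriptor to its nearest visual word and count occurrences."""
    words = dictionary.predict(descriptors)
    hist = np.bincount(words, minlength=dictionary.n_clusters).astype(float)
    return hist / hist.sum()      # normalized word histogram represents the image

image_descriptors = rng.normal(size=(60, 128))   # descriptors of one image
h = bow_histogram(image_descriptors, dictionary)
print(h.shape)  # (10,)
```

The resulting fixed-length histogram is what gets fed to a classifier, regardless of how many interest points the image produced.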
Interest point detectors can be divided into three types: contour-based, intensity-based and parametric-model-based. Many computer vision tasks rely on low-level features, and their results depend to a large extent on the detector used. In computer vision, the detection of regions invariant to a class of transformations has reached a certain maturity, and such invariant region detectors are applied in widely different fields, including model-based recognition and object classification. Interest points extracted by different detectors may carry different information. The invention provides a novel method that combines multiple detectors for image classification. The ensemble approach offers an effective way to fuse the information contained in different interest points. The ensemble framework also matches a mechanism of the human visual system, which can accept multiple different cues in parallel to recognize different object categories.
The current consensus in object category recognition research is as follows: first, object shapes and appearances are complex and differ greatly even among similar objects, so the model should be rich (containing many parameters and using mixture descriptions); second, object appearance varies greatly within a class, so the model should be flexible (allowing parameters to vary); third, to handle intra-class variation and occlusion, the model should be composed of features, i.e. of parts, which need not be detected in every instance, and whose relative positions constitute further model information; fourth, it is difficult to model a class from prior knowledge alone, so the model is best learned from training samples; fifth, computational efficiency must be considered.
So utilizing the method for machine learning to carry out object class Study of recognition is current a kind of research tendency.Early stage to set up the method limitation of a fixed model to the manual work of certain objects class very big, possibly not be generalized under multiclass object and the different application scene.But it is generally more intense to the study supervision degree of object class identification at present; The requirement that has is cut apart image in advance; The requirement that has is to the rectangle location of target object; The requirement that has is to image type of giving label, and in addition the most weak supervision sample also can require target object in the sample to occupy the center of sample with absolute predominance, and all samples will have same size.The supervision sample to obtain cost very big, this just means and can not obtain a lot of samples so, sample that also can not all types can both get access to, this has just limited the performance learnt and the width of study.
The human visual system can use multiple kinds of information in parallel to recognize objects and can learn a model for each kind of invariance, which is precisely the idea behind ensemble learning. Unsupervised ensemble learning, i.e. cluster ensembles, has developed considerably in recent years, providing a foundation for using ensemble learning to reduce the supervision required for object category recognition. Many interest point detectors exist, but it is difficult to give a correct answer as to which detector is more suitable for the current task, or how each performs. The invention proposes using different detectors to obtain different cues from an image. Different visual dictionaries are built on the interest points found by different detectors. Based on the different visual dictionaries, the same training image set can be quantized into different training vector sets, which capture different aspects of the image information; different member classifiers can then be learned on the different training vector sets. When these classifiers, each having learned a different aspect of the object model, classify a new image, each member classifier gives its own answer, and integrating them yields a performance gain.
The main contribution of this invention is a method for object category recognition based on unsupervised ensemble learning. The invention can effectively reduce the degree of supervision required for object category recognition, make full use of multiple kinds of effective information, and learn object models in parallel, effectively improving the efficiency and accuracy of object category recognition.
Summary of the invention
To solve the problems of overly complex models, excessive supervision and poor robustness in traditional object category recognition, the invention provides a method that uses a dictionary ensemble to exploit in parallel the multiple kinds of object identification information present in an image.
The invention is a visual dictionary method. It comprises extracting interest points (also called salient regions) from an image, describing them with local descriptors, and labeling the described interest point vectors with a learned visual dictionary. As in text classification, counting the occurrences of each label generates a global histogram that represents the image content. The histogram is input to a classifier to recognize the object category in the image. The visual dictionary is obtained by clustering the descriptor vectors of the training data's interest points. Image classification is particularly difficult for conventional machine learning algorithms, mainly because images contain too much information and the dimensionality is too high; high dimensionality makes conventional learning methods produce very unstable models with poor generalization. The invention applies ensemble learning to image classification. Different interest point detectors are used to form a visual dictionary ensemble, from which different quantized vector sets of the same training dataset can be obtained. On these quantized training sets, which capture different aspects of the features, different classifiers can be trained, yielding a classifier ensemble in which each classifier builds an object model from different information. Using the learned classifier ensemble to recognize new images can give remarkably good results. Ensemble methods improve on existing learning algorithms by combining the predictions of multiple models. In a good ensemble, the diversity among members should be large: if the members are identical, integrating them brings no performance gain. Diversity among members is therefore a key factor determining the generalization error of ensemble learning. The invention proposes a technique for generating a diverse visual dictionary ensemble and for generating a corresponding classifier ensemble based on it.
The content of the invention is set forth as follows:
1. Generating a visual dictionary ensemble containing rich shape, edge-contour and grayscale information with different interest point detectors
The construction of the visual dictionary ensemble is unsupervised; the class labels of the samples are used only when training the classifiers. Inspired by human perception, the motivation of the invention is to exploit multiple available cues in parallel to classify images. Just as humans often use different information to recognize objects, the invention uses different interest point detectors to extract different image information. Interest points carrying rich shape, edge-contour and grayscale information are extracted with different detectors, forming different representation vectors of the image. From the different interest point sets, a visual dictionary ensemble is obtained, each member exploiting a different image characteristic. To increase the diversity of the generated ensemble, when forming a member visual dictionary a subset of images is first selected at random from the training set; after a particular interest point detector has produced all interest points on these images, a random subset of those points is selected to form the visual dictionary. With the visual dictionary ensemble, different quantized vectors of the same image can be obtained.
The process of this method is as follows:
1) extract interest points with a different interest point detector;
2) cluster the described interest points with a clustering algorithm to obtain a visual dictionary;
3) repeat steps 1) to 2) until a visual dictionary ensemble of the preset size has been generated.
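The steps above can be sketched as follows. This is a toy sketch under stated assumptions: each "detector" is modeled as a function returning an array of descriptors, and `build_dictionary_ensemble` is a hypothetical helper name; real detectors (Harris, SUSAN, LoG, …) are stood in for by random data.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary_ensemble(images, detectors, dict_size=5, image_frac=0.6,
                              points_per_image=20, seed=0):
    """One member dictionary per detector: cluster a random subset of the
    interest point descriptors found on a random subset of the training images."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for detect in detectors:                    # step 1): a different detector each time
        subset = rng.choice(len(images),
                            size=max(1, int(image_frac * len(images))), replace=False)
        descs = []
        for i in subset:
            d = detect(images[i])               # descriptors of the detected interest points
            keep = rng.choice(len(d), size=min(points_per_image, len(d)), replace=False)
            descs.append(d[keep])               # random point subset, for ensemble diversity
        # step 2): cluster the described interest points into a visual dictionary
        ensemble.append(KMeans(n_clusters=dict_size, n_init=5, random_state=seed)
                        .fit(np.vstack(descs)))
    return ensemble                             # step 3): one member per detector

# Toy data: random 16-D "descriptors"; each fake detector shifts them differently,
# standing in for Harris, SUSAN, LoG, etc.
data_rng = np.random.default_rng(1)
images = [data_rng.normal(size=(50, 16)) for _ in range(10)]
detectors = [lambda img, s=s: img + s for s in (0.0, 0.5, 1.0)]
ensemble = build_dictionary_ensemble(images, detectors)
print(len(ensemble))  # 3
```

The random image subset and random point subset implement the diversity mechanism the text describes: even with the same detector, two members would see different data.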
Experimental results show that the method can fuse the interest point information found by the different detectors and capture the characteristics and information of different aspects of an image. Representing images with the visual dictionary ensemble gives better recognition performance than traditional image representations based on a single visual dictionary.
2. Fusing the different image characteristics found by different interest point detectors to generate a classifier ensemble
After the dictionary ensemble has been generated with different interest point detectors, a differently quantized training dataset can be obtained from each member dictionary. Training different classifiers on the quantized training datasets that fuse different information yields a classifier ensemble. Each member classifier builds its model from a different aspect of the object's features. By constructing a diverse visual dictionary ensemble, a classifier ensemble with high diversity can be obtained, and an ensemble with high diversity can effectively reduce the supervision needed to build an accurate model. The invention classifies images by exploiting in parallel the different image characteristics detected by the different detectors, using different visual dictionaries to represent different aspects of the images. Different quantized vector sets of the training dataset are obtained from the resulting visual dictionary ensemble; classifiers learned on these different quantized vector sets of the same training dataset form a classifier ensemble whose different models capture different characteristics. The concrete steps are as follows:
1) generate a visual dictionary ensemble, each member dictionary fusing the different image characteristics found by different detectors;
2) quantize the training data with a member visual dictionary;
3) learn a classifier on the quantized training dataset;
4) repeat steps 2) to 3) to generate a classifier ensemble of the preset size.
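The four steps above can be sketched as follows, with a linear SVM as the member classifier (as in the preferred embodiment below). All data here is synthetic, and `quantize` is a hypothetical helper name, so this is an illustrative sketch rather than the patent's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)

def quantize(descs, km):
    """Bag-of-words histogram of one image under one member dictionary."""
    h = np.bincount(km.predict(descs), minlength=km.n_clusters).astype(float)
    return h / h.sum()

# Synthetic two-class training set: 20 images, each a set of 8-D descriptors.
labels = np.array([0] * 10 + [1] * 10)
images = [rng.normal(loc=y, size=(40, 8)) for y in labels]

classifier_ensemble = []
for seed in range(3):                                   # one member per "detector"
    km = KMeans(n_clusters=6, n_init=5, random_state=seed).fit(
        np.vstack(images))                              # step 1): member dictionary
    X = np.array([quantize(d, km) for d in images])     # step 2): quantize training data
    clf = LinearSVC(random_state=seed).fit(X, labels)   # step 3): learn a classifier
    classifier_ensemble.append((km, clf))               # step 4): repeat to preset size

print(len(classifier_ensemble))  # 3
```

Each `(dictionary, classifier)` pair is one member; because the dictionaries differ, the members see different quantizations of the same training images.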
3. Integrating the visual dictionary ensemble and the corresponding classifier ensemble to recognize object categories
The member visual dictionaries and the corresponding member classifiers are independent and can be trained in parallel. After the classifier ensemble based on the visual dictionary ensemble has been formed, classifying a new test image likewise goes through interest point extraction and description, quantization of the image into quantization vectors, and application of the learned models. The classification results of the classifier ensemble are integrated, and the integrated result is output to classify the image. The concrete steps are as follows:
1) detect interest points in the new image with the different detectors, and describe these interest points with descriptors;
2) quantize the new image with the corresponding member visual dictionary;
3) classify the new image with the corresponding member classifier to obtain a classification result;
4) repeat steps 2) to 3) until every member classifier has produced its own classification result;
5) integrate the member classifiers' results with an ensemble technique to obtain the final object category label.
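At recognition time the steps above reduce to "each member answers, then the answers are integrated". The sketch below uses a simple majority vote as the integration step; the patent's preferred embodiment uses the CSPA consensus function instead, and `MockMember` is a hypothetical stand-in for a real (dictionary, classifier) pair.

```python
from collections import Counter

class MockMember:
    """Stand-in for one (member dictionary, member classifier) pair."""
    def __init__(self, label):
        self.label = label
    def classify(self, image):
        # A real member would quantize the image with its dictionary,
        # then run its classifier; here we just return a fixed answer.
        return self.label

def recognize(image, members):
    votes = [m.classify(image) for m in members]   # steps 1)-4): each member answers
    return Counter(votes).most_common(1)[0][0]     # step 5): integrate by majority vote

members = [MockMember(l) for l in ("cat", "cat", "dog")]
print(recognize(None, members))  # cat
```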
In summary, the method of the invention first uses different interest point detectors to detect interest points carrying information on different aspects of the training images; the described interest point set is then clustered to obtain a visual dictionary that characterizes one kind of image information. The original training image set is quantized with this visual dictionary, yielding different quantized vector sets, and on each vector set a model is trained that classifies objects according to that specific information. This process runs in parallel, each processor using a different interest point detector to capture different image information and learn a model of the object, as shown in Fig. 1. After the interest points of a new image have been extracted, the members of the visual dictionary ensemble quantize the image in parallel; the corresponding member classifiers then perform recognition, and finally the recognition results of all member classifiers are integrated to give the final recognition result, as shown in Fig. 2.
The invention recognizes objects by generating a visual dictionary ensemble that can express multiple aspects of an object. Compared with object category recognition methods based on a single visual dictionary, the method has advantages such as strong robustness, simple implementation and good average performance. The method fuses the interest point information found by different detectors into separate visual dictionaries, captures the characteristics and information of different aspects of an image, and generates the classifier ensemble in parallel, reducing the complexity of learning; the invention can therefore also effectively improve computational efficiency, reduce the consumption of computing resources, and recognize objects quickly and accurately.
The invention has good average performance on datasets from different fields, is robust, and uses a simple model, making it well suited to general practitioners. It needs no complex parameter tuning, requires a low degree of supervision, and places few requirements on the training data. Exploiting the inherent parallelism of ensemble learning, it can learn in parallel on multiple processors from a small amount of training data, so the invention is also relatively efficient.
Description of drawings
Fig. 1 is an illustration of the invention.
Fig. 2 illustrates classifying a new image with the learned visual dictionary ensemble and classifier ensemble.
Embodiment
A preferred embodiment of the invention:
Each image is resized so that it contains approximately 40,000 pixels (preserving the aspect ratio). Because the SIFT descriptor is the most popular and effective descriptor, and most existing related work uses 128-dimensional SIFT vectors to describe interest points, the preferred embodiment also uses it to describe interest points. Each time, 60% of the images are selected to form a new training subset. From each image, 60 interest points are selected at random, and k-means is used to construct a member visual dictionary. Because of the inherent randomness of the k-means algorithm, forming different member dictionaries is equivalent to using different clusterers. In most research on the bag-of-words model, the visual dictionary size is between 100 and 1000, so this parameter is set to the intermediate value 500. A linear SVM (Support Vector Machine) is trained as a classifier on the quantized vector set of each member dictionary. This process is iterated 9 times, forming a classifier ensemble of size 9. When a new image is tested, the classifier ensemble classifies the image, and the consensus function CSPA is used to integrate the ensemble's results. Based on the classifier ensemble, CSPA computes for each pair of images the probability that they belong to the same class, thereby building a similarity matrix.
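The similarity matrix at the heart of the CSPA step can be illustrated as a co-association matrix: each entry is the fraction of ensemble members that place the two images in the same class. This is a minimal sketch of that one idea (the subsequent graph-partitioning stage of full CSPA is omitted), and the toy labelings are invented for illustration.

```python
import numpy as np

# One labeling of 4 test images per ensemble member (3 members), rows = members.
member_labels = np.array([[0, 0, 1, 1],
                          [0, 0, 1, 0],
                          [0, 1, 1, 1]])

def co_association(member_labels):
    """S[i, j] = fraction of members assigning images i and j the same class."""
    m, n = member_labels.shape
    S = np.zeros((n, n))
    for labels in member_labels:
        S += (labels[:, None] == labels[None, :])   # pairwise same-class indicator
    return S / m

S = co_association(member_labels)
print(S[0, 1])  # 2 of 3 members put images 0 and 1 in the same class
```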
To detect different interest points, the following 9 different interest point detectors are used to extract different information from the image, so an ensemble of size 9 is obtained:
1) the Harris interest point detector;
2) the SUSAN interest point detector;
3) the LoG interest point detector;
4) the Harris-Laplace interest point detector;
5) the Gilles interest point detector;
6) the SIFT interest point detector with parameter PeakThresh=5;
7) the SIFT interest point detector with parameter PeakThresh=0;
8) 100 randomly selected circular regions with radii of 10 to 30 pixels;
9) 500 randomly selected circular regions with radii of 10 to 30 pixels.
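Detectors 8) and 9) involve no feature detection at all: they simply sample circular regions with random centers and radii. A sketch follows; the patent only fixes the radius range and count, so the center-placement scheme (uniform, fully inside the image) and the function name are assumptions.

```python
import random

def random_circular_regions(width, height, n, r_min=10, r_max=30, seed=0):
    """Sample n circles (cx, cy, r) with radius in [r_min, r_max] pixels,
    kept fully inside the image so each region can be described."""
    rng = random.Random(seed)
    regions = []
    while len(regions) < n:
        r = rng.randint(r_min, r_max)
        cx = rng.randint(r, width - r)   # keep the circle inside the image
        cy = rng.randint(r, height - r)
        regions.append((cx, cy, r))
    return regions

# Detector 8): 100 circles; detector 9) would use n=500.
regions = random_circular_regions(256, 160, n=100)
print(len(regions))  # 100
```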
Experimental results show that the preferred embodiment of the invention performs better than traditional recognition methods based on a single visual dictionary, and even surpasses the performance of some complex models with carefully tuned parameters.

Claims (4)

1. An object category recognition method combining multiple interest point detectors, characterized in that different interest point detectors are used to extract interest points carrying rich shape, edge-contour and grayscale information and to form a visual dictionary ensemble, the concrete steps being as follows:
1) extract interest points with a different interest point detector;
2) cluster the described interest points with a clustering algorithm to obtain a visual dictionary;
3) repeat steps 1) to 2) until a visual dictionary ensemble of the preset size has been generated.
2. The method according to claim 1, characterized in that the image's interest points are detected with the following 9 different interest point detectors:
1) the Harris interest point detector;
2) the SUSAN interest point detector;
3) the LoG interest point detector;
4) the Harris-Laplace interest point detector;
5) the Gilles interest point detector;
6) the SIFT interest point detector with parameter PeakThresh=5;
7) the SIFT interest point detector with parameter PeakThresh=0;
8) 100 randomly selected circular regions with radii of 10 to 30 pixels;
9) 500 randomly selected circular regions with radii of 10 to 30 pixels.
3. The method according to claim 2, characterized in that the different image characteristics found by the different detectors are fused to generate a classifier ensemble, the concrete steps being as follows:
1) generate a visual dictionary ensemble, each member dictionary fusing the different image characteristics found by different detectors;
2) quantize the training data with a member visual dictionary;
3) learn a classifier on the quantized training dataset;
4) repeat steps 2) to 3) to generate a classifier ensemble of the preset size.
4. The method according to claim 3, characterized in that the visual dictionary ensemble and the corresponding classifier ensemble are integrated to recognize object categories, the concrete steps being as follows:
1) detect interest points in the new image with the different detectors, and describe these interest points with descriptors;
2) quantize the new image with the corresponding member visual dictionary;
3) classify the new image with the corresponding member classifier to obtain a classification result;
4) repeat steps 2) to 3) until every member classifier has produced its own classification result;
5) integrate the member classifiers' results with an ensemble technique to obtain the final object category label.
CN201210004545.0A 2012-01-09 2012-01-09 Object type identification method combining plurality of interest point testers Expired - Fee Related CN102609715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210004545.0A CN102609715B (en) 2012-01-09 2012-01-09 Object type identification method combining plurality of interest point testers


Publications (2)

Publication Number Publication Date
CN102609715A true CN102609715A (en) 2012-07-25
CN102609715B CN102609715B (en) 2015-04-08

Family

ID=46527074

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210004545.0A Expired - Fee Related CN102609715B (en) 2012-01-09 2012-01-09 Object type identification method combining plurality of interest point testers

Country Status (1)

Country Link
CN (1) CN102609715B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1691054A (en) * 2004-04-23 2005-11-02 中国科学院自动化研究所 Content based image recognition method
CN101807259A (en) * 2010-03-25 2010-08-18 复旦大学 Invariance recognition method based on visual vocabulary book collection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hui-Lan Luo, Hui Wei, Fan-Xing Hu: "Improvements in image categorization using codebook ensembles", Image and Vision Computing *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10825048B2 (en) 2013-05-01 2020-11-03 Cloudsight, Inc. Image processing methods
CN105184212A (en) * 2014-04-04 2015-12-23 卡姆芬德公司 Image processing server
CN108241870A (en) * 2016-12-23 2018-07-03 赫克斯冈技术中心 For distributing specific class method for distinguishing interested in measurement data
CN108241870B (en) * 2016-12-23 2022-01-25 赫克斯冈技术中心 Method for assigning a specific category of interest within measurement data
CN109145936A (en) * 2018-06-20 2019-01-04 北京达佳互联信息技术有限公司 A kind of model optimization method and device
CN109145936B (en) * 2018-06-20 2019-07-09 北京达佳互联信息技术有限公司 A kind of model optimization method and device
CN113837080A (en) * 2021-09-24 2021-12-24 江西理工大学 Small target detection method based on information enhancement and receptive field enhancement
CN113837080B (en) * 2021-09-24 2023-07-25 江西理工大学 Small target detection method based on information enhancement and receptive field enhancement
CN113936738A (en) * 2021-12-14 2022-01-14 鲁东大学 RNA-protein binding site prediction method based on deep convolutional neural network

Also Published As

Publication number Publication date
CN102609715B (en) 2015-04-08

Similar Documents

Publication Publication Date Title
Zhang et al. Slow feature analysis for human action recognition
Bregonzio et al. Fusing appearance and distribution information of interest points for action recognition
Wang et al. Semi-latent dirichlet allocation: A hierarchical model for human action recognition
Peng et al. Exploring Motion Boundary based Sampling and Spatial-Temporal Context Descriptors for Action Recognition.
Liu et al. Depth context: a new descriptor for human activity recognition by using sole depth sequences
CN101807259B (en) Invariance recognition method based on visual vocabulary book collection
Rahmani et al. Discriminative human action classification using locality-constrained linear coding
Nour el houda Slimani et al. Human interaction recognition based on the co-occurence of visual words
CN102609715B (en) Object type identification method combining plurality of interest point testers
CN103854016A (en) Human body behavior classification and identification method and system based on directional common occurrence characteristics
CN104050460B (en) The pedestrian detection method of multiple features fusion
Yingxin et al. A robust hand gesture recognition method via convolutional neural network
Zhang et al. Semantically modeling of object and context for categorization
Song et al. Visual-context boosting for eye detection
Symeonidis et al. Neural attention-driven non-maximum suppression for person detection
Chen et al. Generalized Haar-like features for fast face detection
CN113822134A (en) Instance tracking method, device, equipment and storage medium based on video
Bhattacharya et al. Covariance of motion and appearance featuresfor spatio temporal recognition tasks
Rasel et al. An efficient framework for hand gesture recognition based on histogram of oriented gradients and support vector machine
CN102609718A (en) Method for generating vision dictionary set by combining different clustering algorithms
Aiouez et al. Real-time Arabic Sign Language Recognition based on YOLOv5.
Narang et al. Devanagari character recognition in scene images
Chuang et al. Hand posture recognition and tracking based on bag-of-words for human robot interaction
Li et al. Human Action Recognition Using Multi-Velocity STIPs and Motion Energy Orientation Histogram.
Shebiah et al. Classification of human body parts using histogram of oriented gradients

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20150408

Termination date: 20200109