CN102609715B - Object type identification method combining plurality of interest point testers

Info

Publication number: CN102609715B
Application number: CN201210004545.0A
Authority: CN
Inventors: 罗会兰; 井福荣; 张彩霞
Original assignee: Jiangxi University of Science and Technology
Current assignee: Jiangxi University of Science and Technology
Priority date: 2012-01-09
Filing date: 2012-01-09
Publication date: 2015-04-08
Anticipated expiration: 2032-01-09
Also published as: CN102609715A

Abstract

The invention belongs to the technical fields of pattern recognition, computer vision, and image understanding, and discloses an object class recognition method that combines multiple interest point detectors. The method first uses different interest point detectors to extract interest points carrying rich shape, edge contour, and grayscale information, forming different representation vectors of an image. A visual dictionary ensemble is built from the different interest point sets, with each member exploiting a different image characteristic. A classifier ensemble is then trained on the visual dictionary ensemble, yielding an object class recognition model and a model learning method that adapt feature selection to the current recognition task. Experiments show that the method fuses the information found by different interest point detectors and captures different characteristics of the image, improving on traditional object class recognition methods based on a single visual dictionary.

Description

An object class recognition method combining multiple interest point detectors

Technical field

The invention belongs to the technical fields of pattern recognition, computer vision, and image understanding, and specifically relates to an object class recognition method.

Background technology

Object class recognition is a key problem in computer vision. An object class model must balance intra-class variation against inter-class similarity. Humans recognize many object classes with ease, but for computers and robots the task remains extremely challenging. At the object class level, changes in illumination, geometric deformation, occlusion, and background noise all make effective learning and robust recognition difficult. In addition, object class recognition must cope with the large differences between instances of the same class.

An image contains a great deal of information, and how to represent it so that recognition is both effective and efficient is a hard problem whose answer depends on the recognition task. The bag-of-words feature model has recently become very popular because it is simple and effective. Its basic idea is to treat an image as a set of sparse interest points (also called regions of interest or salient regions). The method derives from the bag-of-words approach in text analysis: the image is regarded as a sparse set of independent patches, representative patches are sampled from the image, each patch is described by its own feature vector, and the image is represented by the distribution over the descriptor space.

Interest point detectors fall into three categories: contour-based, intensity-based, and parametric-model-based. Many computer vision tasks depend on low-level features, and the results are strongly influenced by the detector used. In computer vision, the detection of regions that are invariant to a class of transformations has reached a certain maturity. These invariant region detection methods are applied in very different areas, including model-based recognition and object classification. Interest points extracted by different detectors may carry different information. The invention provides a novel way to combine multiple detectors for image classification. The ensemble approach offers an effective way to fuse the information contained in different interest points. The ensemble framework also matches the mechanism of the human visual system, which takes in several different cues in parallel to recognize different object classes.

The current consensus in object class recognition research is as follows. First, object shape and appearance are complex, and even similar objects differ greatly, so the model should be rich (many parameters, mixture descriptions). Second, appearance varies widely within a class, so the model should be flexible (parameters are allowed to vary). Third, to handle intra-class variation and occlusion, the model should be composed of features, that is, parts; these features need not be detected in every instance, and the relative positions of the parts provide further model information. Fourth, modeling a class from prior knowledge is difficult, so the model is best learned from training samples. Fifth, computational efficiency must be considered.

Using machine learning for object class recognition is therefore a current research trend. Early methods that hand-built a fixed model for a specific object class are very limited and may not generalize to multiple object classes or to different application scenarios. Current learning approaches, however, are generally strongly supervised: some require images to be segmented in advance, some require a bounding rectangle around the target object, and some require a class label for each image. Even the most weakly supervised samples require the target object to dominate the center of the sample, and all samples must have the same size. Supervised samples are expensive to obtain, which means that many samples cannot be collected and not every class can be covered. This limits both the performance and the breadth of learning.

The human visual system uses many kinds of information in parallel to recognize objects and can learn a model for each kind of invariance, which is exactly the idea behind ensemble learning. Unsupervised ensemble learning, also known as cluster ensembles, has developed in recent years and provides a basis for reducing the degree of supervision in object class recognition and for applying ensemble learning to it. Many interest point detectors exist, but it is hard to say which one suits the current task best or how well it will perform. The invention proposes using different detectors to obtain different cues from an image. A separate visual dictionary is built on the interest points found by each detector. Quantizing the same training image set with different visual dictionaries yields different training vector sets that capture different aspects of the images, and a different member classifier can be learned from each training vector set. When these classifiers, each of which has learned a different aspect of the object model, are used to classify a new image, each member gives its own answer, and combining the answers improves performance.

The main contribution of the invention is an object class recognition method based on unsupervised ensemble learning. The invention reduces the degree of supervision needed for object class recognition, makes combined use of several kinds of useful information, learns object models in parallel, and improves the efficiency and accuracy of object class recognition.

Summary of the invention

To address the overly complex models, excessive supervision, and poor robustness of traditional object class recognition, the invention provides a method that uses a dictionary ensemble to exploit, in parallel, the many kinds of information present in an image for recognizing object classes.

The invention is a visual dictionary method. It extracts interest points (also called salient regions) from an image, describes them with a local descriptor, and labels each resulting descriptor vector with a word from a learned visual dictionary. As in text classification, the number of times each label occurs is counted to produce a histogram that represents the image content. The histogram is fed to a classifier that recognizes the object class in the image. The visual dictionary is obtained by clustering the interest point descriptor vectors of the training data. Image classification is especially hard for conventional machine learning algorithms, mainly because images carry too much information and their dimensionality is too high. High dimensionality leads conventional learning methods to very unstable models with poor generalization. The invention applies ensemble learning to image classification. Different interest point detectors are used to form a visual dictionary ensemble. From the visual dictionary ensemble, different quantized vector sets of the same training data set are obtained. A different classifier can be trained on each quantized training set, each of which reflects a different aspect of the images, giving a classifier ensemble in which each classifier builds its object model from different information. Recognizing new images with the learned classifier ensemble can give unexpectedly good results. Ensemble methods improve existing learning algorithms by combining the predictions of multiple models. A good ensemble is one whose members differ substantially. If all members are identical, combining them brings no gain in performance. The diversity among members is therefore a key factor in the generalization error of ensemble learning. The invention proposes a technique for generating a diverse visual dictionary ensemble and, from it, the corresponding classifier ensemble.
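The labeling and counting step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `nearest_word` and `quantize_image` are hypothetical, and 2-D toy vectors stand in for real 128-dimensional SIFT descriptors.

```python
import math
from collections import Counter

def nearest_word(descriptor, dictionary):
    """Index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(dictionary)),
               key=lambda i: math.dist(descriptor, dictionary[i]))

def quantize_image(descriptors, dictionary):
    """Label every descriptor with its nearest visual word and count the labels."""
    counts = Counter(nearest_word(d, dictionary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(dictionary))]

# Toy data (assumption): three visual words and four interest point descriptors.
toy_dictionary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
toy_descriptors = [(0.5, 0.2), (9.0, 9.5), (0.1, 0.3), (1.0, 9.0)]

histogram = quantize_image(toy_descriptors, toy_dictionary)
print(histogram)  # [2, 1, 1]
```

The resulting histogram is the fixed-length vector that a classifier receives, regardless of how many interest points the image contains.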

The content of the invention is described below:

1. Using different interest point detectors to generate a visual dictionary ensemble containing rich shape, edge contour, and grayscale information

The visual dictionary ensemble is built without supervision; the class labels of the samples are used only when training the classifiers. Inspired by human perception, the invention is motivated by using multiple available cues in parallel to classify images. Just as humans often use different information to recognize objects, the invention uses different interest point detectors to extract different image content. The detectors extract interest points carrying rich shape, edge contour, and grayscale information, forming different representation vectors of an image. From the different interest point sets a visual dictionary ensemble is obtained, with each member exploiting a different image characteristic. To increase the diversity of the ensemble, each member visual dictionary is built by first randomly selecting a subset of the training images, then applying one of the detectors to obtain all interest points on those images, and finally randomly selecting a subset of those points to form the dictionary. The visual dictionary ensemble then yields different quantized vectors for the same image.

The procedure is as follows:

1) Extract interest points with different interest point detectors;

2) Cluster the described interest points with a clustering algorithm to obtain a visual dictionary;

3) Repeat steps 1 and 2 until a visual dictionary ensemble of the preset size has been generated.
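The three steps above, together with the random subsampling used to increase diversity, can be sketched as follows. This is an illustrative sketch under stated assumptions: `kmeans` is a plain Lloyd's iteration written for this example, `build_dictionary_ensemble` and its parameters are hypothetical names, and the toy detectors and descriptor are placeholders for real interest point detectors and a real local descriptor. The defaults of 60% of the images and 60 points per image follow the preferred embodiment.

```python
import math
import random

def kmeans(points, k, rng, iterations=10):
    """Plain Lloyd's k-means; returns k cluster centers (the visual words)."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster became empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers

def build_dictionary_ensemble(images, detectors, describe, k, rng,
                              image_fraction=0.6, points_per_image=60):
    """Build one visual dictionary per detector from a random image subset."""
    ensemble = []
    for detect in detectors:
        subset = rng.sample(images, max(1, int(image_fraction * len(images))))
        descriptors = []
        for image in subset:
            points = detect(image)
            chosen = rng.sample(points, min(points_per_image, len(points)))
            descriptors.extend(describe(image, p) for p in chosen)
        ensemble.append(kmeans(descriptors, k, rng))
    return ensemble

# Toy setup (assumption): an "image" is a list of 2-D points, the two
# "detectors" pick different points, and a point is its own descriptor.
rng = random.Random(0)
images = [[(rng.random(), rng.random()) for _ in range(20)] for _ in range(10)]
detectors = [lambda img: img[0::2], lambda img: img[1::2]]
describe = lambda img, p: p

ensemble = build_dictionary_ensemble(images, detectors, describe, k=4, rng=rng)
print(len(ensemble), [len(d) for d in ensemble])  # 2 [4, 4]
```

Because each member draws its own image subset and its own k-means initialization, two dictionaries differ even when built with the same detector.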

Experimental results show that the method fuses the interest point information found by different detectors and captures features and information about different aspects of an image. Representing images with a visual dictionary ensemble gives better recognition performance than traditional image representations based on a single visual dictionary.

2. Fusing the different aspects of image features found by different interest point detectors to generate a classifier ensemble

After the dictionary ensemble has been generated with different interest point detectors, a different quantized training data set is obtained from each member dictionary. Training a different classifier on each quantized training data set, each of which carries different information, gives a classifier ensemble. Each member classifier models the object according to a different aspect of its features. Building a diverse visual dictionary ensemble yields a highly diverse classifier ensemble, and a highly diverse ensemble reduces the degree of supervision needed to build an accurate model. The invention classifies images by using, in parallel, the different aspects of image features found by different detectors, with a different visual dictionary representing each aspect. Different quantized vector sets of the training data set are obtained from the visual dictionary ensemble, and the classifier ensemble is learned from these sets, so that the different models in the ensemble capture different features. The specific steps are as follows:

1) Generate the visual dictionary ensemble, in which each member visual dictionary captures the aspect of image features found by a different detector;

2) Quantize the training data with one member visual dictionary;

3) Learn a classifier on the quantized training data set;

4) Repeat steps 2 and 3 until a classifier ensemble of the preset size has been generated.
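Steps 2 to 4 can be sketched as a loop over the member dictionaries. The sketch below is hedged in two ways: the names `quantize`, `NearestCentroid`, and `train_classifier_ensemble` are hypothetical, and a simple nearest-centroid classifier is swapped in for the linear SVM of the preferred embodiment so that the example needs only the standard library.

```python
import math
from collections import Counter, defaultdict

def quantize(descriptors, dictionary):
    """Normalized histogram of nearest-visual-word labels for one image."""
    counts = Counter(
        min(range(len(dictionary)), key=lambda i: math.dist(d, dictionary[i]))
        for d in descriptors
    )
    total = max(1, len(descriptors))
    return [counts.get(i, 0) / total for i in range(len(dictionary))]

class NearestCentroid:
    """Stand-in classifier (assumption): NOT the linear SVM of the embodiment."""

    def fit(self, vectors, labels):
        grouped = defaultdict(list)
        for v, y in zip(vectors, labels):
            grouped[y].append(v)
        self.centroids = {
            y: [sum(col) / len(vs) for col in zip(*vs)]
            for y, vs in grouped.items()
        }
        return self

    def predict(self, vector):
        return min(self.centroids,
                   key=lambda y: math.dist(vector, self.centroids[y]))

def train_classifier_ensemble(descriptor_sets, labels, dictionaries):
    """descriptor_sets[m][n] holds the descriptors of image n under detector m."""
    ensemble = []
    for per_image_descriptors, dictionary in zip(descriptor_sets, dictionaries):
        vectors = [quantize(d, dictionary) for d in per_image_descriptors]
        ensemble.append(NearestCentroid().fit(vectors, labels))
    return ensemble

# Toy data (assumption): two detectors, two training images, two classes.
dictionaries = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]]
descriptor_sets = [
    [[(0.1, 0.0), (0.0, 0.2)], [(0.9, 1.0), (1.0, 0.8)]],
    [[(0.2, 0.1)], [(1.1, 0.9)]],
]
labels = ["a", "b"]

classifiers = train_classifier_ensemble(descriptor_sets, labels, dictionaries)
query = quantize([(0.05, 0.1), (0.2, 0.0)], dictionaries[0])
print(classifiers[0].predict(query))  # a
```

Class labels enter only in `fit`, which mirrors the statement that the dictionaries are built without supervision and labels are used only when training the classifiers.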

3. Combining the visual dictionary ensemble and the corresponding classifier ensemble to recognize object classes

Member visual dictionaries and their corresponding member classifiers are independent of one another and can be trained in parallel. Once the classifier ensemble has been built from the visual dictionary ensemble, classifying a new test image likewise involves extracting and describing interest points, quantizing the image, and applying the learned models to the quantized vectors. The members' classification results are then combined, and the combined result is output as the classification of the image. The specific steps are as follows:

1) Detect interest points in the new image with the different detectors, and describe them with a descriptor;

2) Quantize the new image with one member visual dictionary;

3) Classify the new image with the corresponding member classifier to obtain a classification result;

4) Repeat steps 2 and 3 until every member classifier has produced its own classification result;

5) Combine the member classifiers' results with an ensemble technique to obtain the final object class label.
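The final combination step can be illustrated with a simple majority vote. Majority voting is used here only as an easy-to-read stand-in; the preferred embodiment combines results with the CSPA consensus function instead. The names `majority_vote` and `classify_image` are hypothetical, and each member is assumed to be a callable that wraps detection, description, quantization, and classification for one detector.

```python
from collections import Counter

def majority_vote(member_predictions):
    """Return the label predicted by the most member classifiers.

    Ties are broken in favor of the label that appears first in the list.
    """
    counts = Counter(member_predictions)
    best = max(counts.values())
    for label in member_predictions:
        if counts[label] == best:
            return label

def classify_image(image, members):
    """Ask every member for its label, then combine the answers."""
    return majority_vote([member(image) for member in members])

# Toy votes (assumption) from an ensemble of nine member classifiers.
votes = ["car", "face", "car", "car", "motorbike", "face", "car", "face", "car"]
print(majority_vote(votes))  # car
```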

In summary, the method first uses different interest point detectors to find interest points that carry different aspects of the training images, and clusters each described interest point set to obtain a visual dictionary that characterizes one kind of image information. The original training image set is quantized with each visual dictionary to obtain a different quantized vector set, and a model that classifies objects according to that particular information is trained on each vector set. This process runs in parallel: each processor uses a different interest point detector to capture different image information and learn an object model, as shown in Fig. 1. After interest points have been extracted from a new image, the members of the visual dictionary ensemble quantize the image in parallel, the corresponding member classifiers recognize it, and the recognition results of all member classifiers are combined to give the final result, as shown in Fig. 2.
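Because the members are independent, their training jobs can be dispatched concurrently. The sketch below shows only the dispatch pattern: `train_member` is a hypothetical placeholder that returns a dummy model, and a thread pool is used for simplicity (a process pool would be the more natural choice for CPU-bound work on multiple processors).

```python
from concurrent.futures import ThreadPoolExecutor

def train_member(task):
    """Hypothetical per-member job: returns (detector name, trained model)."""
    detector_name, training_images = task
    # Placeholder for: detect -> describe -> cluster -> quantize -> fit classifier.
    model = {"detector": detector_name, "n_images": len(training_images)}
    return detector_name, model

def train_ensemble_concurrently(detector_names, training_images, max_workers=4):
    """Run one independent training job per detector and collect the models."""
    tasks = [(name, training_images) for name in detector_names]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(train_member, tasks))

models = train_ensemble_concurrently(["Harris", "SUSAN", "LoG"], ["img1", "img2"])
print(sorted(models))  # ['Harris', 'LoG', 'SUSAN']
```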

The invention recognizes objects by generating a visual dictionary ensemble that expresses many aspects of an object. Compared with object class recognition based on a single visual dictionary, the method is robust, simple to put into practice, and effective on average. It fuses the interest point information found by different detectors into the individual visual dictionaries and captures features and information about different aspects of an image, so the classifier ensemble can be generated in parallel and the complexity of the solution is reduced. The invention therefore also improves computational efficiency, reduces the consumption of computing resources, and recognizes objects quickly and accurately.

The invention has good average performance across data sets from different domains, is robust, and uses a simple model, which makes it well suited to general operators. It needs no complex parameter tuning, requires little supervision, and places low demands on the training data. Because ensemble learning is inherently parallel, learning can proceed in parallel on multiple processors with a small amount of training data, so the invention is also relatively efficient.

Brief description of the drawings

Fig. 1 is a schematic diagram of the invention.

Fig. 2 is a schematic diagram of classifying a new image with the learned visual dictionary ensemble and classifier ensemble.

Embodiment

A preferred embodiment of the invention:

Each image is resized so that it contains approximately 40,000 pixels, preserving the aspect ratio. SIFT is the most popular and effective descriptor, and most existing related techniques describe interest points with 128-dimensional SIFT vectors, so the preferred embodiment does the same. Each time, 60% of the images are selected to form a new training subset. From each image, 60 interest points are randomly selected, and k-means is used to build a member visual dictionary. Because k-means is inherently random, building the different member dictionaries is equivalent to using different clusterers. In most research on the bag-of-words model the visual dictionary size lies between 100 and 1000, so this parameter is set to an intermediate value, 500. A linear SVM (Support Vector Machine) learns one classifier on the quantized vector set obtained from each member dictionary. This process is repeated 9 times to form a classifier ensemble of size 9. When a new image is tested, the classifier ensemble classifies it, and the consensus function CSPA combines the members' results. CSPA uses the classifier ensemble to compute, for each pair of images, the probability that they belong to the same class, and builds a similarity matrix from these values.
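The pairwise similarity matrix that CSPA builds can be sketched as follows. This covers only the matrix construction described above, under the assumption that the probability for a pair is estimated as the fraction of members assigning both images the same label; how the matrix is subsequently partitioned into final classes is not shown. The function name `co_association_matrix` is hypothetical.

```python
def co_association_matrix(member_labels):
    """member_labels[m][i] is the label member m assigns to image i.

    Entry [i][j] is the fraction of members that place images i and j
    in the same class.
    """
    n_members = len(member_labels)
    n_images = len(member_labels[0])
    counts = [[0] * n_images for _ in range(n_images)]
    for labels in member_labels:
        for i in range(n_images):
            for j in range(n_images):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_members for c in row] for row in counts]

# Toy labels (assumption): four members, four images. Label values need not
# be aligned across members; only agreement within a member matters.
member_labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
similarity = co_association_matrix(member_labels)
for row in similarity:
    print(row)
```

Note that the third member uses flipped label values yet still contributes agreement for the pairs (0, 1) and (2, 3), which is why a pairwise matrix is a convenient way to combine members whose labels are not directly comparable.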

To detect different interest points, the following 9 interest point detectors are used to extract different image content, giving an ensemble of size 9:

1) Harris interest point detector;

2) SUSAN interest point detector;

3) LoG (Laplacian of Gaussian) interest point detector;

4) Harris-Laplace interest point detector;

5) Gilles interest point detector;

6) SIFT interest point detector with parameter PeakThresh=5;

7) SIFT interest point detector with parameter PeakThresh=0;

8) 100 randomly selected circular regions with radii of 10 to 30 pixels;

9) 500 randomly selected circular regions with radii of 10 to 30 pixels.
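The last two entries in the list are random samplers rather than true detectors. A minimal sketch of such a sampler is given below. The function name `random_circular_regions` is hypothetical, and two details are assumptions not stated in the text: radii are drawn as integers, and each circle is required to lie fully inside the image.

```python
import random

def random_circular_regions(width, height, count, rng,
                            min_radius=10, max_radius=30):
    """Sample `count` circles (cx, cy, r) lying fully inside the image."""
    regions = []
    for _ in range(count):
        r = rng.randint(min_radius, max_radius)
        cx = rng.randint(r, width - 1 - r)
        cy = rng.randint(r, height - 1 - r)
        regions.append((cx, cy, r))
    return regions

# A 250 x 160 image has 40,000 pixels, matching the resized image size.
rng = random.Random(42)
regions_100 = random_circular_regions(250, 160, 100, rng)
regions_500 = random_circular_regions(250, 160, 500, rng)
print(len(regions_100), len(regions_500))  # 100 500
```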

Experimental results show that the preferred embodiment outperforms traditional recognition methods based on a single visual dictionary, and even exceeds some complex models with carefully tuned parameters.

Claims

1. An object class recognition method combining multiple interest point detectors, characterized in that different interest point detectors are used to generate a visual dictionary ensemble containing rich shape, edge contour, and grayscale information, the specific steps being as follows:

1) Extract interest points with different interest point detectors, the different interest point detectors being the following 9: the Harris interest point detector, the SUSAN interest point detector, the LoG interest point detector, the Harris-Laplace interest point detector, the Gilles interest point detector, the SIFT interest point detector with parameter PeakThresh=5, the SIFT interest point detector with parameter PeakThresh=0, 100 randomly selected circular regions with radii of 10 to 30 pixels, and 500 randomly selected circular regions with radii of 10 to 30 pixels;

2) Use a clustering algorithm to cluster the described interest points of each of the 9 kinds separately, obtaining the visual dictionary ensemble;

3) Train a classifier on each member visual dictionary of the visual dictionary ensemble to obtain a classifier ensemble, the specific steps comprising: a) quantizing the training data with one member visual dictionary; b) learning a classifier on the quantized training data set; and repeating steps a) and b) until a classifier ensemble of the preset size has been generated;

4) Combine the visual dictionary ensemble and the corresponding classifier ensemble to recognize the object class, the specific steps comprising: a) detecting interest points in a new image with the different interest point detectors and describing them with a descriptor; b) quantizing the new image with one member visual dictionary; c) classifying the new image with the corresponding member classifier to obtain a classification result; d) repeating steps b) and c) until every member classifier has produced its own classification result; e) combining the member classifiers' results with an ensemble technique to obtain the final object class label.