CN103366181A - Method and device for identifying scene integrated by multi-feature vision codebook - Google Patents


Info

Publication number
CN103366181A
CN103366181A CN2013102689531A CN201310268953A
Authority
CN
China
Prior art keywords
vision
code book
fusion
features
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013102689531A
Other languages
Chinese (zh)
Inventor
覃剑钊
阎镜予
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Security and Surveillance Technology PRC Inc
Original Assignee
China Security and Surveillance Technology PRC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Security and Surveillance Technology PRC Inc filed Critical China Security and Surveillance Technology PRC Inc
Priority to CN2013102689531A priority Critical patent/CN103366181A/en
Publication of CN103366181A publication Critical patent/CN103366181A/en
Pending legal-status Critical Current

Links

Images

Abstract

The invention provides a method and a device for scene recognition based on multi-feature visual codebook fusion, belonging to the fields of image processing and pattern recognition. The method comprises the following steps: performing multi-feature fusion on local regions of a scene image by means of a local classifier to obtain the multi-feature visual codebook representation of those regions; and performing global fusion and classification on the multi-feature visual codebook representation according to global fusion parameters and classification parameters obtained by prior training. Compared with approaches that first generate single-feature visual codebook representations (or several single-feature representations) from single-feature probability estimates and only then perform global feature fusion, the disclosed method yields more accurate automatic scene classification results.

Description

Scene recognition method and apparatus based on multi-feature visual codebook fusion
Technical field
The present invention relates to video image processing and pattern recognition technology, and in particular to a scene recognition method and apparatus based on multi-feature visual codebook fusion.
Background technology
Image-based scene recognition automatically assigns image data captured by a camera to different scene categories, for example: beach, forest, highway, street, office, bedroom. The technology is applicable to intelligent vehicles and autonomous robot navigation, and image-based scene recognition can also provide useful prior information for other computer vision tasks such as object recognition, object discovery, action classification, image retrieval, and video surveillance.
In recent years, methods based on local features have been widely used in image-based scene recognition. Such methods are insensitive to occlusion, illumination changes, and slight geometric deformation, and are therefore more robust than methods based on global features. Global methods treat the scene image as a whole and extract features from the entire image, for example a color histogram or texture features of the whole image, and then train a classifier on these global features. Local methods extract features from local regions of the scene image and then describe the image as a probability distribution over, or a set of, visual components or visual topics obtained in advance by training. To further improve the performance of scene or object recognition systems, several global multi-feature fusion methods (e.g. multiple kernel learning, linear boosting) have been proposed to fuse the visual codebook representations of several different features of a scene image. These global fusion methods first generate a single-feature visual codebook representation for each individual feature and then use multiple kernel learning or linear boosting to train the fusion and classification parameters for scene recognition. However, global fusion cannot correct errors in the single-feature codebook representations, and those errors propagate into the global feature fusion.
Summary of the invention
In view of this, the technical problem to be solved by the present invention is to provide a scene recognition method and apparatus based on multi-feature visual codebook fusion, which performs multi-feature fusion on local regions of the scene image, thereby correcting errors in the single-feature visual codebook representations, obtaining more accurate codebook representations for the subsequent global fusion, and improving the accuracy of scene recognition.
The technical solution adopted by the present invention to solve the above technical problem is as follows:
According to one aspect of the present invention, a scene recognition method based on multi-feature visual codebook fusion is provided, comprising:
performing multi-feature fusion on local regions of a scene image by means of a local classifier to obtain the multi-feature visual codebook representation of the local regions; and
performing global fusion and classification on the multi-feature visual codebook representation according to global fusion parameters and classification parameters obtained by prior training.
Preferably, performing multi-feature fusion on local image regions by means of a local classifier to obtain their multi-feature visual codebook representation specifically comprises:
uniformly extracting mutually overlapping local image patches at multiple scales from the scene image;
extracting several kinds of features from each local patch; and
performing feature fusion on the local patches of the scene image using the multi-feature visual codebook obtained by prior training, and generating the visual codebook representations of the scene image under the various features.
Preferably, performing feature fusion on the local patches of the scene image using the multi-feature visual codebook obtained by prior training, and generating the visual codebook representations of the scene image under the various features, specifically comprises:
for each kind of feature of a local region, first selecting candidate visual words with a simple classifier and then computing, with a complex classifier, the probability that the local region's feature belongs to each candidate visual word; and
generating the multi-feature visual codebook representation after local feature fusion from the probabilities of each kind of feature.
Preferably, performing global fusion and classification on the multi-feature visual codebook representation according to the global fusion parameters and classification parameters obtained by prior training comprises:
computing the posterior probability that the scene image belongs to each scene category, and selecting the category with the largest posterior probability as the classification result; or
determining the classification result from which side of a decision boundary the scene image falls on.
Preferably, the several kinds of features comprise: histogram of oriented gradients (HOG) features, structured local binary pattern (LBP) features, color features, or structured color features.
Preferably, the method further comprises, beforehand, a step of training on sample images to obtain the multi-feature visual codebook for the local classifier, which specifically comprises:
generating a training data set from sample images whose categories have been manually labeled;
uniformly extracting mutually overlapping local sample patches at multiple scales from the sample images in the training data set;
extracting several kinds of features from each local sample patch;
clustering the local sample features of each kind belonging to each scene category separately, generating a series of visual words; and
grouping the visual words of the different features into separate sets, generating the visual codebook corresponding to each feature.
Preferably, after the visual words of the different features are grouped into the corresponding codebooks, the method further comprises a step of obtaining the global fusion parameters and classification parameters, specifically:
performing feature fusion on the local regions of the sample images in the training set, generating the representations of the sample images over the different feature codebooks; and
training the global multi-feature fusion, and storing the global fusion parameters and classification parameters.
Preferably, training the global multi-feature fusion and storing the global fusion parameters and classification parameters specifically comprises: concatenating the feature vectors of the multi-feature visual codebook representations and using a classifier to compute the fusion parameters and classification parameters;
or computing a kernel matrix from the feature vectors of each visual codebook, and learning the weighting parameters and classification parameters of the kernel matrices by multiple kernel learning;
or training an independent classifier on the feature vectors of each visual codebook, and learning a weighting parameter for each classifier.
According to another aspect of the present invention, a scene recognition apparatus based on multi-feature visual codebook fusion is provided, comprising:
a local fusion module, configured to perform multi-feature fusion on local regions of a scene image by means of a local classifier, obtaining the multi-feature visual codebook representation of the local regions; and
a global fusion module, configured to perform global fusion and classification on the multi-feature visual codebook representation according to global fusion parameters and classification parameters obtained by prior training.
Preferably, the local fusion module comprises:
a local patch acquisition unit, configured to uniformly extract mutually overlapping local image patches at multiple scales from the scene image;
a feature extraction unit, configured to extract several kinds of features from each local patch; and
a visual codebook representation generation unit, configured to perform feature fusion on the local patches of the scene image using the multi-feature visual codebook obtained by prior training, and to generate the visual codebook representations of the scene image under the various features.
Preferably, the visual codebook representation generation unit comprises:
a probability calculation subunit, configured to, for each kind of feature of a local region, first select candidate visual words with a simple classifier and then compute, with a complex classifier, the probability that the local region's feature belongs to each candidate visual word; and
a visual codebook representation calculation subunit, configured to generate the multi-feature visual codebook representation after local feature fusion from the probabilities of each kind of feature.
Preferably, the apparatus further comprises a training module, configured to learn from manually labeled sample images, obtaining the multi-feature visual codebook by performing local fusion on the sample images, and obtaining the global fusion parameters and classification parameters by performing global fusion on the sample images.
In the method and apparatus of the embodiments of the invention, several features are extracted from each local image region, and a classifier trained on local regions estimates the probability that each local patch belongs to the candidate visual words; the resulting per-feature visual codebook representations are then fused globally for the final decision. Compared with estimating probabilities from a single feature and generating a single-feature codebook representation (or several single-feature representations) before global feature fusion, this approach can correct errors caused by the limited information in any single feature, producing more accurate codebook representations; passing these more accurate multi-feature representations through the global fusion decision improves the accuracy of the final scene recognition.
Description of drawings
Fig. 1 is a flowchart of a scene recognition method based on multi-feature visual codebook fusion provided by an embodiment of the invention;
Fig. 2 is a flowchart of a method for local multi-feature fusion provided by a preferred embodiment of the invention;
Fig. 3 is an example of extracting local image patches at multiple scales provided by a preferred embodiment of the invention;
Fig. 4 is a flowchart of a method for training the multi-feature codebook provided by an embodiment of the invention;
Fig. 5 is a flowchart of another method for local multi-feature fusion provided by a preferred embodiment of the invention;
Fig. 6 is a block diagram of a scene recognition apparatus based on multi-feature visual codebook fusion provided by an embodiment of the invention.
Embodiment
To make the technical problem to be solved, the technical solution, and the beneficial effects of the present invention clearer, the invention is further elaborated below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described herein only serve to explain the present invention and are not intended to limit it.
As shown in Fig. 1, a scene recognition method based on multi-feature visual codebook fusion provided by an embodiment of the invention comprises:
S102: performing multi-feature fusion on local regions of a scene image by means of a local classifier, obtaining the multi-feature visual codebook representation of the local regions.
Referring to Fig. 2, this step may further comprise:
S1021: uniformly extracting mutually overlapping local image patches at multiple scales from the scene image.
Specifically, Fig. 3 shows an example of dividing an image into local patches at multiple scales. At the first scale, the whole image is treated as a single region and features are extracted from it as a whole; at the second scale, the side lengths of each patch are half those of the whole image, and adjacent patches overlap by half a patch; at the third scale, the patch side lengths are half those of the second scale; and so on. This is only one way of obtaining multi-scale patches; where computational resources allow, patches may be extracted at still finer scales to obtain better performance.
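The multi-scale patch layout described above can be sketched as follows. This is a minimal illustration only; the 64x64 image size, the three levels, and the exact stepping are assumptions for the example, not values prescribed by the patent.

```python
import numpy as np

def pyramid_patches(image, levels=3):
    """Extract overlapping patches at multiple scales: level 0 is the
    whole image; at each further level the patch side is halved and
    adjacent patches overlap by half a patch."""
    h, w = image.shape[:2]
    patches = []
    for level in range(levels):
        ph, pw = h // (2 ** level), w // (2 ** level)
        step_h, step_w = max(ph // 2, 1), max(pw // 2, 1)
        for y in range(0, h - ph + 1, step_h):
            for x in range(0, w - pw + 1, step_w):
                patches.append(image[y:y + ph, x:x + pw])
    return patches

img = np.zeros((64, 64), dtype=np.uint8)
pats = pyramid_patches(img, levels=3)   # 1 + 9 + 49 = 59 patches
```

For a 64x64 image this yields 1 whole-image patch, 9 half-size patches, and 49 quarter-size patches.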
S1022: extracting several kinds of features from each local patch.
Specifically, the features include, but are not limited to, HOG (histogram of oriented gradients) features, structured LBP (local binary pattern) features, color features, or structured color features.
The HOG extraction process mainly comprises: first dividing the local patch into several equal cells; then computing the gradient direction and magnitude at each pixel within each cell; then computing the histogram of gradient orientations of each cell; and finally concatenating the per-cell gradient histograms to obtain the HOG feature.
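A minimal sketch of this per-cell gradient-histogram procedure follows. The 2x2 cell grid and 9 orientation bins are illustrative assumptions, not the patent's prescribed parameters.

```python
import numpy as np

def hog_feature(patch, cells=(2, 2), bins=9):
    """Minimal HOG sketch: split the patch into equal cells, compute the
    gradient direction and magnitude at each pixel, build a
    magnitude-weighted orientation histogram per cell, and concatenate."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned gradient direction
    ch, cw = patch.shape[0] // cells[0], patch.shape[1] // cells[1]
    feats = []
    for i in range(cells[0]):
        for j in range(cells[1]):
            m = mag[i*ch:(i+1)*ch, j*cw:(j+1)*cw].ravel()
            a = ang[i*ch:(i+1)*ch, j*cw:(j+1)*cw].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

feat = hog_feature(np.arange(256, dtype=float).reshape(16, 16))
```

With a 2x2 grid and 9 bins the descriptor has 36 dimensions.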
The structured LBP extraction process mainly comprises: first dividing the local patch into several equal cells; then extracting the LBP feature from each cell (i.e. comparing the magnitude of each pixel with each of its neighbors to generate a binary code, then computing a histogram of the codes over the cell); and finally concatenating the per-cell LBP features to obtain the structured LBP feature.
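The compare-with-neighbors coding and per-cell histogramming can be sketched as below. The 8-neighbor coding and 2x2 cell grid are common choices assumed here for illustration.

```python
import numpy as np

def lbp_codes(patch):
    """8-neighbour LBP: compare each interior pixel with its neighbours
    and pack the comparison bits into a code in [0, 255]."""
    p = patch.astype(np.int32)
    c = p[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        n = p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
        codes |= ((n >= c).astype(np.int32) << bit)
    return codes

def structured_lbp(patch, cells=(2, 2)):
    """Structured LBP sketch: per-cell 256-bin histograms of the LBP
    codes, concatenated into one vector."""
    codes = lbp_codes(patch)
    ch, cw = codes.shape[0] // cells[0], codes.shape[1] // cells[1]
    feats = []
    for i in range(cells[0]):
        for j in range(cells[1]):
            block = codes[i*ch:(i+1)*ch, j*cw:(j+1)*cw]
            hist, _ = np.histogram(block, bins=256, range=(0, 256))
            feats.append(hist)
    return np.concatenate(feats)

rep = structured_lbp(np.arange(256).reshape(16, 16))
```

For a 16x16 patch the interior is 14x14, so the four cell histograms together count 196 codes.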
The structured color feature extraction process mainly comprises: first dividing the local patch into several equal cells; then extracting a color histogram from each cell; and finally concatenating the histograms to obtain the structured color feature.
S1023: performing feature fusion on the local patches of the scene image using the multi-feature visual codebook obtained by prior training, and generating the visual codebook representations of the scene image under the various features.
Preferably, this step further comprises: for each kind of feature of a local region, first selecting candidate visual words with a simple classifier (e.g. a Euclidean-distance, chi-square-distance, or Bhattacharyya-distance classifier), then computing with a complex classifier (such as a support vector machine) the probability that the local region's feature belongs to each candidate visual word; and generating the multi-feature visual codebook representation after local feature fusion from the probabilities of each kind of feature. The detailed steps are given in Fig. 5 and its explanation below.
S104: performing global fusion and classification on the multi-feature visual codebook representation according to the global fusion parameters and classification parameters obtained by prior training.
If a classifier based on a statistical model is used, the posterior probability that the sample to be recognized belongs to each scene category can be computed, and the category with the largest posterior probability is selected as the classification result. For a classifier based on a decision boundary, the classification result is determined by computing which side of the boundary the sample's features fall on.
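The posterior-probability decision rule amounts to a simple argmax over class probabilities, as sketched below. The class list and probability values are purely illustrative, not outputs of the patent's trained model.

```python
import numpy as np

# Hypothetical posteriors from a statistical classifier for one image
# over the example scene categories named earlier in the document.
classes = ["beach", "forest", "highway", "street", "office", "bedroom"]
posteriors = np.array([0.05, 0.10, 0.55, 0.20, 0.06, 0.04])

# Select the category with the largest posterior probability.
result = classes[int(np.argmax(posteriors))]
```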
In the method of this embodiment, while generating the multi-feature visual codebook representation, several features are extracted from each local image region and a local classifier is trained to estimate the probability that the patch belongs to each candidate visual word. Compared with estimating the probability from a single feature, this local multi-feature fusion can correct errors caused by the limited information in any single feature, and therefore generates more accurate codebook representations. After these more accurate multi-feature representations are generated, they are passed through the global feature fusion to obtain the final recognition result. Compared with using a single codebook representation, or generating single-feature codebook representations for several features and only then fusing them globally, the method of the invention achieves higher recognition accuracy.
Fig. 4 is a flowchart of a method for training the multi-feature codebook provided by an embodiment of the invention, comprising:
S402: generating a training data set from sample images whose categories have been manually labeled.
Specifically, the collected sample images include, but are not limited to, scene images taken manually or downloaded from the internet. In general, about 200 to 300 training samples are needed for each scene class; for indoor scenes with large variations in viewpoint and content, more training samples are needed. The training images are manually labeled with their categories to generate the training data set.
S404: uniformly extracting mutually overlapping local sample patches at multiple scales from the sample images in the training data set.
This step is identical to S1021 above and is not repeated here.
S406: extracting several kinds of features from each local sample patch.
S408: clustering the local features of each kind belonging to each scene category separately, generating a series of visual words.
Specifically, each cluster center is the feature representation of one visual word. The clustering method includes, but is not limited to, K-means clustering, hierarchical clustering, fuzzy K-means clustering, simulated-annealing clustering, and so on.
K-means is a commonly used clustering method: given a cluster count K, it randomly generates K cluster centers and then iteratively updates the centers and the assignment of feature vectors, dividing the feature vectors into K clusters. The other methods are not described in detail here.
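A plain K-means sketch, where the final centroids play the role of the visual words, could look as follows. The fixed iteration count and random initialization are simplifying assumptions.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain K-means: randomly pick k initial centers, then alternately
    assign each feature vector to its nearest center and recompute each
    center as the mean of its assigned vectors. The final centers are
    the visual words."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated synthetic clusters of local features.
feats = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
centers, labels = kmeans(feats, k=2)
```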
S410: grouping the visual words of the different features into separate sets, generating the visual codebook corresponding to each feature.
For instance, with N scene categories and M kinds of features, N x M visual codebooks are obtained.
S412: performing feature fusion on the local regions of the sample images in the training set, generating the representations of the sample images over the different feature codebooks.
S414: learning the global multi-feature fusion, and storing the global fusion parameters and classification parameters.
Specifically, this step can be realized in the following ways:
(1) Concatenate the codebook representations of the different features to obtain the full feature vector of each image in the training set, then train a classifier according to the labeled sample classes, obtaining the global fusion parameters and classification parameters at the same time. The concrete classifier training method is introduced in S507.
(2) Compute the kernel matrix of each feature over the training set from its visual codebook representations, then obtain the linear weighting parameters and classification parameters of the kernel matrices by multiple kernel learning. Multiple kernel learning finds the linear weighting coefficients (fusion parameters) and classification parameters of the kernel matrices by minimizing the error rate on the training samples while minimizing the structural risk (which generally means maximizing the margin between the training samples and the decision boundary).
(3) Train an independent classifier on the feature vectors of each visual codebook, then learn the weighting parameter of each classifier. These weighting parameters can be obtained by minimizing the error rate on the training samples.
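Modes (1) and (3) above reduce to simple vector operations once the per-feature representations and per-feature classifier scores exist. The sketch below uses made-up dimensions, scores, and weights purely for illustration; the weights would in practice be learned as described.

```python
import numpy as np

# Mode (1): concatenate hypothetical per-feature codebook representations
# of one image into a single global feature vector for one classifier.
hog_repr = np.array([0.2, 0.7, 0.1])     # HOG codebook representation
lbp_repr = np.array([0.9, 0.0])          # structured-LBP representation
color_repr = np.array([0.4, 0.3, 0.3])   # color codebook representation
global_vec = np.concatenate([hog_repr, lbp_repr, color_repr])

# Mode (3): weighted combination of three per-feature classifiers' scores
# over 4 scene classes (rows: classifiers, columns: classes).
scores = np.array([[0.1, 0.6, 0.2, 0.1],
                   [0.3, 0.4, 0.2, 0.1],
                   [0.2, 0.5, 0.2, 0.1]])
weights = np.array([0.5, 0.2, 0.3])   # learned by minimising training error
fused = weights @ scores
pred = int(np.argmax(fused))          # index of the winning scene class
```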
A typical application of this embodiment is intelligent vehicle navigation. Following the training method above, scene images of different locations (e.g. street 1, street 2, highway 1, highway 2) are collected with a vehicle-mounted camera and manually labeled for training. During navigation, the vehicle-mounted camera continuously collects pictures of the places the vehicle passes, and the recognition method described below can then determine where the vehicle is currently traveling.
Fig. 5 is a flowchart of another method for local multi-feature fusion provided by a preferred embodiment of the invention, comprising:
S501: extracting one kind of feature from each local region of the image.
S502: selecting several candidate visual words from the visual codebook matching this feature type by means of a simple classifier (for example: Euclidean distance, chi-square distance, Bhattacharyya distance, etc.); executing step S503 if other features still need to be extracted, otherwise executing step S505.
Specifically, compute the distance (Euclidean, chi-square, or Bhattacharyya) between the local feature and each visual word in the codebook matching this feature type, find the minimum distance and its corresponding word, then compute the ratio of each other word's distance to the minimum distance, and select as the candidate visual word set the minimum-distance word together with every word whose ratio is below a certain threshold. Here the Euclidean distance is the square root of the sum of squared differences between the elements of two vectors.
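The distance-ratio candidate selection just described can be sketched as follows, using Euclidean distance; the codebook values and the ratio threshold are assumptions for the example.

```python
import numpy as np

def candidate_words(feature, codebook, ratio_threshold):
    """Select candidate visual words: find the nearest word by Euclidean
    distance, then keep every word whose distance is within the given
    ratio of that minimum (the nearest word is always kept)."""
    dists = np.linalg.norm(codebook - feature, axis=1)
    dmin = dists.min()
    return np.flatnonzero(dists <= ratio_threshold * max(dmin, 1e-12))

# Toy 2-D codebook with three visual words; word 2 is far from the query.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
candidates = candidate_words(np.array([0.2, 0.0]), codebook, ratio_threshold=5.0)
```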
S503: extracting the other types of features from the local image region.
S504: concatenating the feature vectors of the local image region; if the value ranges of the elements in the different feature vectors are not consistent, normalizing them and then executing step S508, otherwise executing step S508 directly.
Here normalization means transforming the value range of each element of a feature vector to between 0 and 1.
S505: obtaining the several feature vectors of the local regions that formed the candidate visual words during clustering.
Specifically, for each candidate visual word, find the local image regions that formed that word during the clustering process, then extract the various features from those regions. The feature extraction methods were described above.
S506: concatenating the several feature vectors of each local region; if the value ranges of the elements in the different feature vectors are not consistent, normalizing them.
Each candidate visual word thus yields a set of feature vectors.
S507: training the local classifier used for local fusion.
Specifically, a separate classifier is trained for each local region to solve the feature-fusion problem of that region. The feature vectors of the local regions corresponding to the different candidate visual words serve as the training samples of the different classes (this classifier is called the local classifier).
The classifier completes the classification task by learning a statistical model of the feature vectors of the different classes, or by learning the decision boundaries between them. A support vector machine may be chosen (but is not required), which learns the decision boundary between the classes of feature vectors (or of their linear or nonlinear mappings); this boundary minimizes the structural risk while minimizing the training error.
S508: estimating with the local classifier the probability that the local region belongs to each candidate word.
Specifically, using the feature vector obtained in S504 and the local classifier obtained in S507, estimate the probability that the local image region belongs to each candidate visual word.
For a classifier based on a statistical model, the probabilities of the different candidate visual words can be obtained by computing the posterior probabilities. For a classifier based on support vector machines, the probabilities can be estimated from the distance between the feature vector and the decision boundary.
S509: generating the feature's visual codebook representation from the probabilities of the candidate words.
Specifically, the visual codebook representation of a feature is generated from the probability that each local image region belongs to each candidate visual word. The codebook representation is a feature vector in which each element records the occurrence probability of one visual word.
First, according to the number of visual words N_w of this feature, generate an N_w-dimensional feature vector and set each element to zero. Then, following S508, compute for each local region of the scene image the probabilities of its candidate visual words; if such a probability is greater than the feature-vector element corresponding to that candidate word, update the value of that element.
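The zero-initialize-then-update rule is effectively a max-pooling of probabilities over local regions, as sketched below. The per-region dict layout and the probability values are assumed for illustration.

```python
import numpy as np

def codebook_representation(patch_candidate_probs, n_words):
    """Build the N_w-dimensional codebook representation: start from a
    zero vector and, for every local region, raise each candidate word's
    element to the region's probability whenever it exceeds the stored
    value (max-pooling over regions)."""
    rep = np.zeros(n_words)
    for candidates in patch_candidate_probs:   # one dict per local region
        for word, prob in candidates.items():
            if prob > rep[word]:
                rep[word] = prob
    return rep

# Hypothetical candidate-word probabilities for two local regions
# over a 5-word codebook.
rep = codebook_representation(
    [{0: 0.7, 2: 0.2}, {2: 0.6, 4: 0.1}], n_words=5)
```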
Repeating S501 to S509 for the different kinds of features generates the visual codebook representations of the different features after local feature fusion.
The scene Recognition apparatus module structural drawing that a kind of many features vision code book that being illustrated in figure 6 as the embodiment of the invention provides merges, this device comprises: training module 10, local Fusion Module 20 and overall Fusion Module 30, wherein:
Training module 10 is used for study and manually carries out the sample image that classification is demarcated, and obtains many features vision code book by sample image being carried out the part fusion, obtains overall fusion parameters and sorting parameter by sample image being carried out overall situation fusion.
Specifically, training module 10 is used for training module and is used for the sample image generating training data collection that basis is manually carried out the classification demarcation; The sample image of concentrating from training data obtains the fractional sample image that overlaps each other under the multiple yardstick uniformly; From each fractional sample image, extract various features; The various fractional sample characteristics of image that belong to different scene classifications are carried out respectively cluster, generate a series of vision words; Different set put in the vision word of different characteristic generate the corresponding vision code book of each feature, training module also is used for: the regional area to the training set sample image carries out Fusion Features, generates the expression of sample image on different characteristic vision code book; Many Fusion Features of the training overall situation, and store overall fusion parameters and sorting parameter.
Local Fusion Module 20 is used for by local classifiers the scene image regional area being carried out many Fusion Features, and the many features vision code book that obtains the scene image regional area is expressed;
Further, local Fusion Module 20 comprises:
Topography's acquiring unit 201 is used for obtaining uniformly topography overlapped under the multiple yardstick from scene image;
Feature extraction unit 202 is used for extracting various features from each topography;
The vision code book is expressed generation unit 203, is used for many features vision code book of obtaining by training in advance the topography of scene image is carried out Fusion Features, and the vision code book of generating scene image under various different characteristics expressed.
Further, vision code book expression generation unit 203 comprises:
Probability calculation subelement 2031 is used for every kind of feature of localized region, uses first the simple classification device to choose candidate's vision word, then uses complex classifier to calculate the probability that local features belongs to candidate's vision word;
The vision code book is expressed computation subunit 2032, is used for expressing according to the many features vision code book behind the probability generation Local Feature Fusion of every kind of feature.
Overall situation Fusion Module 30 is used for the overall fusion parameters that obtains according to training in advance and sorting parameter and many features vision code book is expressed carries out the overall situation and merge and classify.
It should be noted that the technical solution of the foregoing scene recognition method with multi-feature visual codebook fusion can be implemented by the device of this embodiment, and is not repeated here.
In the method and device of the embodiments of the invention, multiple features are extracted from local regions of an image, and classifiers trained on the local regions estimate the probabilities that a local image belongs to candidate visual words; a plurality of feature visual codebook expressions are generated and then fused globally for the final decision. Compared with estimating probabilities from a single feature and generating a single-feature visual codebook expression, or generating separate single-feature visual codebook expressions for multiple features and only then performing global feature fusion, this approach corrects errors caused by insufficient information in any one feature, thereby generating more accurate visual codebook expressions; passing these more accurate multi-feature visual codebook expressions through the global feature fusion decision improves the accuracy of the final scene recognition.
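As an illustration of the two-stage local soft assignment summarized above, the sketch below shortlists candidate visual words with a cheap nearest-centroid search (standing in for the simple classifier) and weights them with a softmax over distances (standing in for the complex classifier). Both stand-ins and the `n_candidates` parameter are assumptions for illustration, not details from the patent:

```python
# Minimal sketch of two-stage soft assignment of local descriptors to
# visual words, accumulated into a normalized codebook expression.
# The nearest-centroid shortlist and softmax weighting are assumed
# stand-ins for the patent's "simple" and "complex" classifiers.
import numpy as np

def codebook_expression(descriptors, codebook, n_candidates=3):
    """Soft-assign each local descriptor to its candidate visual words
    and accumulate the probabilities into a normalized histogram."""
    hist = np.zeros(len(codebook))
    for d in descriptors:
        dist = np.linalg.norm(codebook - d, axis=1)
        cand = np.argsort(dist)[:n_candidates]   # stage 1: shortlist candidates
        scores = np.exp(-dist[cand])             # stage 2: probability per candidate
        hist[cand] += scores / scores.sum()
    return hist / max(hist.sum(), 1e-12)
```

One such expression is produced per feature type; the global fusion module then combines the per-feature expressions.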
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, but they do not limit the scope of protection of the present invention. Those skilled in the art may implement the present invention through various variants without departing from the scope and spirit of the present invention; for example, a feature of one embodiment may be used in another embodiment to obtain a further embodiment. Any modification, equivalent replacement, or improvement made within the technical concept of the present invention shall fall within the scope of protection of the present invention.
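For concreteness, the global fusion decision can be sketched as a weighted combination of per-feature classifier posteriors followed by a maximum-posterior decision (the classifier-ensemble alternative of the global fusion combined with the posterior-probability rule). The weights and toy posteriors below are illustrative assumptions:

```python
# Hedged sketch of the global fusion/decision stage: per-feature codebook
# expressions are scored by independent per-feature classifiers, and the
# resulting posteriors are combined with learned weights. The weights and
# toy posterior values are illustrative assumptions.
import numpy as np

def fuse_and_classify(posteriors_per_feature, weights):
    """posteriors_per_feature: list of (n_classes,) arrays, one per feature.
    weights: learned reliability weight of each feature's classifier."""
    fused = sum(w * p for w, p in zip(weights, posteriors_per_feature))
    fused = fused / fused.sum()            # renormalize into a posterior
    return int(np.argmax(fused)), fused    # maximum-posterior decision

# e.g. a shape-based classifier favours class 1, a color-based classifier
# favours class 0, but the shape classifier carries more learned weight:
label, fused = fuse_and_classify(
    [np.array([0.2, 0.8]), np.array([0.6, 0.4])], weights=[0.7, 0.3])
```

The same interface could host the other two alternatives (feature-vector concatenation before a single classifier, or multiple kernel learning over per-codebook kernel matrices) by changing how the per-feature scores are produced.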

Claims (12)

1. A scene recognition method with multi-feature visual codebook fusion, characterized in that the method comprises:
performing multi-feature fusion on local regions of a scene image through local classifiers, to obtain a multi-feature visual codebook expression of the local regions of the scene image;
performing global fusion and classification on the multi-feature visual codebook expression according to global fusion parameters and classification parameters obtained through pre-training.
2. The scene recognition method according to claim 1, characterized in that performing multi-feature fusion on the local regions of the image through the local classifiers, to obtain the multi-feature visual codebook expression of the local regions of the image, comprises:
uniformly sampling mutually overlapping local images at multiple scales from the scene image;
extracting multiple kinds of features from each local image;
performing feature fusion on the local images of the scene image by using multi-feature visual codebooks obtained through pre-training, to generate the expressions of the scene image under the various features.
3. The scene recognition method according to claim 2, characterized in that performing feature fusion on the local images of the scene image by using the multi-feature visual codebooks obtained through pre-training, to generate the expressions of the scene image under the various features, comprises:
for each kind of feature in the local region, first selecting candidate visual words with a simple classifier, and then calculating, with a complex classifier, the probability that the local-region feature belongs to each candidate visual word;
generating the fused multi-feature visual codebook expression according to the probabilities of each kind of feature.
4. The scene recognition method according to claim 1, characterized in that performing global fusion and classification on the multi-feature visual codebook expression according to the global fusion parameters and classification parameters obtained through pre-training comprises:
calculating the posterior probabilities that the scene image belongs to different scene categories, and selecting the scene category with the maximum posterior probability as the classification result; or
determining on which side of a decision boundary the scene image lies as the classification result.
5. The scene recognition method according to claim 2, characterized in that the multiple kinds of features comprise: histogram-of-oriented-gradients features, structured local binary pattern features, color features, or structured color features.
6. The scene recognition method according to any one of claims 1-5, characterized in that the method further comprises, beforehand, a step of training on sample images to obtain the multi-feature visual codebooks of the local classifiers, specifically comprising:
generating a training data set from sample images that have been manually labeled with scene categories;
uniformly sampling mutually overlapping partial sample images at multiple scales from the sample images in the training data set;
extracting multiple kinds of features from each partial sample image;
clustering, for each kind of feature, the partial-sample-image features belonging to different scene categories respectively, to generate a series of visual words;
grouping the visual words of different features into different sets to generate a visual codebook for each feature.
7. The scene recognition method according to claim 6, characterized in that, after grouping the visual words of different features into different sets to generate the visual codebook for each feature, the method further comprises a step of obtaining the global fusion parameters and classification parameters, specifically:
performing feature fusion on local regions of the sample images in the training set, to generate the expressions of the sample images on the visual codebooks of the different features;
training the global multi-feature fusion, and storing the global fusion parameters and classification parameters.
8. The scene recognition method according to claim 7, characterized in that training the global multi-feature fusion and storing the global fusion parameters and classification parameters comprises:
concatenating the feature vectors of the multi-feature visual codebooks, and calculating the fusion parameters and classification parameters with a classifier;
or calculating a kernel matrix for the feature vector of each visual codebook respectively, and calculating the weighting parameters of each kernel matrix and the classification parameters through multiple kernel learning;
or training an independent classifier on the feature vector of each visual codebook respectively, and learning the weighting parameter of each classifier.
9. A scene recognition device with multi-feature visual codebook fusion, characterized in that the device comprises:
a local fusion module, used for performing multi-feature fusion on local regions of a scene image through local classifiers, to obtain a multi-feature visual codebook expression of the local regions of the scene image;
a global fusion module, used for performing global fusion and classification on the multi-feature visual codebook expression according to global fusion parameters and classification parameters obtained through pre-training.
10. The scene recognition device according to claim 9, characterized in that the local fusion module comprises:
a local image acquiring unit, used for uniformly sampling mutually overlapping local images at multiple scales from the scene image;
a feature extraction unit, used for extracting multiple kinds of features from each local image;
a visual codebook expression generating unit, used for performing feature fusion on the local images of the scene image by using multi-feature visual codebooks obtained through pre-training, to generate the expressions of the scene image under the various features.
11. The scene recognition device according to claim 10, characterized in that the visual codebook expression generating unit comprises:
a probability calculating subunit, used for, for each kind of feature in the local region, first selecting candidate visual words with a simple classifier, and then calculating, with a complex classifier, the probability that the local-region feature belongs to each candidate visual word;
a visual codebook expression calculating subunit, used for generating the fused multi-feature visual codebook expression according to the probabilities of each kind of feature.
12. The scene recognition device according to any one of claims 9-11, characterized in that the device further comprises a training module, the training module being used for learning from sample images that have been manually labeled with scene categories, obtaining the multi-feature visual codebooks by performing local fusion on the sample images, and obtaining the global fusion parameters and classification parameters by performing global fusion on the sample images.
CN2013102689531A 2013-06-28 2013-06-28 Method and device for identifying scene integrated by multi-feature vision codebook Pending CN103366181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013102689531A CN103366181A (en) 2013-06-28 2013-06-28 Method and device for identifying scene integrated by multi-feature vision codebook


Publications (1)

Publication Number Publication Date
CN103366181A true CN103366181A (en) 2013-10-23

Family

ID=49367481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013102689531A Pending CN103366181A (en) 2013-06-28 2013-06-28 Method and device for identifying scene integrated by multi-feature vision codebook

Country Status (1)

Country Link
CN (1) CN103366181A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814147A (en) * 2010-04-12 2010-08-25 中国科学院自动化研究所 Method for realizing classification of scene images
CN102567722A (en) * 2012-01-17 2012-07-11 大连民族学院 Early-stage smoke detection method based on codebook model and multiple features
CN102609722A (en) * 2012-02-07 2012-07-25 西安理工大学 Method for fusing local shape feature structure and global shape feature structure of video image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI Fengcai: "Research on Scene Image Classification Based on the Codebook Model", China Master's Theses Full-text Database, Information Science and Technology *
SHU Chang et al.: "Face Recognition Method with Local and Global Fusion of Multiple Features", Computer Engineering *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361313A (en) * 2014-10-16 2015-02-18 辽宁石油化工大学 Gesture recognition method based on multi-kernel learning heterogeneous feature fusion
CN104361313B (en) * 2014-10-16 2017-10-31 辽宁石油化工大学 A kind of gesture identification method merged based on Multiple Kernel Learning heterogeneous characteristic
CN104318271B (en) * 2014-11-21 2017-04-26 南京大学 Image classification method based on adaptability coding and geometrical smooth convergence
CN104318271A (en) * 2014-11-21 2015-01-28 南京大学 Image classification method based on adaptability coding and geometrical smooth convergence
CN104992180B (en) * 2015-06-26 2019-01-29 武汉大学 A kind of multiple features fusion automobile logo identification method and system towards traffic block port
CN104992180A (en) * 2015-06-26 2015-10-21 武汉大学 Multi-feature fusion car logo recognition method and system for traffic tollgates
CN105426924A (en) * 2015-12-14 2016-03-23 北京工业大学 Scene classification method based on middle level features of images
CN105426924B (en) * 2015-12-14 2018-12-07 北京工业大学 A kind of scene classification method based on image middle level features
CN108604303B (en) * 2016-02-09 2022-09-30 赫尔实验室有限公司 System, method, and computer-readable medium for scene classification
CN108604303A (en) * 2016-02-09 2018-09-28 赫尔实验室有限公司 General image feature from bottom to top and the from top to bottom system and method for entity classification are merged for precise image/video scene classification
CN106156798A (en) * 2016-07-25 2016-11-23 河海大学 Scene image classification method based on annular space pyramid and Multiple Kernel Learning
CN106156798B (en) * 2016-07-25 2019-10-25 河海大学 Scene image classification method based on annular space pyramid and Multiple Kernel Learning
CN106612457A (en) * 2016-11-09 2017-05-03 广州视源电子科技股份有限公司 Method and system for video sequence alignment
CN106612457B (en) * 2016-11-09 2019-09-03 广州视源电子科技股份有限公司 Video sequence alignment schemes and system
CN106599907B (en) * 2016-11-29 2019-11-29 北京航空航天大学 The dynamic scene classification method and device of multiple features fusion
CN106599907A (en) * 2016-11-29 2017-04-26 北京航空航天大学 Multi-feature fusion-based dynamic scene classification method and apparatus
CN107967457A (en) * 2017-11-27 2018-04-27 全球能源互联网研究院有限公司 A kind of place identification for adapting to visual signature change and relative positioning method and system
CN107967457B (en) * 2017-11-27 2024-03-19 全球能源互联网研究院有限公司 Site identification and relative positioning method and system adapting to visual characteristic change
CN112966646B (en) * 2018-05-10 2024-01-09 北京影谱科技股份有限公司 Video segmentation method, device, equipment and medium based on two-way model fusion
CN112966646A (en) * 2018-05-10 2021-06-15 北京影谱科技股份有限公司 Video segmentation method, device, equipment and medium based on two-way model fusion
CN112770875B (en) * 2018-10-10 2022-03-11 美的集团股份有限公司 Method and system for providing remote robot control
CN111553374B (en) * 2019-02-12 2022-07-26 腾讯大地通途(北京)科技有限公司 Road scene dividing method and device, electronic equipment and storage medium
CN111553374A (en) * 2019-02-12 2020-08-18 腾讯大地通途(北京)科技有限公司 Road scene dividing method and device, electronic equipment and storage medium
CN111325290B (en) * 2020-03-20 2023-06-06 西安邮电大学 Traditional Chinese painting image classification method based on multi-view fusion multi-example learning
CN111325290A (en) * 2020-03-20 2020-06-23 西安邮电大学 Chinese painting image classification method based on multi-view fusion and multi-example learning
CN112699855A (en) * 2021-03-23 2021-04-23 腾讯科技(深圳)有限公司 Image scene recognition method and device based on artificial intelligence and electronic equipment
CN114726690A (en) * 2022-04-18 2022-07-08 清华大学 Codebook generation method and device, electronic equipment and storage medium
CN114726690B (en) * 2022-04-18 2024-03-29 清华大学 Codebook generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN103366181A (en) Method and device for identifying scene integrated by multi-feature vision codebook
Li et al. Line-cnn: End-to-end traffic line detection with line proposal unit
Chen et al. Object-level motion detection from moving cameras
EP3620980B1 (en) Learning method, learning device for detecting lane by using cnn and testing method, testing device using the same
CN110674874B (en) Fine-grained image identification method based on target fine component detection
US20160070976A1 (en) Image processing apparatus, image processing method, and recording medium
CN104504366A (en) System and method for smiling face recognition based on optical flow features
CN104915949A (en) Image matching algorithm of bonding point characteristic and line characteristic
CN104200228B (en) Recognizing method and system for safety belt
CN105354565A (en) Full convolution network based facial feature positioning and distinguishing method and system
JP2016062610A (en) Feature model creation method and feature model creation device
Lee et al. Place recognition using straight lines for vision-based SLAM
US10275667B1 (en) Learning method, learning device for detecting lane through lane model and testing method, testing device using the same
CN103136504A (en) Face recognition method and device
CN104615986A (en) Method for utilizing multiple detectors to conduct pedestrian detection on video images of scene change
CN104778474A (en) Classifier construction method for target detection and target detection method
CN108921850B (en) Image local feature extraction method based on image segmentation technology
CN112200186B (en) Vehicle logo identification method based on improved YOLO_V3 model
CN104281572A (en) Target matching method and system based on mutual information
CN109063790B (en) Object recognition model optimization method and device and electronic equipment
CN104268552A (en) Fine category classification method based on component polygons
Thubsaeng et al. Vehicle logo detection using convolutional neural network and pyramid of histogram of oriented gradients
CN104318590A (en) Video target tracking method
Vashisth et al. Histogram of Oriented Gradients based reduced feature for traffic sign recognition
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20131023