CN105404886B - Characteristic model generation method and characteristic model generating means - Google Patents
Characteristic model generation method and characteristic model generating means
- Publication number
- CN105404886B (Application CN201410471391.5A)
- Authority
- CN
- China
- Prior art keywords
- vision
- entry
- vision entry
- matching
- characteristic point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed are a characteristic model generation method and apparatus. The method comprises: obtaining feature points in a target image together with their location information and description information; for each feature point, searching a visual dictionary model, based on the description information, for vision entries matching the feature point, wherein the visual dictionary model includes first-class and second-class vision entries, and among the first-class vision entries a vision entry has a spatial association with other vision entries; for each vision entry matching a feature point, determining the mapping-target vision entries according to the class of the matching entry, and calculating, at least from the description information of the feature point and of the matching vision entry, the feature weight that the feature point maps onto each mapping-target vision entry; and generating, from the feature weights mapped onto the vision entries of the visual dictionary model, a characteristic model of the target image that carries spatial information. The method according to the present invention can therefore generate a characteristic model of the target image with spatial information.
Description
Technical field
The present invention relates to the field of digital image processing, and more particularly to a characteristic model generation method and a characteristic model generating means.
Background technique
The visual dictionary model (also called the bag-of-visual-features model, BoF) is currently one of the best-performing methods in the field of object classification and object recognition. The model can express the features of a target well, and thus strives for a higher recognition rate. Because the visual dictionary model is built on the features of feature points, it is invariant to position, illumination, rotation and affine transformation, and it is also fairly robust to partial occlusion and offset.
However, the traditional visual dictionary model generates a histogram feature directly from all feature points of the target, without taking the spatial information of those feature points into account, and thus cannot achieve a good recognition rate.
To this end, an improved spatial visual-dictionary matching method has been proposed, which uses spatial pyramid matching (SPM) as a supplement that takes spatial information into account. Spatial pyramid matching is a simple way of adding spatial information to the original visual dictionary model. Combined with the visual dictionary model, the matching algorithm obtains one feature vector for each sub-region of the spatial pyramid built over the target, rather than a single feature vector for the target as a whole. Each feature vector is the visual-dictionary-based feature information of one sub-region of the target. Once the feature vectors of all sub-regions have been obtained, they can be combined into a feature vector of larger dimension, which implicitly carries rough spatial information. Spatial pyramid matching can therefore achieve a better recognition rate. However, because the algorithm repeats a large amount of computation when matching the feature points of every sub-region against the visual dictionary, it consumes considerable processing resources; and since it accounts only for rigid variation, it is both too rigid and too time-consuming. Moreover, since each feature point can only influence the sub-region it belongs to, spatial pyramid matching can hardly express the correlation between the sub-regions.
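As an illustration of the matching scheme described above, the following sketch builds the concatenated per-sub-region histograms of a spatial pyramid; the pyramid depth, vocabulary size and cell layout are illustrative assumptions, not taken from the cited method:

```python
from collections import Counter

def spm_feature(points, num_words, levels=2):
    """Concatenate per-cell visual-word histograms over a spatial pyramid.

    points: list of (x, y, word_id) with x, y normalized to [0, 1).
    Each pyramid level splits the image into 2^level x 2^level cells,
    so every point is re-binned once per level -- the source of the
    repeated computation criticized above.
    """
    feature = []
    for level in range(levels + 1):
        grid = 2 ** level
        for cy in range(grid):
            for cx in range(grid):
                hist = Counter()
                for x, y, w in points:
                    if int(x * grid) == cx and int(y * grid) == cy:
                        hist[w] += 1
                feature.extend(hist.get(w, 0) for w in range(num_words))
    return feature
```

The dimension grows as num_words multiplied by the sum of 4^level over all levels, which illustrates why the scheme trades processing resources for only rough spatial information.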
Another improved spatial visual-dictionary matching method also considers spatial information during the visual dictionary matching process. It divides the sample into different sub-blocks and captures the spatial influence between the sub-blocks with a distance template. However, the spatial relationship considered by this method is still rigid, and the internal structure of the target object is not taken into account. That is, the vision entries (token categories) in the visual dictionary remain independent of one another, and no correlation between them is considered.
In summary, existing visual dictionary models cannot express the spatial information of a target well, and are therefore subject to many limitations in video-related applications.
Summary of the invention
A so-called bag of words is simply a package, or encapsulation, containing a set of data. A visual bag of words usually contains the essential feature elements of several images, such as their shape, structure, color and texture features. Since a visual bag of words holds certain features of one or more classes of images, extracting its elements makes it possible both to describe images of similar classes and to classify images of different classes. A visual bag of words applied to an image may also be called a visual dictionary; it contains a series of vision entries, so that each of the various features of the image can be represented by a vision entry in the dictionary.
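For illustration, a minimal sketch of the classic bag-of-visual-words representation described above, assuming each feature descriptor has already been quantized to the index of its nearest vision entry:

```python
from collections import Counter

def bow_histogram(descriptor_words, vocab_size):
    """Classic bag of visual words: count, per image, how often each
    vision entry (visual word) is hit, ignoring where the hits occur."""
    counts = Counter(descriptor_words)
    return [counts.get(w, 0) for w in range(vocab_size)]
```

This position-blind counting is exactly what the invention sets out to improve by adding spatial information.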
The object of the present invention is to provide a method capable of generating a characteristic model of a target image that carries spatial information. To this end, when building the characteristic model, the present invention considers not only the visual dictionary model but also the spatial relationships between the points on the image, so as to construct a more accurate classification model and thereby classify images more accurately.
According to an aspect of the present invention, there is provided a characteristic model generation method, the method comprising: obtaining at least one feature point in a target image and obtaining the location information and description information of each feature point; for each feature point, searching the visual dictionary model, based on the description information of the feature point, for at least one vision entry matching the feature point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has a spatial association with at least one other vision entry; for each vision entry matching the feature point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least from the description information of the feature point and the description information of the matching vision entry, the feature weight that the feature point maps onto each mapping-target vision entry; and generating a characteristic model of the target image with spatial information from the feature weights mapped onto the vision entries of the visual dictionary model.
In addition, according to another aspect of the present invention, there is provided a characteristic model generating means, the means comprising: a feature extraction unit for obtaining at least one feature point in a target image and obtaining the location information and description information of each feature point; and a visual dictionary matching unit for, for each feature point, searching the visual dictionary model, based on the description information of the feature point, for at least one vision entry matching the feature point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has a spatial association with at least one other vision entry; for each vision entry matching the feature point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least from the description information of the feature point and the description information of the matching vision entry, the feature weight that the feature point maps onto each mapping-target vision entry; and generating a characteristic model of the target image with spatial information from the feature weights mapped onto the vision entries of the visual dictionary model.
Compared with the prior art, the characteristic model generation method and means according to embodiments of the present invention can, from the very first step of building the visual dictionary model, use the inherent structural relationships between the various parts (i.e. feature points) of a target object to establish the spatial associations inside that object, and can perform the feature-weight mapping between each feature point in the target image and each vision entry in the visual dictionary model on the basis of those spatial associations, thereby generating a vision-entry-weight-based characteristic model of the target image that carries spatial information.
Further features and advantages of the present invention will be set forth in the following description, will in part become apparent from the description, or may be learned by practice of the invention. The objects and other advantages of the invention can be realized and attained by the structure particularly pointed out in the description, the claims and the accompanying drawings.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and constitute part of the specification; together with the embodiments of the invention they serve to explain the invention, and are not to be construed as limiting it. In the drawings:
Fig. 1 is an overview flow chart illustrating the characteristic model generation method according to an embodiment of the present invention.
Fig. 2 is an overview flow chart illustrating a specific example of the characteristic model generation method according to an embodiment of the present invention.
Fig. 3 is a conceptual data flow diagram of the specific example of the characteristic model generation method according to an embodiment of the present invention.
Fig. 4 is a detailed flow chart illustrating the feature extraction and description step of the specific example according to an embodiment of the present invention.
Fig. 5A and Fig. 5B are schematic diagrams illustrating the SIFT feature description of the specific example according to an embodiment of the present invention.
Fig. 6 is a schematic diagram illustrating a pedestrian with labeled body parts and the sampling points of the specific example according to an embodiment of the present invention.
Fig. 7 is a detailed flow chart illustrating the visual dictionary model generation step of the specific example according to an embodiment of the present invention.
Fig. 8A to Fig. 8D are schematic diagrams illustrating the k-nearest-neighbor algorithm of the specific example according to an embodiment of the present invention.
Fig. 9A to Fig. 9D are schematic diagrams illustrating the significant-category generation sub-step of the specific example according to an embodiment of the present invention.
Fig. 10A and Fig. 10B are schematic diagrams illustrating the significant-category association establishment sub-step of the specific example according to an embodiment of the present invention.
Fig. 11 is a detailed flow chart illustrating the visual dictionary matching step of the specific example according to an embodiment of the present invention.
Fig. 12A and Fig. 12B are schematic diagrams illustrating the matching vision entry lookup sub-step of the specific example according to an embodiment of the present invention.
Fig. 13A and Fig. 13B are schematic diagrams illustrating the feature weight mapping sub-step of the specific example according to an embodiment of the present invention.
Fig. 14A and Fig. 14B are schematic diagrams illustrating the characteristic model generation sub-step of the specific example according to an embodiment of the present invention.
Fig. 15 is a functional configuration block diagram illustrating the characteristic model generating means according to an embodiment of the present invention.
Fig. 16 is a functional structure diagram illustrating a vehicle control system according to an embodiment of the present invention.
Fig. 17 illustrates the internal structure of an object detection subsystem that performs object detection on in-vehicle camera images with the improved visual dictionary model.
Specific embodiment
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that, in the drawings, constituent parts having substantially the same or similar structures and functions are given the same reference numerals, and repeated description of them will be omitted.
In order that those skilled in the art may better understand the present invention, it will be described in further detail in the following order.
1. Overview of the inventive concept
2. Characteristic model generation method
2.1. Specific example
3. Characteristic model generating means
4. Vehicle control system
5. Object detection subsystem
1. Overview of the inventive concept
In studying the technical problems of the prior art, the present inventors recognized that a target object is usually an organic whole: it frequently contains different structural parts, and specific inherent structural relationships often exist between those parts. Following this line of thought, the visual dictionary model can be improved so that it fully considers the internal structural relationships of the target object, thereby generating a characteristic model of the target image with spatial information.
2. Characteristic model generation method
Hereinafter, an example of the overall flow of the characteristic model generation method according to an embodiment of the present invention will be described with reference to Fig. 1.
Fig. 1 is the overview flow chart for illustrating characteristic model generation method according to an embodiment of the present invention.
As shown in Fig. 1, the characteristic model generation method may include:
In step S110, at least one feature point is obtained in the target image, and the location information and description information of each feature point are obtained.
Depending on the application scenario of the characteristic model generation method, the target image may be at least one sample image of known class, or at least one image to be detected of unknown class, etc.
For example, the location information of each feature point may be the position coordinates of the feature point in the sample image, and the description information of the feature point may be its feature descriptor, also called the feature description vector or simply the feature description.
In one embodiment, when the target image is a sample image, in addition to the above location information and description information, the nearest component information of each feature point may further be obtained, for marking the inherent structural position of the feature point in the target object. For example, the nearest component information may describe the internal structural part of the target object that is closest to the feature point.
To this end, after obtaining the at least one feature point, the method may further include: obtaining the location information of at least one pre-labeled structural part of the target object in the sample image; and determining the nearest component information of each feature point from the location information of the feature point and the location information of each structural part, the nearest component information indicating the structural part of the target object closest to the feature point.
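The determination of the nearest component information can be sketched as follows; the component names and the use of plain Euclidean distance are illustrative assumptions:

```python
import math

def nearest_component(point_xy, components):
    """Return the name of the pre-labeled structural part (e.g. 'head',
    'torso') closest to a feature point; this label becomes the point's
    nearest component information."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return min(components, key=lambda name: dist(point_xy, components[name]))
```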
In one embodiment, after the location information, description information and nearest component information of each feature point in the sample image have been obtained, a visual dictionary model with spatial information can be generated.
To this end, the method may further include: generating the visual dictionary model based on the location information, description information and nearest component information of each feature point in the sample image.
Usually, the generation of the visual dictionary model may include sub-steps such as clustering, category division and spatial association establishment.
In a specific example, generating the visual dictionary model based on the location information, description information and nearest component information of each feature point in the sample image may include: clustering all feature points in the sample image according to their description information, to generate a visual dictionary model containing multiple vision entries; dividing the vision entries into the first-class vision entries and the second-class vision entries based on the location information and nearest component information of the feature points in each vision entry, wherein the distribution of the positions and nearest structural components of the feature points in a first-class vision entry conforms to a predetermined distribution; and, among the first-class vision entries, establishing the spatial association between a vision entry and at least one other vision entry based on an inherent metric between the vision entries.
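The category-division sub-step can be sketched as follows. The concrete criterion used here, namely that a first-class entry is one whose feature points concentrate on a single nearest structural component, is only one illustrative reading of the "predetermined distribution" left open above:

```python
from collections import Counter

def divide_entries(entries, purity_threshold=0.6):
    """entries: {entry_id: list of nearest-component labels of the
    feature points clustered into that entry}.  An entry whose points
    concentrate on one structural part is treated as first-class
    (carrying spatial meaning); the rest are second-class."""
    first_class, second_class = [], []
    for entry_id, labels in entries.items():
        top_count = Counter(labels).most_common(1)[0][1]
        if top_count / len(labels) >= purity_threshold:
            first_class.append(entry_id)
        else:
            second_class.append(entry_id)
    return first_class, second_class
```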
In particular, in order to generate the visual dictionary model with spatial information, when two vision entries in the visual dictionary model (referred to simply as the first vision entry and the second vision entry) both belong to the above first-class vision entries, the association between them can be established based on the inherent metric between the two first-class vision entries. For example, the inherent metric can be realized by the structural distance, within the target object, between the corresponding nearest structural components. For precisely this reason, the above visual dictionary model with spatial information may be called a structured visual dictionary.
To this end, establishing the spatial association between a vision entry and at least one other vision entry based on the inherent metric between vision entries may include: calculating the spatial association between the first vision entry and the second vision entry based on the inherent structural distance, within the target object, between the nearest structural component corresponding to the first vision entry and the nearest structural component corresponding to the second vision entry.
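The inherent metric can be sketched as follows; the Gaussian fall-off over the structural distance between the dominant components of two first-class entries is an illustrative choice of metric, not the claimed one:

```python
import math

def entry_association(comp_a, comp_b, component_positions, sigma=1.0):
    """Associate two first-class vision entries through the structural
    distance between their dominant components in a body template;
    closer components yield a stronger association in [0, 1]."""
    ax, ay = component_positions[comp_a]
    bx, by = component_positions[comp_b]
    d = math.hypot(ax - bx, ay - by)
    return math.exp(-d * d / (2 * sigma * sigma))
```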
In step S120, for each feature point, at least one vision entry matching the feature point is searched for in the visual dictionary model based on the description information of the feature point.
After the visual dictionary model has been obtained, whether the target image is a sample image or an image to be detected, the features of each feature point can be looked up in the structured visual dictionary to find the matching vision entries most similar to them.
To this end, in one embodiment, searching the visual dictionary model for at least one matching vision entry based on the description information of the feature point may include: calculating the similarity between the feature point and each vision entry in the visual dictionary model from the description information of the feature point and the description information of the vision entry; and determining the vision entries whose similarity is greater than or equal to a predetermined threshold as the matching vision entries.
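The matching-entry lookup can be sketched as follows; cosine similarity stands in for the unspecified similarity measure, and the threshold value is illustrative:

```python
import math

def find_matching_entries(descriptor, entry_descriptors, threshold=0.8):
    """Return the ids of vision entries whose similarity with the
    feature point's description vector meets the threshold."""
    def cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    return [eid for eid, vec in entry_descriptors.items()
            if cosine(descriptor, vec) >= threshold]
```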
In step S130, for each vision entry matching the feature point, at least one mapping-target vision entry is determined according to the class of the matching vision entry, and the feature weight that the feature point maps onto each mapping-target vision entry is calculated at least from the description information of the feature point and the description information of the matching vision entry.
As described above, two different classes of vision entries exist in the structured visual dictionary; that is, a matching vision entry may be a first-class vision entry or a second-class vision entry. Therefore, which vision entry or entries each feature point is mapped onto, and the weight value it maps onto each of them, can be determined in different ways. To first determine the mapping-target vision entries of each feature point, in one embodiment, determining the at least one mapping-target vision entry according to the class of the matching vision entry may include: judging whether the matching vision entry is a first-class vision entry or a second-class vision entry; when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having a spatial association with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
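The determination of the mapping-target vision entries can be sketched directly from the case analysis above:

```python
def mapping_targets(match_id, first_class_ids, associations):
    """First-class matches pull in their spatially associated entries
    as additional mapping targets; second-class matches map only onto
    themselves.  associations: {entry_id: list of associated entry ids}."""
    if match_id in first_class_ids:
        return [match_id] + associations.get(match_id, [])
    return [match_id]
```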
That is, if a matching vision entry found in step S120 belongs to the second class, only the matching vision entry itself is considered. Conversely, if a matching vision entry found in step S120 belongs to the first class, then in addition to the matching vision entry itself, the other vision entries having a structural association with it are also considered.
Next, in one embodiment, when the mapping-target vision entry is the matching vision entry itself, calculating the feature weight may include: calculating the feature weight that the feature point maps onto the matching vision entry itself based on the feature distance between the feature point and the matching vision entry.
For example, this feature distance can be obtained by calculating the distance (for example, the Euclidean distance) between the feature description vector of the feature point and the feature description vector of the matching vision entry.
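The weight mapped onto the matching entry itself can be sketched as follows; the 1/(1+d) decay over the Euclidean distance d is an illustrative monotone choice, since the text only requires the weight to be computed from the feature distance:

```python
import math

def self_weight(point_desc, entry_desc):
    """Weight mapped onto the matching vision entry itself, decreasing
    as the Euclidean distance between the description vectors grows."""
    d = math.sqrt(sum((p - e) ** 2 for p, e in zip(point_desc, entry_desc)))
    return 1.0 / (1.0 + d)
```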
In another embodiment, when the mapping-target vision entry is one of the other vision entries, calculating the feature weight may include: calculating the feature weight that the feature point maps onto the other vision entry based on the feature distance between the feature point and the matching vision entry and on the spatial association between the matching vision entry and the other vision entry.
For example, the spatial association between the matching vision entry and the other vision entry may be the one established in the above process of generating the visual dictionary model.
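The weight propagated onto an associated entry can be sketched analogously; an illustrative distance-based weight is attenuated by the association strength established when the dictionary was built:

```python
import math

def associated_weight(point_desc, match_desc, association):
    """Weight propagated from a first-class match onto one of its
    associated entries: the point-to-match weight, scaled down by the
    spatial association strength (a value in [0, 1])."""
    d = math.sqrt(sum((p - e) ** 2 for p, e in zip(point_desc, match_desc)))
    return association / (1.0 + d)
```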
In step S140, the characteristic model of the target image with spatial information is generated based on the feature weights mapped onto the vision entries of the visual dictionary model.
Finally, whether the target image is a sample image or an image to be detected, the feature-weight values mapped from the features of all sampling points on the image onto the first-class or second-class vision entries can be combined or considered in a certain way, to obtain a unified feature description of the target object.
In one embodiment, generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries of the visual dictionary model may include: for each vision entry, calculating the first feature weight mapped onto the vision entry by each feature point in the target image when the vision entry is a first-class vision entry, calculating the second feature weight mapped onto the vision entry by each feature point in the target image when the vision entry is a second-class vision entry, and summing the first feature weight and the second feature weight to generate the total feature weight mapped onto the vision entry; and cascading the total feature weights mapped onto the vision entries of the visual dictionary model, to generate the characteristic model of the target image with spatial information.
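The final accumulation and cascading step can be sketched as follows; summing the first and second feature weights per entry reduces, for each entry, to accumulating all weights mapped onto it, whatever their class of origin:

```python
def build_feature_model(mappings, vocab_size):
    """mappings: list of (entry_id, weight) pairs collected from all
    feature points, covering both first- and second-class entries.
    Summing per entry and cascading over the whole dictionary yields
    the final characteristic model of the target image."""
    model = [0.0] * vocab_size
    for entry_id, weight in mappings:
        model[entry_id] += weight
    return model
```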
It can be seen that this embodiment of the present invention provides a characteristic model generation method which can, from the very first step of building the visual dictionary model, use the inherent structural relationships between the various parts of the target object to establish the spatial associations inside the object, and can perform the feature-weight mapping between each feature point in the target image and each vision entry in the visual dictionary model based on those spatial associations, thereby generating a vision-entry-weight-based characteristic model of the target image that carries spatial information.
Obviously, the characteristic model generation method can be used for different purposes. In one embodiment, it can be applied in the sample training process: the method performs a training operation on sample images to obtain the visual dictionary model and a classifier for the target to be recognized. In another embodiment, it can be applied in the target detection process: the method outputs the classification result of an image to be detected by means of the trained visual dictionary model and classifier.
Specifically, when applied in the sample training process, the method can first sample each sample image in the training set and, after extracting the base features of each feature point, further improve the visual dictionary model. When generating the improved visual dictionary model, the method can establish the spatial association between some of the vision entries through the inherent metric. Next, when mapping the low-level features onto the visual dictionary model to generate the final feature description, the method still takes the spatial association between those vision entries into account during the mapping, so as to generate the characteristic model of the target image with spatial information. Finally, the method can also perform classification training with the obtained characteristic model.
In addition, when applied in the target detection process, the method can sample the image to be detected, map the feature points in the image onto the vision entries of the visual dictionary model based on the spatial association between some of the vision entries, so as to generate the characteristic model of the target image with spatial information, and generate the final object detection result with the trained classification criterion.
2.1. Specific example
Hereinafter, the overall flow of a specific example of the characteristic model generation method according to an embodiment of the present invention will be described with reference to Figs. 2 to 14.
In this specific example of the embodiment of the present invention, the characteristic model generation method is described as applied to an offline sample training process and an online target detection process.
It should be noted, however, that although the method is described here as applied to classification training and actual classification, so as to obtain a more accurate spatial-position-based classification of images, the invention is not limited thereto. The method is equally applicable to other application fields based on characteristic models, such as image retrieval and image matching, and is not limited to the above image classification and image recognition fields.
Fig. 2 is an overview flow chart illustrating the specific example of the characteristic model generation method according to an embodiment of the present invention, and Fig. 3 is the conceptual data flow diagram of that specific example.
Comparing Fig. 2 and Fig. 3, it can be seen that the overview flow chart in Fig. 2 and the conceptual data flow diagram in Fig. 3 correspond to each other; the only difference is that Fig. 2 shows the steps S201 to S210 of the characteristic model generation method, while Fig. 3 shows the related data F1 to F10 involved when those steps are executed.
As shown in Figs. 2 and 3, the characteristic model generation method of the specific example can be applied to two different processes, namely an offline sample training process and an online target detection process.
Specifically, the method may include:
In step s 201, input sample image.
Specifically, the visual dictionary model and classifier of target to be identified in order to obtain, can input N frame sample image
F1, wherein N is natural number.
For example, the sample images may include positive and negative sample images for model training. A positive sample is an image containing the object to be recognized (for example, a person, an animal, a building, etc.); a negative sample is an image not containing the object to be recognized.
In addition, a sample image, whose classification (positive or negative) is known, may be a grayscale image and/or a depth image (also called a parallax image or disparity map). Specifically, the grayscale image may be captured directly by a camera, and the disparity map may be obtained with a calibrated camera setup based on the binocular ranging principle.
However, the invention is not limited thereto. Obviously, any existing method of obtaining a disparity map can be used with the present invention. For example, the disparity map may be shot directly by a dedicated parallax camera. Alternatively, grayscale images may be captured by a binocular camera, a multi-view camera, or a stereo camera, and the corresponding disparity map may then be computed from the grayscale images. Specifically, for example, in the case where the object to be detected (also called the target) is a vehicle or a pedestrian on a road, a left image and a right image may be captured by a vehicle-mounted binocular camera; here, the left image (or the right image) serves as the grayscale image, and the disparity map is computed from the left image and the right image.
Here, in one embodiment, the grayscale image and the disparity map may be acquired by a camera mounted locally on the vehicle. Alternatively, in another embodiment, the grayscale image and the corresponding disparity map may be obtained from a remote camera over, for example, a wired or wireless network. Moreover, the image capture device (for example, a camera) need not be installed on the vehicle; it may also, for example, be mounted on a roadside building as needed, or at any other position suitable for photographing the target object.
It should be noted that the disparity map here is not limited to being obtained with multiple cameras; it can also be obtained with a single camera over time. For example, at one moment a single camera may capture one image as the left image; at the next moment, after the camera has shifted slightly, it may capture another image as the right image. A disparity map can likewise be computed from the left and right images obtained in this way.
In addition, although the example above uses a grayscale image as the sample image, those skilled in the art will understand that, where the camera parameters and the computing performance of the processing device permit, a color image may be used instead of the grayscale image.
In step S202, feature points are extracted from each sample image, and the attribute information of each feature point is obtained.
Specifically, for the N sample images (including grayscale/color images and/or depth images) in the training set, the feature extraction unit first samples each image F1 and extracts the various features F2 of the target object at each sampled point, for use in the subsequent processes or steps.
Fig. 4 is a detailed flowchart illustrating the feature extraction and description step of the specific example according to an embodiment of the present invention.
As illustrated in Fig. 4, the feature extraction and description step specifically includes the following sub-steps.
In sub-step S2021, feature points are extracted from each sample image.
Specifically, after a training sample image is received, it can be sampled in various ways to obtain the feature points (also called sampled points) of the sample image. For example, the feature points can be obtained by any existing extraction algorithm, such as random sampling, dense sampling, corner detection, or scale-invariant feature transform (SIFT) keypoint extraction. In the following, dense sampling is assumed as an example.
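As an illustrative sketch (not the patent's own implementation), dense sampling can be realized as a regular grid of sampled points over the image; the step size and border margin below are assumed values:

```python
def dense_sample(width, height, step=8, margin=4):
    """Return (x, y) sampled points on a regular grid.

    step and margin are illustrative parameters: the grid starts
    'margin' pixels from the border and advances 'step' pixels.
    """
    return [(x, y)
            for y in range(margin, height - margin + 1, step)
            for x in range(margin, width - margin + 1, step)]

# Sample a 64x128 pedestrian window (a common detection window size):
points = dense_sample(64, 128)
```

Each returned (x, y) then becomes one feature point at which a local descriptor is computed in sub-step S2022.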
In sub-step S2022, the attribute information of each feature point is obtained.
Specifically, after multiple feature points have been obtained by dense sampling, each feature point can be further described in terms of its characteristics, so as to obtain the attribute information of the feature point, such as shape, structure, color, and texture. For example, any local feature can be used to describe the sampled points here; these features may be invariant to rotation, scaling, and translation. In the following, it is assumed that SIFT features are used as the features of the sampled points.
Figs. 5A and 5B are schematic diagrams illustrating the SIFT feature description of the specific example according to an embodiment of the present invention.
Fig. 5A shows that the gradient information at each pixel has been computed and weighted according to its position within the circle. Fig. 5B shows the accumulation of the gradients into each bin of the feature histogram. Since the principle of the SIFT descriptor is well known to those skilled in the art, its detailed description is omitted here.
After the image of the target object has been sampled and described with dense SIFT features, each sampled point has the following two attributes: 1) a feature description, denoted FD; and 2) location information, denoted (x, y).
In addition, in order to use the intrinsic structural relationships between the various parts of the target object in the sample image to establish the spatial associations inside the target object, the following attribute of each sampled point also needs to be obtained: 3) the nearest structural part and the distance to it.
In the following, for convenience, the acquisition of the nearest structural part and its distance information is described using a pedestrian as the target object. Obviously, the invention is not limited thereto; the target object may also include vehicles, buildings, guardrails, and other objects.
Fig. 6 is a schematic diagram illustrating a pedestrian with labeled body parts and a sampled point, according to the specific example of an embodiment of the present invention.
Fig. 6 shows a pedestrian on which the above-mentioned nearest-structural-part labeling operation has been performed. As shown in Fig. 6, body parts are indicated by solid dots, and a sampled point is indicated by a circle. For example, in Fig. 6, for pedestrian detection, the left shoulder part (denoted LS), the right shoulder part (denoted RS), and the head part (denoted H) of the pedestrian, among others, have been labeled.
Therefore, by computing the distance between a feature point and each structural part, it is easy to determine which structural part of the target object is nearest to that feature point, and to record that nearest structural part and its distance as additional attribute information of the feature point.
As shown in Fig. 6, the nearest body part of the sampled point indicated by the circle is the left shoulder part LS of the pedestrian.
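A minimal sketch of this nearest-part lookup, with assumed pixel coordinates for the labeled parts:

```python
import math

# Illustrative labeled body-part positions (the coordinates are assumed):
parts = {"H": (32, 10), "LS": (18, 28), "RS": (46, 28)}

def nearest_part(point, parts):
    """Return (part_name, distance) of the structural part closest to a
    feature point, using the Euclidean pixel distance."""
    name = min(parts, key=lambda p: math.dist(point, parts[p]))
    return name, math.dist(point, parts[name])

# A sampled point near the left shoulder is labeled LS:
label, d = nearest_part((20, 30), parts)
```

The returned pair (nearest part, distance) is stored as the third attribute of the feature point, alongside FD and (x, y).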
In step S203, an improved visual dictionary model is constructed based on the sampled-point features.
Specifically, after the various features of the target object at each sampled point have been extracted, the extracted features can be clustered; using their positions and the body parts on which they lie, meaningful categories are aggregated, the associations between them are established, and a structured visual dictionary F3 is obtained. The visual dictionary may contain a large number of visual entries capable of describing image features.
Fig. 7 is a detailed flowchart illustrating the visual dictionary model generation step of the specific example according to an embodiment of the present invention.
As illustrated in Fig. 7, the visual dictionary model generation step specifically includes the following sub-steps.
In sub-step S2031, the extracted features are clustered.
Specifically, after the feature points of the training sample images have been received, these feature points can be clustered according to their description information, temporarily ignoring their location information, so as to create the visual dictionary. This sub-step is similar to the clustering step in a traditional visual dictionary model; therefore, any traditional clustering algorithm can be used here to cluster the sampled points. In the following, the k-nearest-neighbor (KNN) algorithm is assumed to be used.
Figs. 8A to 8D are schematic diagrams illustrating the clustering algorithm of the specific example according to an embodiment of the present invention.
In Figs. 8A to 8D, each square dot represents the multi-dimensional feature description obtained at a sampled point (or feature point). For example, the k-nearest-neighbor algorithm can be used here as a general clustering method. The three circular dots in Fig. 8A (located at the upper-left, lower-left, and lower-right positions) represent the initial cluster centers, which can be obtained by random seeding. Then, through the iterative process shown in Figs. 8B and 8C, more accurate cluster centers are gradually found, yielding the final clustering result shown in Fig. 8D. Since the principle of this algorithm is well known to those skilled in the art, its detailed description is omitted here.
Although the description above uses the k-nearest-neighbor algorithm as an example, the invention is not limited thereto. For example, the clustering step may also employ partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, and so on. The clustering algorithm may include the K-MEANS, K-MEDOIDS, CLARANS, BIRCH, CURE, CHAMELEON, DBSCAN, OPTICS, DENCLUE, STING, CLIQUE, and WAVE-CLUSTER algorithms, among others; all of these are mature clustering algorithms in the prior art and are not enumerated one by one here.
In this way, by clustering the description information of the feature points, multiple similar pieces of description information can be clustered into one visual entry, and multiple visual entries are obtained by clustering all the description information of all the feature points, so as to form the visual dictionary.
As a simple illustration, suppose the description information of one feature point a includes, for example, "circular" and "bright red"; the description information of another feature point b includes, for example, "circular" and "blue"; and the description information of a third feature point c includes, for example, "square" and "dark red". Then, through the above steps, all the description information can be clustered to obtain several visual entries, for example "circular", "square", "red", and "blue", which form the visual dictionary. It should be noted that this is an example of clustering two different types of description information together; in practice, one type, or of course more than two types, of description information may also be clustered together.
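The patent leaves the choice of clustering algorithm open (K-MEANS is among the listed alternatives). A minimal pure-Python k-means sketch over toy 2-D "descriptor" vectors might look like the following; initializing the centers with the first k points (instead of random seeding) is an assumption made here to keep the example deterministic:

```python
import math

def kmeans(points, k, iters=10):
    """Toy k-means: cluster descriptor vectors into k visual entries.
    Returns the k cluster centers (the 'visual entries')."""
    centers = list(points[:k])  # deterministic initialization (assumed)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each descriptor to its nearest current center.
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        # Recompute each center as the mean of its assigned descriptors.
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g
                   else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# Two obvious clusters of toy descriptors:
data = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0)]
entries = kmeans(data, 2)  # ≈ [(0.05, 0.05), (5.05, 5.05)]
```

Real SIFT descriptors are 128-dimensional, but the same update loop applies unchanged; the resulting centers play the role of the visual entries Ck used later.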
In sub-step S2032, according to the location information and body part information of the sampled points, the distribution of each category clustered in sub-step S2031 is computed, and the aggregated categories are found as the significant categories.
Figs. 9A to 9D are schematic diagrams illustrating the significant category generation sub-step of the specific example according to an embodiment of the present invention.
Fig. 9 A to Fig. 9 D shows the example that significant classification generates sub-step.Here, for the image of target pedestrian
In sampled point, as shown in the circle in Fig. 9 A, its corresponding characteristic point is clustered such as Fig. 9 B institute in traditional cluster process
Three partial categories shown (include: classification positioned at left part, using most dark expression;Positioned at right part, utilization
The classification of darker expression;Be located below part, classification using most light expression) among below in partial category.
For the characteristic point shown in the arrow for being directed toward Fig. 9 B from Fig. 9 A, this point has body part information, location information and description
These three properties of information, it is assumed that it is denoted as LS (left shoulder component), (x, y) and FD respectively.And then it promotes it is found that this is clustered
In each point all there are these three attribute informations.Then, according to above- mentioned information, all the points in class can be assembled for this,
Calculate they position and place body part distribution situation, and judge whether its distribution forms a scheduled two dimension
Distribution surface.
For example, if it is judged that the shape of the distribution conforms to a predetermined distribution, for instance approximates a Gaussian distribution in which the majority of the points share a relatively concentrated maximum, as shown by distribution 1 in Fig. 9C, then such a category can be defined as a significant category (also called a first-class visual entry).
Alternatively, consider all the sampled points in the right-hand partial category among the three partial categories shown in Fig. 9B. Their attributes respectively include: LA, (x, y), FD; LUL, (x, y), FD; LW, (x, y), FD; and so on. Here, LA stands for the left ankle, LUL for the left upper arm, and LW for the left wrist. (x, y) is a generic representation of their location information; in fact these are different position coordinates in the image, which may be denoted, for example, (x1, y1), (x2, y2), (x3, y3).
Then, after computing, for all the points included in this aggregated class, the distribution of their positions and of the body parts on which they lie, it can be determined that this distribution does not conform to a Gaussian distribution; on the contrary, the shape of the surface is rather disorderly, with no particularly concentrated maximum, as shown by distribution 2 in Fig. 9D. Such a category can then be defined as a non-significant category (also called a second-class visual entry).
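One plausible, hypothetical way to implement this significant/non-significant decision is to measure how concentrated a cluster's positions and body-part labels are; the variance and dominance thresholds below are assumed stand-ins for the patent's "predetermined distribution" test, not its actual criterion:

```python
from collections import Counter

def is_significant(points, part_labels, max_std=5.0, min_part_frac=0.6):
    """Judge a cluster 'significant' (first-class entry) if its sampled
    points concentrate spatially AND mostly lie on one body part.
    points: list of (x, y); part_labels: list of body-part names.
    max_std and min_part_frac are assumed thresholds."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    var = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n
    dominant = Counter(part_labels).most_common(1)[0][1] / n
    return var ** 0.5 <= max_std and dominant >= min_part_frac

# Concentrated left-shoulder cluster -> significant:
sig = is_significant([(18, 28), (19, 29), (17, 27)], ["LS", "LS", "LS"])
# Scattered cluster over several parts -> non-significant:
nonsig = is_significant([(2, 3), (40, 90), (10, 60)], ["LA", "LUL", "LW"])
```

A Gaussian-fit test as described in the text would replace the simple standard-deviation check, but the significant/non-significant split it produces is the same kind of binary label.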
In sub-step S2033, associations between the significant categories are established based on an intrinsic metric between them.
Specifically, the intrinsic metric between significant categories can be the intrinsic structural distance, within the target object, between the nearest structural part corresponding to a first significant category and the nearest structural part corresponding to a second significant category. For example, when a pedestrian is the target object, the skeleton distance can be used as one of the metrics. As another example, when a vehicle is the target object, the distance between key components (such as lamps, wheels, and A/B pillars) can be used as the metric.
Figs. 10A and 10B are schematic diagrams illustrating the significant category association sub-step of the specific example according to an embodiment of the present invention.
Figs. 10A and 10B show an example of the significant category association sub-step. In Fig. 10A, the skeleton information of the pedestrian is marked, and the significant categories of different body parts (for example, the head, the left shoulder, and the right shoulder) are identified by solid dots of different gray levels. Two of these significant categories are identified separately in Fig. 10B, representing the head (H) and the left shoulder (LS) body parts respectively, as shown by the upper and lower curved arrows in the figure. The association between these two significant categories can be established by the intrinsic metric between them, which can be expressed as a weight value. For example, one way of computing the weight between them is shown in formula 1:
wH,LS = e^(-β·idis(H,LS))    Formula 1
Here, wH,LS denotes the spatial association between significant category H and significant category LS; β is an adjustment factor taken from empirical values; H denotes the position of the head body part; LS denotes the position of the left shoulder body part; and idis(H, LS) denotes the intrinsic distance between H and LS, for example the skeleton distance. For example, this intrinsic distance can be computed from the pixel distance between the labeled body parts.
In addition, in one embodiment, associations may be established only between the two significant categories that are closest in distance. Of course, associations may instead be established among all significant categories, or among some of them; this is determined by the actual detection requirements, by the internal structural characteristics of the object, or naturally by other considerations (for example, the processing capability of the overall system).
In this way, through step S203, the structured visual dictionary model is established. This visual dictionary model contains not only the feature entries obtained by the traditional clustering method, but also the significant categories obtained from the location and body part information; moreover, based on the intrinsic metric, the relationships between significant categories are established, strengthening the spatial associations inside the visual dictionary.
That is, when generating the visual dictionary model, this feature model generation method can establish the associations between the significant categories through the intrinsic metric.
In step S204, one or more visual entries are matched for each feature point, and through a spatial visual dictionary matching algorithm, the description of the sample image is converted from pixel data into a feature model of visual-entry weights carrying spatial information.
Specifically, after the structured visual dictionary has been obtained, the distance information and the internal structural information can be used to map the features of the target object onto the visual dictionary with its established internal associations, obtaining the visual-bag-of-words-based feature description F4 of each object to be detected.
It should be noted that this feature point matching process can be applied to all the training samples, including the positive and the negative samples. Also, the matching of visual entries differs from the matching of the visual dictionary: matching a visual entry means that the description information of a feature point is similar to that of the visual entry, whereas matching the visual dictionary means building the feature model of visual-entry weights with spatial information.
Generally speaking, the purpose of the visual dictionary matching process is to transform the feature representation of the feature points (for example, the description information) into a new, more useful feature representation of the visual bag-of-words type carrying spatial information (for example, a feature matrix or a feature model). Specifically, the feature of each sampled point in the sample image can be mapped onto the generated structured visual dictionary, so as to form a unified description of the target object. During the matching in the mapping process, the internal relationships of the structured visual dictionary are also taken into account.
Fig. 11 is a detailed flowchart illustrating the visual dictionary matching step of the specific example according to an embodiment of the present invention.
As illustrated in Fig. 11, the visual dictionary matching step specifically includes the following sub-steps.
In sub-step S2041, matching visual entries are looked up in the visual dictionary model.
After the location information and description information of the feature points of a sample image are received, one or more visual entries capable of characterizing each feature point in the sample can be found in the visual dictionary, as the matching visual entries. For example, continuing the example mentioned above, the matching visual entries of feature point a may be "circular" and "red", even though the description information of feature point a itself is "circular" and "bright red".
Specifically, first, the similarity between the description information of a feature point and each visual entry in the visual dictionary can be computed. The similarity can be expressed by a distance measure fdis(fi, Ck), including but not limited to the Euclidean distance. Here, fi is the feature description information of feature point i, and Ck is the feature description information of the k-th visual entry in the visual dictionary. The smaller the distance measure fdis(fi, Ck), the higher the similarity between the feature point and the corresponding visual entry.
Then, the most similar visual entries can be selected for each feature point. After the similarities between the description information of the current feature point and all visual entries have been obtained, the purpose of this operation is to select the matching visual entry or entries for the spatial encoding process. When only the single most similar matching visual entry is selected for a feature point, this may be called a hard decision; a soft decision selects more than one matching visual entry for each feature point.
Taking the soft decision as an example, finally, a decision result is obtained, which may include the similarities and the most similar one or more visual entries corresponding to each feature point, as the matching visual entry or entries.
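The hard/soft decision above can be sketched as a nearest-entries lookup; the toy 2-D entries stand in for the real descriptor-space visual entries:

```python
import math

def match_entries(f, entries, k=1):
    """Return the k most similar visual entries to descriptor f, with
    their distances (k=1: hard decision; k>1: soft decision)."""
    ranked = sorted(range(len(entries)),
                    key=lambda i: math.dist(f, entries[i]))
    return [(i, math.dist(f, entries[i])) for i in ranked[:k]]

entries = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
hard = match_entries((0.1, 0.1), entries, k=1)  # single nearest entry
soft = match_entries((0.1, 0.1), entries, k=2)  # two nearest entries
```

Thresholding the returned distances (rather than fixing k) would give the "similarity greater than a preset threshold" variant mentioned below for Figs. 12A and 12B.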
Figs. 12A and 12B are schematic diagrams illustrating the matching visual entry lookup sub-step of the specific example according to an embodiment of the present invention.
As illustrated in Figs. 12A and 12B, for a sampled point in the target image, shown by the circle in Fig. 12A, the corresponding feature vector is shown as the histogram indicated by the arrow, and is denoted fi. In addition, the structured visual dictionary is shown in Fig. 12B; each visual entry in the visual dictionary model can likewise be represented by its feature vector, denoted Ck. When looking up matching visual entries, as described above, the closest one or more visual entry classes in the structured visual dictionary can be found according to the feature similarity, for example the one or more visual entry classes whose similarity to the feature point exceeds a preset threshold.
In sub-step S2042, the feature weight with which the feature point is mapped onto each matching visual entry is computed.
After the one or more visual entries matching the feature point have been determined, the category of each of these visual entries can be further judged.
In the first case, if some visual entry among the one or more matching visual entry classes is a non-significant category, then the weight with which the sampled-point feature fi is mapped onto visual entry Ck is simply e^(-β·fdis(fi, Ck)). Here β is an adjustment factor taken from empirical values, and fdis(fi, Ck) is the distance between the sampled-point feature fi and the visual entry feature Ck.
In one example, the distance fdis(fi, Ck) between features can be computed according to the distance computation method in a conventional visual dictionary model. For example, one way of computing the distance between features is shown in formula 2:
fdis(fi, Ck) = sqrt( Σd=1..D (fi,d − Ck,d)² )    Formula 2
where fdis(fi, Ck) denotes the distance between the sampled-point feature fi and the visual entry feature Ck, fi is the feature description information of feature point i, Ck is the feature description information of the k-th visual entry in the visual dictionary, and D is the dimension of the feature vectors.
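Formula 2 and the non-significant-entry weight can be sketched together; β = 1.0 is an assumed empirical value:

```python
import math

def fdis(f, c):
    """Formula 2: Euclidean distance between a sampled-point feature f
    and a visual entry feature c (both D-dimensional sequences)."""
    return math.sqrt(sum((fd - cd) ** 2 for fd, cd in zip(f, c)))

def weight_nonsignificant(f, c, beta=1.0):
    """Weight mapped onto a non-significant (second-class) entry:
    exp(-beta * fdis(f, c))."""
    return math.exp(-beta * fdis(f, c))

# Identical feature and entry -> distance 0 -> weight 1.0:
w = weight_nonsignificant((0.2, 0.4), (0.2, 0.4))
```

The weight is always in (0, 1], reaching 1 only when the feature coincides with the entry.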
In the second case, if some visual entry is a significant category, then in addition to mapping the sampled-point feature onto that visual entry, the other categories associated with it in the spatial structure also need to be mapped, and the mapped weights are related to the association weights between the categories.
Figs. 13A and 13B are schematic diagrams illustrating the feature weight mapping sub-step of the specific example according to an embodiment of the present invention.
As shown in Figs. 13A and 13B, suppose the sampled-point feature fi matches significant category l, and that associations have been established between significant category l and significant categories j and k through the weights wl,j and wl,k. Here, the computation of the weights wl,j and wl,k has been described in detail in the significant category association sub-step S2033 of the structured visual dictionary generation step S203, and its detailed description is therefore omitted.
In this case, when computing the projection of the sampled-point feature fi onto the visual dictionary, the projections may be defined as follows: the weight projected onto significant category l itself is e^(-β1·fdis(fi, Cl)); and, via the associations between significant category l and significant categories j and k, the weight projected onto significant category j is wl,j·e^(-β1·fdis(fi, Cl)), and the weight projected onto significant category k is wl,k·e^(-β1·fdis(fi, Cl)).
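This propagation of a matched feature's weight from significant category l to its associated categories can be sketched as follows (the association weights are assumed values):

```python
import math

def significant_weights(f, c_l, assoc, beta1=1.0):
    """Map feature f matched to significant entry l: l itself receives
    exp(-beta1 * fdis(f, C_l)), and each associated entry j receives
    w_{l,j} times that base weight. assoc: entry id -> w_{l,j}."""
    base = math.exp(-beta1 * math.dist(f, c_l))
    out = {"l": base}
    for j, w_lj in assoc.items():
        out[j] = w_lj * base
    return out

# Entry l associated with entries j and k (association weights assumed);
# the feature exactly matches C_l, so the base weight is 1.0:
weights = significant_weights((0.0, 0.0), (0.0, 0.0),
                              {"j": 0.5, "k": 0.25})
```

Note that the associated entries j and k accumulate weight even though the feature itself did not match them; this is what injects the spatial structure into the final model.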
As shown in Fig. 13B, suppose there are 7 visual entries in total in the generated structured visual dictionary, and each visual entry has sampled-point feature weights mapped onto it, with all these sampled points coming from one image, namely the sample image. Then, in the manner described above, this feature model generation method can, depending on the significant and non-significant categories, map every low-level feature in the sample image onto every visual entry in the visual dictionary, so as to generate the final feature description, such that the mapping operation is performed while the associations between significant categories are fully taken into account.
As described above, in sub-steps S2041 and S2042, the features of the sampled points can be looked up in the structured visual dictionary to find the most similar one or more entries, and the weights with which the sampled points project onto these entries can be computed. These two sub-steps may therefore, following convention, be called the coding step.
In sub-step S2043, the visual-bag-of-words-based feature model of the sample image is generated based on the feature weights mapped onto each visual entry.
Specifically, next, the weights of all sampled-point features in an image to be detected, mapped onto the significant and non-significant classes, can be combined or considered in some way, to obtain the unified feature description of the object to be detected.
Figs. 14A and 14B are schematic diagrams illustrating the feature model generation sub-step of the specific example according to an embodiment of the present invention.
As shown in Fig. 13B above, suppose there are 7 visual entries in total in the generated structured visual dictionary, each visual entry has sampled-point feature weights mapped onto it, and these sampled points all come from one image, namely the sample image, as shown by the square blocks in Fig. 14A. Therefore, by cascading the feature descriptions on the individual visual entries, the unified feature description obtained by mapping this image onto the structured visual dictionary can be obtained, as shown by the feature weight histogram in Fig. 14B, in which the significant category visual entries are spatially correlated. Here, the arrows indicate the associations between significant categories and the weights between them.
Specifically, for the feature model of the target image with spatial information, consider one visual entry among the multiple visual entries, for example category j. One way of computing the magnitude of its feature weight value is shown in formula 3:
Hj = Σi=1..T Σt=1..n1 wi,j·e^(−β1·fdis(ft, Ci)) + Σm=1..n2 e^(−β2·fdis(fm, Cj))    Formula 3
where Hj denotes the mapped feature weight on the j-th visual entry in the visual dictionary model; n1 denotes, when the j-th visual entry is a significant category (first-class) visual entry, the number of feature points mapped onto the j-th visual entry from a visual entry spatially associated with it; T is the total number of visual entries spatially associated with the j-th visual entry; n2 denotes, when the j-th visual entry is a non-significant category (second-class) visual entry, the number of feature points mapped onto the j-th visual entry; ft denotes the description information of the t-th feature point; Ci denotes the description information of the i-th visual entry; fdis(ft, Ci) denotes the feature distance between the t-th feature point and the i-th visual entry; wi,j denotes the spatial association between the i-th visual entry and the j-th visual entry; fm denotes the description information of the m-th feature point; Cj denotes the description information of the j-th visual entry; fdis(fm, Cj) denotes the feature distance between the m-th feature point and the j-th visual entry; and β1 and β2 are predetermined coefficients.
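A sketch of Formula 3, under the assumption (consistent with the text) that an entry j accumulates the directly mapped weights plus, when j is significant, the weights of its T spatially associated entries scaled by wi,j:

```python
import math

def entry_weight(j, entries, assigned, assoc, beta1=1.0, beta2=1.0,
                 significant=True):
    """Formula 3 sketch. entries: id -> entry descriptor;
    assigned: id -> list of feature descriptors mapped to that entry;
    assoc: (i, j) -> spatial association weight w_{i,j}."""
    # Direct term: features mapped onto entry j itself.
    h = sum(math.exp(-beta2 * math.dist(f, entries[j]))
            for f in assigned.get(j, []))
    if significant:
        # Association term: features of spatially associated entries i
        # contribute w_{i,j} * exp(-beta1 * fdis(f, C_i)).
        for (i, jj), w_ij in assoc.items():
            if jj == j:
                h += sum(w_ij * math.exp(-beta1 * math.dist(f, entries[i]))
                         for f in assigned.get(i, []))
    return h

entries = {0: (0.0, 0.0), 1: (1.0, 1.0)}
assigned = {0: [(0.0, 0.0)], 1: [(1.0, 1.0)]}  # one exact match each
assoc = {(0, 1): 0.5}  # entry 0 spatially associated with entry 1
h1 = entry_weight(1, entries, assigned, assoc)  # 1.0 + 0.5*1.0 = 1.5
```

Concatenating H1, H2, … over all visual entries yields the feature weight histogram of Fig. 14B.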
As described above, in sub-step S2043, all the sampled-point features in one sample image can be mapped onto the visual entries in the visual dictionary model to obtain the visual dictionary feature. This sub-step may therefore, following convention, be called the pooling step.
In step S205, a classifier is obtained by training on the feature models collected from all the training sample images.
After the visual dictionary feature models of all sample images (positive and negative) in the training set have been obtained, a classifier F5 can be trained using these visual dictionary features. Thus, from the feature models of visual-entry weights with spatial information of the sample images, the classification criterion distinguishing positive sample images (images containing the target object) from negative sample images (images not containing the target object) is obtained, so that in subsequent processing, for each of M online frames (where M is a natural number), after feature extraction and visual dictionary matching have been completed, the trained classifier can be used to detect or recognize the target object.
Specifically, the classifier is trained on training data and is then used in practical applications to classify the targets to be recognized. The training data are the visual bag-of-words characteristic models of all sample images in the training sample set. The classifier may use an existing algorithm, such as the support vector machine (SVM), the AdaBoost classifier, the Bayes classifier, the BP neural-network classifier, or a decision-tree algorithm, which will not be detailed here. These classification algorithms are all traditional: the characteristic models of the sample images, including positive samples (classification result positive) and negative samples (classification result negative), undergo classification training based on one of the above algorithms to obtain the classifier.
Steps S201 to S205 above establish the characteristic model and obtain the classifier by learning. Steps S206 to S210 below describe how a region to be recognized is input and then recognized or classified with the visual dictionary and the classifier obtained in the steps above.
In step S206, an image to be detected is input.
In step S207, characteristic points are extracted from the image to be detected and the attribute information of each characteristic point is obtained.
In step S208, each characteristic point is matched to one or more vision entries, and through the spatial visual dictionary matching algorithm the description of the image is converted from pixel data into a characteristic model of the vision-entry weights with spatial information.
Since steps S206 to S208 are very similar to steps S201, S202 and S204 described in detail above, differing only in that the target image is now at least one image to be detected whose classification is unknown, rather than a sample image of known classification, their detailed description is omitted here.
In addition, to save processing resources, when the attribute information of a characteristic point is obtained in step S207, the attribute information may include only: 1) the feature description, denoted FD; and 2) the location information, denoted (x, y), without determining the structure member closest to the characteristic point or the distance between the two.
In step S209, the image to be detected is classified using the trained classifier.
Specifically, the trained classifier (for example, an SVM classifier) can judge, based on a specific classification criterion, whether the image to be detected contains the target object, according to the image's characteristic model with spatial information.
In step S210, the detection result is output.
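Steps S206 to S210 can be condensed into one sketch: build the dictionary feature of an unknown image, apply a trained linear decision function, and output the result. The hard-voting histogram and the function names are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def detection_pipeline(descriptors, centers, w, b):
    """S207-S208: map the unknown image's descriptors onto the dictionary;
    S209: classify with a trained linear decision function (w, b);
    S210: return the detection result."""
    hist = np.zeros(len(centers))
    for d in descriptors:
        hist[np.argmin(np.linalg.norm(centers - d, axis=1))] += 1.0
    hist /= max(hist.sum(), 1.0)
    score = hist @ w + b                 # classifier decision value
    return "target" if score > 0 else "background"
```

In a driver-assistance setting, "target" would correspond to a pedestrian or vehicle detection that the control system then acts on.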
For example, the final detection result can serve well for pedestrian detection, vehicle detection and the like in in-vehicle camera images. Clearly, such pedestrian recognition, vehicle detection and the detection and recognition of other targets are essential key functions in driver assistance systems and vehicle automatic navigation systems, because helping the driver avoid hitting surrounding objects is crucial to safe driving.
Needless to say, the manner in which the specific examples of the embodiments of the present invention perform object detection with an improved visual dictionary model can be used not only with in-vehicle cameras but also in many other applications. For example, besides the vehicle-driving field, object detection is a particularly important problem in the surveillance field, such as effective monitoring near subway stations, automatic monitoring of parking lots, or surveillance in supermarkets.
It can be seen that the specific examples of the embodiments of the present invention provide a characteristic model generation method that samples each sample in the training set and, after extracting the basic features of each sampled point, improves the visual dictionary model: when the visual dictionary model is generated, associations between significant classes are established through an intrinsic measure; when low-level features are mapped onto the visual dictionary to generate the final feature description, the mapping still considers the associations between significant classes; and finally, classification training is performed on the obtained feature description, so that a final object detection result utilizing the improved visual dictionary model is obtained.
In addition, it should be noted that, although the present invention refers to images, pictures and the like, it can be understood that in the case of video, the object classification method described above can also be applied by taking one or more frames of the video as the above image or picture.
3. Characteristic model generating means
The embodiments of the present invention can also be implemented by a characteristic model generating means. Hereinafter, the functional configuration block diagram of a characteristic model generating means according to an embodiment of the present invention will be described with reference to Figure 15.
Figure 15 is a functional configuration block diagram illustrating the characteristic model generating means according to an embodiment of the present invention.
As shown in Figure 15, the characteristic model generating means 100 may include a feature extraction unit 101 and a visual dictionary matching unit 103.
The feature extraction unit 101 can be used to obtain at least one characteristic point in the target image and to obtain the location information and description information of each characteristic point; and
the visual dictionary matching unit 103 can be used to: for each characteristic point, search the visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has an association in spatial relationship with at least one other vision entry; for each matching vision entry matching the characteristic point, determine at least one mapping-target vision entry according to the class of the matching vision entry, and calculate, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry; and generate the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model.
In one embodiment, the target image may be at least one sample image of known classification, or an image to be detected of unknown classification. When the target image is a sample image, the feature extraction unit 101 can also be used to obtain, in the sample image, the location information of at least one structure member of the target object marked in advance, and to determine the nearest-component information of each characteristic point according to the location information of the characteristic point and the location information of each structure member, the nearest-component information indicating the structure member of the target object closest to the characteristic point.
In addition, the characteristic model generating means 100 can also include a visual dictionary model generation unit 102.
The visual dictionary model generation unit 102 can be used to generate the visual dictionary model based on the location information, description information and nearest-component information of each characteristic point in the sample image.
In one embodiment, the visual dictionary model generation unit 102 can generate the visual dictionary model from the location information, description information and nearest-component information of each characteristic point in the sample image by: clustering all the characteristic points in the sample image according to their description information, to generate the visual dictionary model including multiple vision entries; dividing the vision entries into the first-class vision entries and the second-class vision entries based on the location information and nearest-component information of the characteristic points in each vision entry, wherein the distribution of the positions and nearest structure members of the characteristic points in a first-class vision entry satisfies a predetermined distribution; and, among the first-class vision entries, establishing the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries.
In a specific example, the visual dictionary model generation unit 102 can establish the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries by: calculating the association in spatial relationship between a first vision entry and a second vision entry based on the intrinsic structure distance, within the target object, between the nearest structure member corresponding to the first vision entry and the nearest structure member corresponding to the second vision entry.
In one embodiment, the visual dictionary matching unit 103 can search the visual dictionary model, based on the description information of a characteristic point, for at least one matching vision entry that matches the characteristic point by: calculating the similarity between the characteristic point and each vision entry according to the description information of the characteristic point and the description information of each vision entry in the visual dictionary model; and determining the vision entries whose similarity is greater than or equal to a predetermined threshold as the matching vision entries.
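A sketch of this similarity-threshold matching follows. Cosine similarity is an assumed measure; the text only requires a similarity greater than or equal to a predetermined threshold.

```python
import numpy as np

def find_matching_entries(fd, centers, threshold=0.8):
    """Return the indices of vision entries whose similarity to the
    characteristic point's description fd meets the predetermined threshold.
    Cosine similarity between fd and each entry center is an assumption."""
    sims = centers @ fd / (np.linalg.norm(centers, axis=1) * np.linalg.norm(fd) + 1e-12)
    return [j for j, s in enumerate(sims) if s >= threshold]
```

Because several entries can clear the threshold, one characteristic point may match multiple vision entries, which is exactly what step S208 relies on.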
In one embodiment, the visual dictionary matching unit 103 can determine at least one mapping-target vision entry according to the class of the matching vision entry by: judging whether the matching vision entry is a first-class vision entry or a second-class vision entry; when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having an association in spatial relationship with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
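The class-dependent choice of mapping-target vision entries can be sketched as below. The `entry_class` and `associations` dictionaries are an assumed representation of the dictionary's metadata, not a structure the patent specifies.

```python
def mapping_targets(j, entry_class, associations):
    """Mapping-target vision entries for a matched entry j:
    a first-class entry (class 1) maps to itself plus its spatially
    associated entries; a second-class entry (class 2) maps only to itself.
    entry_class: index -> 1 or 2; associations: index -> associated indices."""
    if entry_class[j] == 1:
        return [j] + list(associations.get(j, []))
    return [j]
```

This is the branch that injects spatial information into the final model: first-class entries spread a characteristic point's weight to their associates, second-class entries do not.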
In one embodiment, when the mapping-target vision entry is the matching vision entry itself, the visual dictionary matching unit 103 can calculate, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry by: calculating, based on the characteristic distance between the characteristic point and the matching vision entry, the feature weight with which the characteristic point is mapped onto the matching vision entry itself.
Alternatively, when the mapping-target vision entry is one of the other vision entries, the visual dictionary matching unit 103 can calculate the feature weight by: calculating the feature weight with which the characteristic point is mapped onto the other vision entry, based on the characteristic distance between the characteristic point and the matching vision entry and on the association in spatial relationship between the matching vision entry and the other vision entry.
In one embodiment, the visual dictionary matching unit 103 can generate the characteristic model of the target image with spatial information, based on the feature weights mapped onto the vision entries in the visual dictionary model, by: for each vision entry, calculating the first feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a first-class vision entry, calculating the second feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a second-class vision entry, and summing the first feature weights and the second feature weights to generate the overall feature weight mapped onto the vision entry; and concatenating the overall feature weights mapped onto the vision entries in the visual dictionary model, to generate the characteristic model of the target image with spatial information.
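The final aggregation — summing the weights mapped onto each vision entry and concatenating the per-entry totals — can be sketched as below. The per-entry list layout is an assumed representation of the accumulated weights.

```python
import numpy as np

def build_feature_model(weights_per_entry):
    """Sum the first- and second-class feature weights mapped onto every
    vision entry and concatenate the totals into the final characteristic
    model with spatial information. weights_per_entry[j] is the list of
    weights mapped onto entry j (an assumed layout)."""
    return np.array([sum(ws) for ws in weights_per_entry])
```

The resulting vector has one component per vision entry and is exactly what is fed to the classifier in steps S205 and S209.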
The concrete functions and operations of the feature extraction unit 101, the visual dictionary model generation unit 102 and the visual dictionary matching unit 103 have already been described in detail in the characteristic model generation method described with reference to Figures 1 to 14, so their repeated description is omitted here.
It should be noted that the components of the characteristic model generating means 100 can be realized by software programs, for example by the CPU of a general-purpose computer in combination with RAM and ROM and the software code running therein. The software program can be stored on a storage medium such as a flash memory, floppy disk, hard disk or optical disc, and loaded at run time into, for example, a random access memory (RAM) to be executed by the CPU. Besides on a general-purpose computer, the components can also be realized by the cooperation of application-specific integrated circuits and software. The integrated circuits include circuits realized by at least one of, for example, an MPU (micro processing unit), a DSP (digital signal processor), an FPGA (field programmable gate array) and an ASIC (application-specific integrated circuit). Such a general-purpose computer or application-specific integrated circuit can, for example, be mounted at a specific location (for example, in a vehicle) and communicate with an imaging device, such as a camera installed on site, that images objects associated with the road, so that two-dimensional images and stereo images captured by the camera can be obtained as target images to generate the characteristic model with spatial information. In addition, the components of the characteristic model generating means 100 can be realized by dedicated hardware, such as specific field programmable gate arrays or application-specific integrated circuits. The components of the characteristic model generating means 100 can also be realized by a combination of software and hardware.
4. Vehicle control system
The present invention can also be applied to a vehicle control system, for realizing pedestrian detection, vehicle detection and the like in in-vehicle camera images. Hereinafter, the functional structure of a vehicle control system according to an embodiment of the present invention will be described with reference to Figure 16.
Figure 16 is a functional structure chart illustrating the vehicle control system according to an embodiment of the present invention.
Specifically, Figure 16 shows a vehicle control system 300 that realizes automatic control of a vehicle by using the structure-member-based improved visual dictionary model object detection method proposed by the present invention.
The vehicle control system 300 is mounted on a vehicle 1000. The system 300 includes two cameras 310, an image processing module 320 and a vehicle control module 330. The cameras 310 may be mounted near the rear-view mirror of the vehicle to capture the scene in front of the vehicle 1000. The captured images of the scene in front of the vehicle serve as the input of the image processing module 320. The image processing module 320 analyzes the input images: it pre-processes them, extracts sampled-point features, establishes the structural visual dictionary model, and obtains the features mapped onto the improved visual dictionary model. In one embodiment, the image processing module 320 can further perform classification and judgment to obtain a detection result, so as to tell whether a target object is a pedestrian, a vehicle, a roadblock, a signal lamp, or the like. The vehicle control module 330 receives the signal output by the image processing module 320 and, according to the obtained object detection result, generates control signals to control the driving direction and travel speed of the vehicle 1000.
For example, the image processing module 320 can be realized using a configuration in which the characteristic model generating means 100 shown in Figure 15 cooperates with a traditional classifier.
5. Object detection subsystem
Figure 17 illustrates the internal structure of the object detection subsystem that performs object detection on in-vehicle camera images with the improved visual dictionary model.
As shown in Figure 17, the subsystem is a subset of the vehicle control system illustrated in Figure 16; that is, it includes only the camera 310 and the image processing module 320.
Specifically, the camera 310 includes an image sensor 201 and a camera digital signal processor (DSP) 202. The image sensor 201 converts optical signals into electronic signals, converting the captured image of the scene in front of the current vehicle 1000 into an analog image signal, and then passes the result to the camera DSP 202. If necessary, the camera 310 can further include a lens, a filter and so on. For example, the system may include multiple cameras 310 that capture multiple images simultaneously after being registered with one another.
The camera DSP 202 converts the analog image signal into a digital image signal and sends it to the image processing module 320.
In the image processing module 320, the image input interface 203 obtains images at predetermined time intervals. The depth image module 204 converts a pair of input digital images into a depth map using stereo vision or other principles. The depth image is then written into the memory 206 and analyzed and processed by the program 207. The image processing here includes a variety of operations, such as sampled-point feature calculation and visual dictionary feature calculation. The program 207 loaded into the memory (for example, ROM or RAM) 206 performs a series of operations to carry out object detection. In this process, the CPU 205 is responsible for control and arithmetic operations, such as obtaining data through the interface and performing the image processing.
For example, the program 207 can be used to implement the characteristic model generation method according to the embodiments of the present invention described above.
The embodiments of the present invention have been described in detail above. However, it should be appreciated by those skilled in the art that, without departing from the principle and spirit of the present invention, these embodiments can be variously modified, combined or sub-combined, and such modifications shall fall within the scope of the present invention.
Claims (9)
1. A characteristic model generation method, characterized in that the method includes:
obtaining at least one characteristic point in a target image and obtaining the location information and description information of each characteristic point;
for each characteristic point,
searching a visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has an association in spatial relationship with at least one other vision entry;
for each matching vision entry matching the characteristic point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry; and
generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model;
wherein determining at least one mapping-target vision entry according to the class of the matching vision entry includes:
judging whether the matching vision entry is a first-class vision entry or a second-class vision entry;
when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having an association in spatial relationship with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and
when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
2. The method according to claim 1, characterized in that the target image is at least one sample image of known classification or at least one image to be detected of unknown classification, and
when the target image is a sample image, the method further includes:
obtaining, in the sample image, the location information of at least one structure member of the target object marked in advance; and
determining the nearest-component information of each characteristic point according to the location information of the characteristic point and the location information of each structure member, the nearest-component information indicating the structure member of the target object closest to the characteristic point.
3. The method according to claim 2, characterized in that the method further includes:
generating the visual dictionary model based on the location information, description information and nearest-component information of each characteristic point in the sample image.
4. The method according to claim 3, characterized in that generating the visual dictionary model based on the location information, description information and nearest-component information of each characteristic point in the sample image includes:
clustering all the characteristic points in the sample image according to their description information, to generate the visual dictionary model including multiple vision entries;
dividing the vision entries into the first-class vision entries and the second-class vision entries based on the location information and nearest-component information of the characteristic points in each vision entry, wherein the distribution of the positions and nearest structure members of the characteristic points in a first-class vision entry satisfies a predetermined distribution; and
among the first-class vision entries, establishing the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries.
5. The method according to claim 4, characterized in that establishing the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries includes:
calculating the association in spatial relationship between a first vision entry and a second vision entry based on the intrinsic structure distance, within the target object, between the nearest structure member corresponding to the first vision entry and the nearest structure member corresponding to the second vision entry.
6. The method according to claim 1, characterized in that searching a visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point includes:
calculating the similarity between the characteristic point and each vision entry according to the description information of the characteristic point and the description information of each vision entry in the visual dictionary model; and
determining the vision entries whose similarity is greater than or equal to a predetermined threshold as the matching vision entries.
7. The method according to claim 6, characterized in that
when the mapping-target vision entry is the matching vision entry itself, calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry includes: calculating, based on the characteristic distance between the characteristic point and the matching vision entry, the feature weight with which the characteristic point is mapped onto the matching vision entry itself; and
when the mapping-target vision entry is one of the other vision entries, calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry includes: calculating the feature weight with which the characteristic point is mapped onto the other vision entry, based on the characteristic distance between the characteristic point and the matching vision entry and on the association in spatial relationship between the matching vision entry and the other vision entry.
8. The method according to claim 1, characterized in that generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model includes:
for each vision entry, calculating the first feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a first-class vision entry, calculating the second feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a second-class vision entry, and summing the first feature weights and the second feature weights to generate the overall feature weight mapped onto the vision entry; and
concatenating the overall feature weights mapped onto the vision entries in the visual dictionary model, to generate the characteristic model of the target image with spatial information.
9. A characteristic model generating means, characterized in that the means includes:
a feature extraction unit, for obtaining at least one characteristic point in a target image and obtaining the location information and description information of each characteristic point; and
a visual dictionary matching unit, for, for each characteristic point,
searching a visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has an association in spatial relationship with at least one other vision entry;
for each matching vision entry matching the characteristic point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry; and
generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model;
wherein determining at least one mapping-target vision entry according to the class of the matching vision entry includes:
judging whether the matching vision entry is a first-class vision entry or a second-class vision entry;
when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having an association in spatial relationship with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and
when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410471391.5A CN105404886B (en) | 2014-09-16 | 2014-09-16 | Characteristic model generation method and characteristic model generating means |
JP2015179850A JP2016062610A (en) | 2014-09-16 | 2015-09-11 | Feature model creation method and feature model creation device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105404886A CN105404886A (en) | 2016-03-16 |
CN105404886B true CN105404886B (en) | 2019-01-18 |
Family
ID=55470362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410471391.5A Expired - Fee Related CN105404886B (en) | 2014-09-16 | 2014-09-16 | Characteristic model generation method and characteristic model generating means |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP2016062610A (en) |
CN (1) | CN105404886B (en) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106055704B (en) * | 2016-06-22 | 2020-02-04 | 重庆中科云丛科技有限公司 | Image retrieval and matching method and system |
CN106897675B (en) * | 2017-01-24 | 2021-08-17 | 上海交通大学 | Face living body detection method combining binocular vision depth characteristic and apparent characteristic |
JP6760490B2 (en) * | 2017-04-10 | 2020-09-23 | 富士通株式会社 | Recognition device, recognition method and recognition program |
CN110110145B (en) * | 2018-01-29 | 2023-08-22 | 腾讯科技(深圳)有限公司 | Descriptive text generation method and device |
CN112292690A (en) * | 2018-06-14 | 2021-01-29 | 奇跃公司 | Augmented reality deep gesture network |
CN110728711B (en) * | 2018-07-17 | 2021-11-12 | 北京三快在线科技有限公司 | Positioning and mapping method and device, and positioning method, device and system |
CN109961103B (en) * | 2019-04-02 | 2020-10-27 | 北京迈格威科技有限公司 | Training method of feature extraction model, and image feature extraction method and device |
CN110503093B (en) * | 2019-07-24 | 2022-11-04 | 中国航空无线电电子研究所 | Region-of-interest extraction method based on disparity map DBSCAN clustering |
CN112307809B (en) * | 2019-07-26 | 2023-07-25 | 中国科学院沈阳自动化研究所 | Active target identification method based on sparse feature point cloud |
CN110807437B (en) * | 2019-11-08 | 2023-01-03 | 腾讯科技(深圳)有限公司 | Video granularity characteristic determination method and device and computer-readable storage medium |
US20230037499A1 (en) * | 2020-02-17 | 2023-02-09 | Mitsubishi Electric Corporation | Model generation device, in-vehicle device, and model generation method |
CN112116644B (en) * | 2020-08-28 | 2023-05-23 | 辽宁石油化工大学 | Obstacle detection method and device based on vision and obstacle distance calculation method and device |
CN111967542B (en) * | 2020-10-23 | 2021-01-29 | 江西小马机器人有限公司 | Meter identification secondary positioning method based on depth feature points |
CN112668590A (en) * | 2021-01-05 | 2021-04-16 | 瞬联软件科技(南京)有限公司 | Visual phrase construction method and device based on image feature space and airspace space |
CN113048807B (en) * | 2021-03-15 | 2022-07-26 | 太原理工大学 | Air cooling unit backpressure abnormality detection method |
CN114237046B (en) * | 2021-12-03 | 2023-09-26 | 国网山东省电力公司枣庄供电公司 | Partial discharge pattern recognition method based on SIFT data feature extraction algorithm and BP neural network model |
CN114937204B (en) * | 2022-04-29 | 2023-07-25 | 南京信息工程大学 | Neural network remote sensing change detection method for lightweight multi-feature aggregation |
TWI798094B (en) * | 2022-05-24 | 2023-04-01 | 鴻海精密工業股份有限公司 | Method and equipment for training depth estimation model and depth estimation |
CN115492493A (en) * | 2022-07-28 | 2022-12-20 | 重庆长安汽车股份有限公司 | Tail gate control method, device, equipment and medium |
CN115563654B (en) * | 2022-11-23 | 2023-03-31 | 山东智豆数字科技有限公司 | Digital marketing big data processing method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254015A (en) * | 2011-07-21 | 2011-11-23 | 上海交通大学 | Image retrieval method based on visual phrases |
CN102708380A (en) * | 2012-05-08 | 2012-10-03 | 东南大学 | Indoor common object identification method based on machine vision |
CN103440508A (en) * | 2013-08-26 | 2013-12-11 | 河海大学 | Remote sensing image target recognition method based on visual word bag model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130132377A1 (en) * | 2010-08-26 | 2013-05-23 | Zhe Lin | Systems and Methods for Localized Bag-of-Features Retrieval |
US8849030B2 (en) * | 2011-04-22 | 2014-09-30 | Microsoft Corporation | Image retrieval using spatial bag-of-features |
- 2014-09-16: Application filed in China as CN 201410471391.5 A; granted as CN105404886B; status: not active (Expired - Fee Related)
- 2015-09-11: Corresponding application filed in Japan as JP 2015179850 A; published as JP2016062610A; status: active (Pending)
Non-Patent Citations (1)
Title |
---|
"Spatial Bag-of-Visual-Words Model for Image Scene Classification" (用于图像场景分类的空间视觉词袋模型); Wang Yuxin et al.; Computer Science (计算机科学); 2011-08-31; Vol. 38, No. 8; pp. 265-268 |
Also Published As
Publication number | Publication date |
---|---|
JP2016062610A (en) | 2016-04-25 |
CN105404886A (en) | 2016-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105404886B (en) | Characteristic model generation method and characteristic model generating means | |
CN112380952B (en) | Power equipment infrared image real-time detection and identification method based on artificial intelligence | |
US9846946B2 (en) | Objection recognition in a 3D scene | |
CN105095905B (en) | Target identification method and Target Identification Unit | |
CN104166841B (en) | Fast detection and recognition method for specified pedestrians or vehicles in a video surveillance network | |
CN108875600A (en) | YOLO-based vehicle information detection and tracking method, apparatus, and computer storage medium | |
CN107273832B (en) | License plate recognition method and system based on integral channel characteristics and convolutional neural network | |
CN107305635A (en) | Object recognition method, object recognition device, and classifier training method | |
CN105512683A (en) | Target positioning method and device based on a convolutional neural network | |
CN105718866B (en) | Visual target detection and recognition method | |
CN104915673B (en) | Object classification method and system based on a visual bag-of-words model | |
CN108257151B (en) | PCANet image change detection method based on significance analysis | |
CN107610177B (en) | Method and apparatus for determining feature points in simultaneous localization and mapping | |
Momin et al. | Vehicle detection and attribute based search of vehicles in video surveillance system | |
CN104036284A (en) | Adaboost algorithm based multi-scale pedestrian detection method | |
Holzer et al. | Learning to efficiently detect repeatable interest points in depth data | |
CN108734200B (en) | Human target visual detection method and device based on BING (binarized normed gradients) features | |
CN110659550A (en) | Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium | |
Swadzba et al. | Indoor scene classification using combined 3D and gist features | |
CN108073940B (en) | Method for detecting 3D target example object in unstructured environment | |
CN114049572A (en) | Detection method for identifying small target | |
CN110599463A (en) | Tongue image detection and positioning algorithm based on lightweight cascade neural network | |
CN113033385A (en) | Deep learning-based violation building remote sensing identification method and system | |
CN103093243A (en) | High resolution panchromatic remote sensing image cloud discriminating method | |
CN115620090A (en) | Model training method, low-illumination target re-recognition method and device and terminal equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-01-18; Termination date: 2021-09-16 |