CN105404886B - Characteristic model generation method and characteristic model generating means - Google Patents
Characteristic model generation method and characteristic model generating means
- Publication number
- CN105404886B (Application CN201410471391.5A)
- Authority
- CN
- China
- Prior art keywords
- vision
- entry
- vision entry
- matching
- characteristic point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Disclosed are a characteristic model generation method and apparatus. The method comprises: obtaining feature points in a target image together with their location information and description information; for each feature point, searching a visual dictionary model, based on the description information, for vision entries matching the feature point, wherein the visual dictionary model includes first-class and second-class vision entries, and among the first-class vision entries a vision entry has a spatial association with other vision entries; for each vision entry matching a feature point, determining the mapping-target vision entries according to the class of the matching entry, and calculating, at least from the description information of the feature point and of the matching vision entry, the feature weight that the feature point maps onto each mapping-target vision entry; and generating, from the feature weights mapped onto the vision entries of the visual dictionary model, a characteristic model of the target image that carries spatial information. The method according to the present invention can therefore generate a characteristic model of the target image with spatial information.
Description
Technical field
The present invention relates to the field of digital image processing, and more particularly to a characteristic model generation method and a characteristic model generating means.
Background technique
The visual dictionary model (also called the bag-of-visual-features model, BoF) is currently one of the best-performing methods in the field of object classification and object recognition. The model can express the features of a target well, and thus strives for a higher recognition rate. Because the visual dictionary model is built on the features of feature points, it is invariant to position, illumination, rotation and affine transformation, and it is also fairly robust to partial occlusion and offset.
However, the traditional visual dictionary model generates a histogram feature directly from all feature points of the target, without taking the spatial information of those feature points into account, and thus cannot achieve a good recognition rate.
To this end, an improved spatial visual-dictionary matching method has been proposed, which uses spatial pyramid matching (SPM) as a supplement that takes spatial information into account. Spatial pyramid matching is a simple way of adding spatial information to the original visual dictionary model. Combined with the visual dictionary model, the matching algorithm obtains one feature vector for each sub-region of the spatial pyramid built over the target, rather than a single feature vector for the target as a whole. Each feature vector is the visual-dictionary-based feature information of one sub-region of the target. Once the feature vectors of all sub-regions have been obtained, they can be combined into a feature vector of larger dimension, which implicitly carries rough spatial information. Spatial pyramid matching can therefore achieve a better recognition rate. However, because the algorithm repeats a large amount of computation when matching the feature points of every sub-region against the visual dictionary, it consumes considerable processing resources; and since it accounts only for rigid variation, it is both too rigid and too time-consuming. Moreover, since each feature point can only influence the sub-region it belongs to, spatial pyramid matching can hardly express the correlation between the sub-regions.
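As an illustration of the matching scheme described above, the following sketch builds the concatenated per-sub-region histograms of a spatial pyramid; the pyramid depth, vocabulary size and cell layout are illustrative assumptions, not taken from the cited method:

```python
from collections import Counter

def spm_feature(points, num_words, levels=2):
    """Concatenate per-cell visual-word histograms over a spatial pyramid.

    points: list of (x, y, word_id) with x, y normalized to [0, 1).
    Each pyramid level splits the image into 2^level x 2^level cells,
    so every point is re-binned once per level -- the source of the
    repeated computation criticized above.
    """
    feature = []
    for level in range(levels + 1):
        grid = 2 ** level
        for cy in range(grid):
            for cx in range(grid):
                hist = Counter()
                for x, y, w in points:
                    if int(x * grid) == cx and int(y * grid) == cy:
                        hist[w] += 1
                feature.extend(hist.get(w, 0) for w in range(num_words))
    return feature
```

The dimension grows as num_words multiplied by the sum of 4^level over all levels, which illustrates why the scheme trades processing resources for only rough spatial information.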
Another improved spatial visual-dictionary matching method also considers spatial information during the visual dictionary matching process. It divides the sample into different sub-blocks and captures the spatial influence between the sub-blocks with a distance template. However, the spatial relationship considered by this method is still rigid, and the internal structure of the target object is not taken into account. That is, the vision entries (token categories) in the visual dictionary remain independent of one another, and no correlation between them is considered.
In summary, existing visual dictionary models cannot express the spatial information of a target well, and are therefore subject to many limitations in video-related applications.
Summary of the invention
A so-called bag of words is simply a package, or encapsulation, containing a set of data. A visual bag of words usually contains the essential feature elements of several images, such as their shape, structure, color and texture features. Since a visual bag of words holds certain features of one or more classes of images, extracting its elements makes it possible both to describe images of similar classes and to classify images of different classes. A visual bag of words applied to an image may also be called a visual dictionary; it contains a series of vision entries, so that each of the various features of the image can be represented by a vision entry in the dictionary.
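For illustration, a minimal sketch of the classic bag-of-visual-words representation described above, assuming each feature descriptor has already been quantized to the index of its nearest vision entry:

```python
from collections import Counter

def bow_histogram(descriptor_words, vocab_size):
    """Classic bag of visual words: count, per image, how often each
    vision entry (visual word) is hit, ignoring where the hits occur."""
    counts = Counter(descriptor_words)
    return [counts.get(w, 0) for w in range(vocab_size)]
```

This position-blind counting is exactly what the invention sets out to improve by adding spatial information.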
The object of the present invention is to provide a method capable of generating a characteristic model of a target image that carries spatial information. To this end, when building the characteristic model, the present invention considers not only the visual dictionary model but also the spatial relationships between the points on the image, so as to construct a more accurate classification model and thereby classify images more accurately.
According to an aspect of the present invention, there is provided a characteristic model generation method, the method comprising: obtaining at least one feature point in a target image and obtaining the location information and description information of each feature point; for each feature point, searching the visual dictionary model, based on the description information of the feature point, for at least one vision entry matching the feature point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has a spatial association with at least one other vision entry; for each vision entry matching the feature point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least from the description information of the feature point and the description information of the matching vision entry, the feature weight that the feature point maps onto each mapping-target vision entry; and generating a characteristic model of the target image with spatial information from the feature weights mapped onto the vision entries of the visual dictionary model.
In addition, according to another aspect of the present invention, there is provided a characteristic model generating means, the means comprising: a feature extraction unit for obtaining at least one feature point in a target image and obtaining the location information and description information of each feature point; and a visual dictionary matching unit for, for each feature point, searching the visual dictionary model, based on the description information of the feature point, for at least one vision entry matching the feature point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has a spatial association with at least one other vision entry; for each vision entry matching the feature point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least from the description information of the feature point and the description information of the matching vision entry, the feature weight that the feature point maps onto each mapping-target vision entry; and generating a characteristic model of the target image with spatial information from the feature weights mapped onto the vision entries of the visual dictionary model.
Compared with the prior art, the characteristic model generation method and means according to embodiments of the present invention can, from the very first step of building the visual dictionary model, use the inherent structural relationships between the various parts (i.e. feature points) of a target object to establish the spatial associations inside that object, and can perform the feature-weight mapping between each feature point in the target image and each vision entry in the visual dictionary model on the basis of those spatial associations, thereby generating a vision-entry-weight-based characteristic model of the target image that carries spatial information.
Further features and advantages of the present invention will be set forth in the following description, will in part become apparent from the description, or may be learned by practice of the invention. The objects and other advantages of the invention can be realized and attained by the structure particularly pointed out in the description, the claims and the accompanying drawings.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and constitute part of the specification; together with the embodiments of the invention they serve to explain the invention, and are not to be construed as limiting it. In the drawings:
Fig. 1 is an overview flow chart illustrating the characteristic model generation method according to an embodiment of the present invention.
Fig. 2 is an overview flow chart illustrating a specific example of the characteristic model generation method according to an embodiment of the present invention.
Fig. 3 is a conceptual data flow diagram of the specific example of the characteristic model generation method according to an embodiment of the present invention.
Fig. 4 is a detailed flow chart illustrating the feature extraction and description step of the specific example according to an embodiment of the present invention.
Fig. 5A and Fig. 5B are schematic diagrams illustrating the SIFT feature description of the specific example according to an embodiment of the present invention.
Fig. 6 is a schematic diagram illustrating a pedestrian with labeled body parts and the sampling points of the specific example according to an embodiment of the present invention.
Fig. 7 is a detailed flow chart illustrating the visual dictionary model generation step of the specific example according to an embodiment of the present invention.
Fig. 8A to Fig. 8D are schematic diagrams illustrating the k-nearest-neighbor algorithm of the specific example according to an embodiment of the present invention.
Fig. 9A to Fig. 9D are schematic diagrams illustrating the significant-category generation sub-step of the specific example according to an embodiment of the present invention.
Fig. 10A and Fig. 10B are schematic diagrams illustrating the significant-category association establishment sub-step of the specific example according to an embodiment of the present invention.
Fig. 11 is a detailed flow chart illustrating the visual dictionary matching step of the specific example according to an embodiment of the present invention.
Fig. 12A and Fig. 12B are schematic diagrams illustrating the matching vision entry lookup sub-step of the specific example according to an embodiment of the present invention.
Fig. 13A and Fig. 13B are schematic diagrams illustrating the feature weight mapping sub-step of the specific example according to an embodiment of the present invention.
Fig. 14A and Fig. 14B are schematic diagrams illustrating the characteristic model generation sub-step of the specific example according to an embodiment of the present invention.
Fig. 15 is a functional configuration block diagram illustrating the characteristic model generating means according to an embodiment of the present invention.
Fig. 16 is a functional structure diagram illustrating a vehicle control system according to an embodiment of the present invention.
Fig. 17 illustrates the internal structure of an object detection subsystem that performs object detection on in-vehicle camera images with the improved visual dictionary model.
Specific embodiment
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that, in the drawings, constituent parts having substantially the same or similar structures and functions are given the same reference numerals, and repeated description of them will be omitted.
In order that those skilled in the art may better understand the present invention, it will be described in further detail in the following order.
1. Overview of the inventive concept
2. Characteristic model generation method
2.1. Specific example
3. Characteristic model generating means
4. Vehicle control system
5. Object detection subsystem
1. Overview of the inventive concept
In studying the technical problems of the prior art, the present inventors recognized that a target object is usually an organic whole: it frequently contains different structural parts, and specific inherent structural relationships often exist between those parts. Following this line of thought, the visual dictionary model can be improved so that it fully considers the internal structural relationships of the target object, thereby generating a characteristic model of the target image with spatial information.
2. Characteristic model generation method
Hereinafter, an example of the overall flow of the characteristic model generation method according to an embodiment of the present invention will be described with reference to Fig. 1.
Fig. 1 is the overview flow chart for illustrating characteristic model generation method according to an embodiment of the present invention.
As shown in Fig. 1, the characteristic model generation method may include:
In step S110, at least one feature point is obtained in the target image, and the location information and description information of each feature point are obtained.
Depending on the application scenario of the characteristic model generation method, the target image may be at least one sample image of known class, or at least one image to be detected of unknown class, etc.
For example, the location information of each feature point may be the position coordinates of the feature point in the sample image, and the description information of the feature point may be its feature descriptor, also called the feature description vector or simply the feature description.
In one embodiment, when the target image is a sample image, in addition to the above location information and description information, the nearest component information of each feature point may further be obtained, for marking the inherent structural position of the feature point in the target object. For example, the nearest component information may describe the internal structural part of the target object that is closest to the feature point.
To this end, after obtaining the at least one feature point, the method may further include: obtaining the location information of at least one pre-labeled structural part of the target object in the sample image; and determining the nearest component information of each feature point from the location information of the feature point and the location information of each structural part, the nearest component information indicating the structural part of the target object closest to the feature point.
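The determination of the nearest component information can be sketched as follows; the component names and the use of plain Euclidean distance are illustrative assumptions:

```python
import math

def nearest_component(point_xy, components):
    """Return the name of the pre-labeled structural part (e.g. 'head',
    'torso') closest to a feature point; this label becomes the point's
    nearest component information."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return min(components, key=lambda name: dist(point_xy, components[name]))
```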
In one embodiment, after the location information, description information and nearest component information of each feature point in the sample image have been obtained, a visual dictionary model with spatial information can be generated.
To this end, the method may further include: generating the visual dictionary model based on the location information, description information and nearest component information of each feature point in the sample image.
Usually, the generation of the visual dictionary model may include sub-steps such as clustering, category division and spatial association establishment.
In a specific example, generating the visual dictionary model based on the location information, description information and nearest component information of each feature point in the sample image may include: clustering all feature points in the sample image according to their description information, to generate a visual dictionary model containing multiple vision entries; dividing the vision entries into the first-class vision entries and the second-class vision entries based on the location information and nearest component information of the feature points in each vision entry, wherein the distribution of the positions and nearest structural components of the feature points in a first-class vision entry conforms to a predetermined distribution; and, among the first-class vision entries, establishing the spatial association between a vision entry and at least one other vision entry based on an inherent metric between the vision entries.
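The category-division sub-step can be sketched as follows. The concrete criterion used here, namely that a first-class entry is one whose feature points concentrate on a single nearest structural component, is only one illustrative reading of the "predetermined distribution" left open above:

```python
from collections import Counter

def divide_entries(entries, purity_threshold=0.6):
    """entries: {entry_id: list of nearest-component labels of the
    feature points clustered into that entry}.  An entry whose points
    concentrate on one structural part is treated as first-class
    (carrying spatial meaning); the rest are second-class."""
    first_class, second_class = [], []
    for entry_id, labels in entries.items():
        top_count = Counter(labels).most_common(1)[0][1]
        if top_count / len(labels) >= purity_threshold:
            first_class.append(entry_id)
        else:
            second_class.append(entry_id)
    return first_class, second_class
```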
In particular, in order to generate the visual dictionary model with spatial information, when two vision entries in the visual dictionary model (referred to simply as the first vision entry and the second vision entry) both belong to the above first-class vision entries, the association between them can be established based on the inherent metric between the two first-class vision entries. For example, the inherent metric can be realized by the structural distance, within the target object, between the corresponding nearest structural components. For precisely this reason, the above visual dictionary model with spatial information may be called a structured visual dictionary.
To this end, establishing the spatial association between a vision entry and at least one other vision entry based on the inherent metric between vision entries may include: calculating the spatial association between the first vision entry and the second vision entry based on the inherent structural distance, within the target object, between the nearest structural component corresponding to the first vision entry and the nearest structural component corresponding to the second vision entry.
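The inherent metric can be sketched as follows; the Gaussian fall-off over the structural distance between the dominant components of two first-class entries is an illustrative choice of metric, not the claimed one:

```python
import math

def entry_association(comp_a, comp_b, component_positions, sigma=1.0):
    """Associate two first-class vision entries through the structural
    distance between their dominant components in a body template;
    closer components yield a stronger association in [0, 1]."""
    ax, ay = component_positions[comp_a]
    bx, by = component_positions[comp_b]
    d = math.hypot(ax - bx, ay - by)
    return math.exp(-d * d / (2 * sigma * sigma))
```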
In step S120, for each feature point, at least one vision entry matching the feature point is searched for in the visual dictionary model based on the description information of the feature point.
After the visual dictionary model has been obtained, whether the target image is a sample image or an image to be detected, the features of each feature point can be looked up in the structured visual dictionary to find the matching vision entries most similar to them.
To this end, in one embodiment, searching the visual dictionary model for at least one matching vision entry based on the description information of the feature point may include: calculating the similarity between the feature point and each vision entry in the visual dictionary model from the description information of the feature point and the description information of the vision entry; and determining the vision entries whose similarity is greater than or equal to a predetermined threshold as the matching vision entries.
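The matching-entry lookup can be sketched as follows; cosine similarity stands in for the unspecified similarity measure, and the threshold value is illustrative:

```python
import math

def find_matching_entries(descriptor, entry_descriptors, threshold=0.8):
    """Return the ids of vision entries whose similarity with the
    feature point's description vector meets the threshold."""
    def cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    return [eid for eid, vec in entry_descriptors.items()
            if cosine(descriptor, vec) >= threshold]
```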
In step S130, for each vision entry matching the feature point, at least one mapping-target vision entry is determined according to the class of the matching vision entry, and the feature weight that the feature point maps onto each mapping-target vision entry is calculated at least from the description information of the feature point and the description information of the matching vision entry.
As described above, two different classes of vision entries exist in the structured visual dictionary; that is, a matching vision entry may be a first-class vision entry or a second-class vision entry. Therefore, which vision entry or entries each feature point is mapped onto, and the weight value it maps onto each of them, can be determined in different ways. To first determine the mapping-target vision entries of each feature point, in one embodiment, determining the at least one mapping-target vision entry according to the class of the matching vision entry may include: judging whether the matching vision entry is a first-class vision entry or a second-class vision entry; when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having a spatial association with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
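The determination of the mapping-target vision entries can be sketched directly from the case analysis above:

```python
def mapping_targets(match_id, first_class_ids, associations):
    """First-class matches pull in their spatially associated entries
    as additional mapping targets; second-class matches map only onto
    themselves.  associations: {entry_id: list of associated entry ids}."""
    if match_id in first_class_ids:
        return [match_id] + associations.get(match_id, [])
    return [match_id]
```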
That is, if a matching vision entry found in step S120 belongs to the second class, only the matching vision entry itself is considered. Conversely, if a matching vision entry found in step S120 belongs to the first class, then in addition to the matching vision entry itself, the other vision entries having a structural association with it are also considered.
Next, in one embodiment, when the mapping-target vision entry is the matching vision entry itself, calculating the feature weight may include: calculating the feature weight that the feature point maps onto the matching vision entry itself based on the feature distance between the feature point and the matching vision entry.
For example, this feature distance can be obtained by calculating the distance (for example, the Euclidean distance) between the feature description vector of the feature point and the feature description vector of the matching vision entry.
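The weight mapped onto the matching entry itself can be sketched as follows; the 1/(1+d) decay over the Euclidean distance d is an illustrative monotone choice, since the text only requires the weight to be computed from the feature distance:

```python
import math

def self_weight(point_desc, entry_desc):
    """Weight mapped onto the matching vision entry itself, decreasing
    as the Euclidean distance between the description vectors grows."""
    d = math.sqrt(sum((p - e) ** 2 for p, e in zip(point_desc, entry_desc)))
    return 1.0 / (1.0 + d)
```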
In another embodiment, when the mapping-target vision entry is one of the other vision entries, calculating the feature weight may include: calculating the feature weight that the feature point maps onto the other vision entry based on the feature distance between the feature point and the matching vision entry and on the spatial association between the matching vision entry and the other vision entry.
For example, the spatial association between the matching vision entry and the other vision entry may be the one established in the above process of generating the visual dictionary model.
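The weight propagated onto an associated entry can be sketched analogously; an illustrative distance-based weight is attenuated by the association strength established when the dictionary was built:

```python
import math

def associated_weight(point_desc, match_desc, association):
    """Weight propagated from a first-class match onto one of its
    associated entries: the point-to-match weight, scaled down by the
    spatial association strength (a value in [0, 1])."""
    d = math.sqrt(sum((p - e) ** 2 for p, e in zip(point_desc, match_desc)))
    return association / (1.0 + d)
```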
In step S140, the characteristic model of the target image with spatial information is generated based on the feature weights mapped onto the vision entries of the visual dictionary model.
Finally, whether the target image is a sample image or an image to be detected, the feature-weight values mapped from the features of all sampling points on the image onto the first-class or second-class vision entries can be combined or considered in a certain way, to obtain a unified feature description of the target object.
In one embodiment, generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries of the visual dictionary model may include: for each vision entry, calculating the first feature weight mapped onto the vision entry by each feature point in the target image when the vision entry is a first-class vision entry, calculating the second feature weight mapped onto the vision entry by each feature point in the target image when the vision entry is a second-class vision entry, and summing the first feature weight and the second feature weight to generate the total feature weight mapped onto the vision entry; and cascading the total feature weights mapped onto the vision entries of the visual dictionary model, to generate the characteristic model of the target image with spatial information.
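The final accumulation and cascading step can be sketched as follows; summing the first and second feature weights per entry reduces, for each entry, to accumulating all weights mapped onto it, whatever their class of origin:

```python
def build_feature_model(mappings, vocab_size):
    """mappings: list of (entry_id, weight) pairs collected from all
    feature points, covering both first- and second-class entries.
    Summing per entry and cascading over the whole dictionary yields
    the final characteristic model of the target image."""
    model = [0.0] * vocab_size
    for entry_id, weight in mappings:
        model[entry_id] += weight
    return model
```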
It can be seen that this embodiment of the present invention provides a characteristic model generation method which can, from the very first step of building the visual dictionary model, use the inherent structural relationships between the various parts of the target object to establish the spatial associations inside the object, and can perform the feature-weight mapping between each feature point in the target image and each vision entry in the visual dictionary model based on those spatial associations, thereby generating a vision-entry-weight-based characteristic model of the target image that carries spatial information.
Obviously, the characteristic model generation method can be used for different purposes. In one embodiment, it can be applied in the sample training process: the method performs a training operation on sample images to obtain the visual dictionary model and a classifier for the target to be recognized. In another embodiment, it can be applied in the target detection process: the method outputs the classification result of an image to be detected by means of the trained visual dictionary model and classifier.
Specifically, when applied in the sample training process, the method can first sample each sample image in the training set and, after extracting the base features of each feature point, further improve the visual dictionary model. When generating the improved visual dictionary model, the method can establish the spatial association between some of the vision entries through the inherent metric. Next, when mapping the low-level features onto the visual dictionary model to generate the final feature description, the method still takes the spatial association between those vision entries into account during the mapping, so as to generate the characteristic model of the target image with spatial information. Finally, the method can also perform classification training with the obtained characteristic model.
In addition, when applied in the target detection process, the method can sample the image to be detected, map the feature points in the image onto the vision entries of the visual dictionary model based on the spatial association between some of the vision entries, so as to generate the characteristic model of the target image with spatial information, and generate the final object detection result with the trained classification criterion.
2.1. Specific example
Hereinafter, the overall flow of a specific example of the characteristic model generation method according to an embodiment of the present invention will be described with reference to Figs. 2 to 14.
In this specific example of the embodiment of the present invention, the characteristic model generation method is described as applied to an offline sample training process and an online target detection process.
It should be noted, however, that although the method is described here as applied to classification training and actual classification, so as to obtain a more accurate spatial-position-based classification of images, the invention is not limited thereto. The method is equally applicable to other application fields based on characteristic models, such as image retrieval and image matching, and is not limited to the above image classification and image recognition fields.
Fig. 2 is an overview flow chart illustrating the specific example of the characteristic model generation method according to an embodiment of the present invention, and Fig. 3 is the conceptual data flow diagram of that specific example.
Comparing Fig. 2 and Fig. 3, it can be seen that the overview flow chart in Fig. 2 and the conceptual data flow diagram in Fig. 3 correspond to each other; the only difference is that Fig. 2 shows the steps S201 to S210 of the characteristic model generation method, while Fig. 3 shows the related data F1 to F10 involved when those steps are executed.
As shown in Figs. 2 and 3, the characteristic model generation method of the specific example can be applied to two different processes, namely an offline sample training process and an online target detection process.
Specifically, the method may include:
In step s 201, input sample image.
Specifically, the visual dictionary model and classifier of target to be identified in order to obtain, can input N frame sample image
F1, wherein N is natural number.
For example, the sample images may include positive and negative sample images for model training. A positive sample is an image containing the object to be recognized (for example, a person, an animal, a building, etc.); a negative sample is an image not containing the object to be recognized.
In addition, a sample image, whose classification (positive or negative) is known, may be a grayscale image and/or a depth image (also called a parallax image or disparity map). Specifically, the grayscale image may be captured directly by a camera, and the disparity map may be obtained with a calibrated camera setup based on the binocular ranging principle.
However, the invention is not limited thereto. Obviously, any existing method of obtaining a disparity map can be used with the present invention. For example, the disparity map may be shot directly by a dedicated parallax camera. Alternatively, grayscale images may be captured by a binocular camera, a multi-view camera, or a stereo camera, and the corresponding disparity map may then be computed from the grayscale images. Specifically, for example, in the case where the object to be detected (also called the target) is a vehicle or a pedestrian on a road, a left image and a right image may be captured by a vehicle-mounted binocular camera; here, the left image (or the right image) serves as the grayscale image, and the disparity map is computed from the left image and the right image.
Here, in one embodiment, the grayscale image and the disparity map may be acquired by a camera mounted locally on the vehicle. Alternatively, in another embodiment, the grayscale image and the corresponding disparity map may be obtained from a remote camera over, for example, a wired or wireless network. Moreover, the image capture device (for example, a camera) need not be installed on the vehicle; it may also, for example, be mounted on a roadside building as needed, or at any other position suitable for photographing the target object.
It should be noted that the disparity map here is not limited to being obtained with multiple cameras; it can also be obtained with a single camera over time. For example, at one moment a single camera may capture one image as the left image; at the next moment, after the camera has shifted slightly, it may capture another image as the right image. A disparity map can likewise be computed from the left and right images obtained in this way.
In addition, although the example above uses a grayscale image as the sample image, those skilled in the art will understand that, where the camera parameters and the computing performance of the processing device permit, a color image may be used instead of the grayscale image.
In step S202, feature points are extracted from each sample image, and the attribute information of each feature point is obtained.
Specifically, for the N sample images (including grayscale/color images and/or depth images) in the training set, the feature extraction unit first samples each image F1 and extracts the various features F2 of the target object at each sampled point, for use in the subsequent processes or steps.
Fig. 4 is a detailed flowchart illustrating the feature extraction and description step of the specific example according to an embodiment of the present invention.
As illustrated in Fig. 4, the feature extraction and description step specifically includes the following sub-steps.
In sub-step S2021, feature points are extracted from each sample image.
Specifically, after a training sample image is received, it can be sampled in various ways to obtain the feature points (also called sampled points) of the sample image. For example, the feature points can be obtained by any existing extraction algorithm, such as random sampling, dense sampling, corner detection, or scale-invariant feature transform (SIFT) keypoint extraction. In the following, dense sampling is assumed as an example.
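As an illustrative sketch (not the patent's own implementation), dense sampling can be realized as a regular grid of sampled points over the image; the step size and border margin below are assumed values:

```python
def dense_sample(width, height, step=8, margin=4):
    """Return (x, y) sampled points on a regular grid.

    step and margin are illustrative parameters: the grid starts
    'margin' pixels from the border and advances 'step' pixels.
    """
    return [(x, y)
            for y in range(margin, height - margin + 1, step)
            for x in range(margin, width - margin + 1, step)]

# Sample a 64x128 pedestrian window (a common detection window size):
points = dense_sample(64, 128)
```

Each returned (x, y) then becomes one feature point at which a local descriptor is computed in sub-step S2022.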
In sub-step S2022, the attribute information of each feature point is obtained.
Specifically, after multiple feature points have been obtained by dense sampling, each feature point can be further described in terms of its characteristics, so as to obtain the attribute information of the feature point, such as shape, structure, color, and texture. For example, any local feature can be used to describe the sampled points here; these features may be invariant to rotation, scaling, and translation. In the following, it is assumed that SIFT features are used as the features of the sampled points.
Figs. 5A and 5B are schematic diagrams illustrating the SIFT feature description of the specific example according to an embodiment of the present invention.
Fig. 5A shows that the gradient information at each pixel has been computed and weighted according to its position within the circle. Fig. 5B shows the accumulation of the gradients into each bin of the feature histogram. Since the principle of the SIFT descriptor is well known to those skilled in the art, its detailed description is omitted here.
After the image of the target object has been sampled and described with dense SIFT features, each sampled point has the following two attributes: 1) a feature description, denoted FD; and 2) location information, denoted (x, y).
In addition, in order to use the intrinsic structural relationships between the various parts of the target object in the sample image to establish the spatial associations inside the target object, the following attribute of each sampled point also needs to be obtained: 3) the nearest structural part and the distance to it.
In the following, for convenience, the acquisition of the nearest structural part and its distance information is described using a pedestrian as the target object. Obviously, the invention is not limited thereto; the target object may also include vehicles, buildings, guardrails, and other objects.
Fig. 6 is a schematic diagram illustrating a pedestrian with labeled body parts and a sampled point, according to the specific example of an embodiment of the present invention.
Fig. 6 shows a pedestrian on which the above-mentioned nearest-structural-part labeling operation has been performed. As shown in Fig. 6, body parts are indicated by solid dots, and a sampled point is indicated by a circle. For example, in Fig. 6, for pedestrian detection, the left shoulder part (denoted LS), the right shoulder part (denoted RS), and the head part (denoted H) of the pedestrian, among others, have been labeled.
Therefore, by computing the distance between a feature point and each structural part, it is easy to determine which structural part of the target object is nearest to that feature point, and to record that nearest structural part and its distance as additional attribute information of the feature point.
As shown in Fig. 6, the nearest body part of the sampled point indicated by the circle is the left shoulder part LS of the pedestrian.
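A minimal sketch of this nearest-part lookup, with assumed pixel coordinates for the labeled parts:

```python
import math

# Illustrative labeled body-part positions (the coordinates are assumed):
parts = {"H": (32, 10), "LS": (18, 28), "RS": (46, 28)}

def nearest_part(point, parts):
    """Return (part_name, distance) of the structural part closest to a
    feature point, using the Euclidean pixel distance."""
    name = min(parts, key=lambda p: math.dist(point, parts[p]))
    return name, math.dist(point, parts[name])

# A sampled point near the left shoulder is labeled LS:
label, d = nearest_part((20, 30), parts)
```

The returned pair (nearest part, distance) is stored as the third attribute of the feature point, alongside FD and (x, y).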
In step S203, an improved visual dictionary model is constructed based on the sampled-point features.
Specifically, after the various features of the target object at each sampled point have been extracted, the extracted features can be clustered; using their positions and the body parts on which they lie, meaningful categories are aggregated, the associations between them are established, and a structured visual dictionary F3 is obtained. The visual dictionary may contain a large number of visual entries capable of describing image features.
Fig. 7 is a detailed flowchart illustrating the visual dictionary model generation step of the specific example according to an embodiment of the present invention.
As illustrated in Fig. 7, the visual dictionary model generation step specifically includes the following sub-steps.
In sub-step S2031, the extracted features are clustered.
Specifically, after the feature points of the training sample images have been received, these feature points can be clustered according to their description information, temporarily ignoring their location information, so as to create the visual dictionary. This sub-step is similar to the clustering step in a traditional visual dictionary model; therefore, any traditional clustering algorithm can be used here to cluster the sampled points. In the following, the k-nearest-neighbor (KNN) algorithm is assumed to be used.
Figs. 8A to 8D are schematic diagrams illustrating the clustering algorithm of the specific example according to an embodiment of the present invention.
In Figs. 8A to 8D, each square dot represents the multi-dimensional feature description obtained at a sampled point (or feature point). For example, the k-nearest-neighbor algorithm can be used here as a general clustering method. The three circular dots in Fig. 8A (located at the upper-left, lower-left, and lower-right positions) represent the initial cluster centers, which can be obtained by random seeding. Then, through the iterative process shown in Figs. 8B and 8C, more accurate cluster centers are gradually found, yielding the final clustering result shown in Fig. 8D. Since the principle of this algorithm is well known to those skilled in the art, its detailed description is omitted here.
Although the description above uses the k-nearest-neighbor algorithm as an example, the invention is not limited thereto. For example, the clustering step may also employ partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, and so on. The clustering algorithm may include the K-MEANS, K-MEDOIDS, CLARANS, BIRCH, CURE, CHAMELEON, DBSCAN, OPTICS, DENCLUE, STING, CLIQUE, and WAVE-CLUSTER algorithms, among others; all of these are mature clustering algorithms in the prior art and are not enumerated one by one here.
In this way, by clustering the description information of the feature points, multiple similar pieces of description information can be clustered into one visual entry, and multiple visual entries are obtained by clustering all the description information of all the feature points, so as to form the visual dictionary.
As a simple illustration, suppose the description information of one feature point a includes, for example, "circular" and "bright red"; the description information of another feature point b includes, for example, "circular" and "blue"; and the description information of a third feature point c includes, for example, "square" and "dark red". Then, through the above steps, all the description information can be clustered to obtain several visual entries, for example "circular", "square", "red", and "blue", which form the visual dictionary. It should be noted that this is an example of clustering two different types of description information together; in practice, one type, or of course more than two types, of description information may also be clustered together.
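The patent leaves the choice of clustering algorithm open (K-MEANS is among the listed alternatives). A minimal pure-Python k-means sketch over toy 2-D "descriptor" vectors might look like the following; initializing the centers with the first k points (instead of random seeding) is an assumption made here to keep the example deterministic:

```python
import math

def kmeans(points, k, iters=10):
    """Toy k-means: cluster descriptor vectors into k visual entries.
    Returns the k cluster centers (the 'visual entries')."""
    centers = list(points[:k])  # deterministic initialization (assumed)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each descriptor to its nearest current center.
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        # Recompute each center as the mean of its assigned descriptors.
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g
                   else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# Two obvious clusters of toy descriptors:
data = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0)]
entries = kmeans(data, 2)  # ≈ [(0.05, 0.05), (5.05, 5.05)]
```

Real SIFT descriptors are 128-dimensional, but the same update loop applies unchanged; the resulting centers play the role of the visual entries Ck used later.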
In sub-step S2032, according to the location information and body part information of the sampled points, the distribution of each category clustered in sub-step S2031 is computed, and the aggregated categories are found as the significant categories.
Figs. 9A to 9D are schematic diagrams illustrating the significant category generation sub-step of the specific example according to an embodiment of the present invention.
Fig. 9 A to Fig. 9 D shows the example that significant classification generates sub-step.Here, for the image of target pedestrian
In sampled point, as shown in the circle in Fig. 9 A, its corresponding characteristic point is clustered such as Fig. 9 B institute in traditional cluster process
Three partial categories shown (include: classification positioned at left part, using most dark expression;Positioned at right part, utilization
The classification of darker expression;Be located below part, classification using most light expression) among below in partial category.
For the characteristic point shown in the arrow for being directed toward Fig. 9 B from Fig. 9 A, this point has body part information, location information and description
These three properties of information, it is assumed that it is denoted as LS (left shoulder component), (x, y) and FD respectively.And then it promotes it is found that this is clustered
In each point all there are these three attribute informations.Then, according to above- mentioned information, all the points in class can be assembled for this,
Calculate they position and place body part distribution situation, and judge whether its distribution forms a scheduled two dimension
Distribution surface.
For example, if it is judged that the shape of the distribution conforms to a predetermined distribution, for instance approximates a Gaussian distribution in which the majority of the points share a relatively concentrated maximum, as shown by distribution 1 in Fig. 9C, then such a category can be defined as a significant category (also called a first-class visual entry).
Alternatively, consider all the sampled points in the right-hand partial category among the three partial categories shown in Fig. 9B. Their attributes respectively include: LA, (x, y), FD; LUL, (x, y), FD; LW, (x, y), FD; and so on. Here, LA stands for the left ankle, LUL for the left upper arm, and LW for the left wrist. (x, y) is a generic representation of their location information; in fact these are different position coordinates in the image, which may be denoted, for example, (x1, y1), (x2, y2), (x3, y3).
Then, after computing, for all the points included in this aggregated class, the distribution of their positions and of the body parts on which they lie, it can be determined that this distribution does not conform to a Gaussian distribution; on the contrary, the shape of the surface is rather disorderly, with no particularly concentrated maximum, as shown by distribution 2 in Fig. 9D. Such a category can then be defined as a non-significant category (also called a second-class visual entry).
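One plausible, hypothetical way to implement this significant/non-significant decision is to measure how concentrated a cluster's positions and body-part labels are; the variance and dominance thresholds below are assumed stand-ins for the patent's "predetermined distribution" test, not its actual criterion:

```python
from collections import Counter

def is_significant(points, part_labels, max_std=5.0, min_part_frac=0.6):
    """Judge a cluster 'significant' (first-class entry) if its sampled
    points concentrate spatially AND mostly lie on one body part.
    points: list of (x, y); part_labels: list of body-part names.
    max_std and min_part_frac are assumed thresholds."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    var = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n
    dominant = Counter(part_labels).most_common(1)[0][1] / n
    return var ** 0.5 <= max_std and dominant >= min_part_frac

# Concentrated left-shoulder cluster -> significant:
sig = is_significant([(18, 28), (19, 29), (17, 27)], ["LS", "LS", "LS"])
# Scattered cluster over several parts -> non-significant:
nonsig = is_significant([(2, 3), (40, 90), (10, 60)], ["LA", "LUL", "LW"])
```

A Gaussian-fit test as described in the text would replace the simple standard-deviation check, but the significant/non-significant split it produces is the same kind of binary label.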
In sub-step S2033, associations between the significant categories are established based on an intrinsic metric between them.
Specifically, the intrinsic metric between significant categories can be the intrinsic structural distance, within the target object, between the nearest structural part corresponding to a first significant category and the nearest structural part corresponding to a second significant category. For example, when a pedestrian is the target object, the skeleton distance can be used as one of the metrics. As another example, when a vehicle is the target object, the distance between key components (such as lamps, wheels, and A/B pillars) can be used as the metric.
Figs. 10A and 10B are schematic diagrams illustrating the significant category association sub-step of the specific example according to an embodiment of the present invention.
Figs. 10A and 10B show an example of the significant category association sub-step. In Fig. 10A, the skeleton information of the pedestrian is marked, and the significant categories of different body parts (for example, the head, the left shoulder, and the right shoulder) are identified by solid dots of different gray levels. Two of these significant categories are identified separately in Fig. 10B, representing the head (H) and the left shoulder (LS) body parts respectively, as shown by the upper and lower curved arrows in the figure. The association between these two significant categories can be established by the intrinsic metric between them, which can be expressed as a weight value. For example, one way of computing the weight between them is shown in formula 1:
wH,LS = e^(-β·idis(H,LS))    Formula 1
Here, wH,LS denotes the spatial association between significant category H and significant category LS; β is an adjustment factor taken from empirical values; H denotes the position of the head body part; LS denotes the position of the left shoulder body part; and idis(H, LS) denotes the intrinsic distance between H and LS, for example the skeleton distance. For example, this intrinsic distance can be computed from the pixel distance between the labeled body parts.
In addition, in one embodiment, associations may be established only between the two significant categories that are closest in distance. Of course, associations may instead be established among all significant categories, or among some of them; this is determined by the actual detection requirements, by the internal structural characteristics of the object, or naturally by other considerations (for example, the processing capability of the overall system).
In this way, through step S203, the structured visual dictionary model is established. This visual dictionary model contains not only the feature entries obtained by the traditional clustering method, but also the significant categories obtained from the location and body part information; moreover, based on the intrinsic metric, the relationships between significant categories are established, strengthening the spatial associations inside the visual dictionary.
That is, when generating the visual dictionary model, this feature model generation method can establish the associations between the significant categories through the intrinsic metric.
In step S204, one or more visual entries are matched for each feature point, and through a spatial visual dictionary matching algorithm, the description of the sample image is converted from pixel data into a feature model of visual-entry weights carrying spatial information.
Specifically, after the structured visual dictionary has been obtained, the distance information and the internal structural information can be used to map the features of the target object onto the visual dictionary with its established internal associations, obtaining the visual-bag-of-words-based feature description F4 of each object to be detected.
It should be noted that this feature point matching process can be applied to all the training samples, including the positive and the negative samples. Also, the matching of visual entries differs from the matching of the visual dictionary: matching a visual entry means that the description information of a feature point is similar to that of the visual entry, whereas matching the visual dictionary means building the feature model of visual-entry weights with spatial information.
Generally speaking, the purpose of the visual dictionary matching process is to transform the feature representation of the feature points (for example, the description information) into a new, more useful feature representation of the visual bag-of-words type carrying spatial information (for example, a feature matrix or a feature model). Specifically, the feature of each sampled point in the sample image can be mapped onto the generated structured visual dictionary, so as to form a unified description of the target object. During the matching in the mapping process, the internal relationships of the structured visual dictionary are also taken into account.
Fig. 11 is a detailed flowchart illustrating the visual dictionary matching step of the specific example according to an embodiment of the present invention.
As illustrated in Fig. 11, the visual dictionary matching step specifically includes the following sub-steps.
In sub-step S2041, matching visual entries are looked up in the visual dictionary model.
After the location information and description information of the feature points of a sample image are received, one or more visual entries capable of characterizing each feature point in the sample can be found in the visual dictionary, as the matching visual entries. For example, continuing the example mentioned above, the matching visual entries of feature point a may be "circular" and "red", even though the description information of feature point a itself is "circular" and "bright red".
Specifically, first, the similarity between the description information of a feature point and each visual entry in the visual dictionary can be computed. The similarity can be expressed by a distance measure fdis(fi, Ck), including but not limited to the Euclidean distance. Here, fi is the feature description information of feature point i, and Ck is the feature description information of the k-th visual entry in the visual dictionary. The smaller the distance measure fdis(fi, Ck), the higher the similarity between the feature point and the corresponding visual entry.
Then, the most similar visual entries can be selected for each feature point. After the similarities between the description information of the current feature point and all visual entries have been obtained, the purpose of this operation is to select the matching visual entry or entries for the spatial encoding process. When only the single most similar matching visual entry is selected for a feature point, this may be called a hard decision; a soft decision selects more than one matching visual entry for each feature point.
Taking the soft decision as an example, finally, a decision result is obtained, which may include the similarities and the most similar one or more visual entries corresponding to each feature point, as the matching visual entry or entries.
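The hard/soft decision above can be sketched as a nearest-entries lookup; the toy 2-D entries stand in for the real descriptor-space visual entries:

```python
import math

def match_entries(f, entries, k=1):
    """Return the k most similar visual entries to descriptor f, with
    their distances (k=1: hard decision; k>1: soft decision)."""
    ranked = sorted(range(len(entries)),
                    key=lambda i: math.dist(f, entries[i]))
    return [(i, math.dist(f, entries[i])) for i in ranked[:k]]

entries = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
hard = match_entries((0.1, 0.1), entries, k=1)  # single nearest entry
soft = match_entries((0.1, 0.1), entries, k=2)  # two nearest entries
```

Thresholding the returned distances (rather than fixing k) would give the "similarity greater than a preset threshold" variant mentioned below for Figs. 12A and 12B.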
Figs. 12A and 12B are schematic diagrams illustrating the matching visual entry lookup sub-step of the specific example according to an embodiment of the present invention.
As illustrated in Figs. 12A and 12B, for a sampled point in the target image, shown by the circle in Fig. 12A, the corresponding feature vector is shown as the histogram indicated by the arrow, and is denoted fi. In addition, the structured visual dictionary is shown in Fig. 12B; each visual entry in the visual dictionary model can likewise be represented by its feature vector, denoted Ck. When looking up matching visual entries, as described above, the closest one or more visual entry classes in the structured visual dictionary can be found according to the feature similarity, for example the one or more visual entry classes whose similarity to the feature point exceeds a preset threshold.
In sub-step S2042, the feature weight with which the feature point is mapped onto each matching visual entry is computed.
After the one or more visual entries matching the feature point have been determined, the category of each of these visual entries can be further judged.
In the first case, if some visual entry among the one or more matching visual entry classes is a non-significant category, then the weight with which the sampled-point feature fi is mapped onto visual entry Ck is simply e^(-β·fdis(fi, Ck)). Here β is an adjustment factor taken from empirical values, and fdis(fi, Ck) is the distance between the sampled-point feature fi and the visual entry feature Ck.
In one example, the distance fdis(fi, Ck) between features can be computed according to the distance computation method in a conventional visual dictionary model. For example, one way of computing the distance between features is shown in formula 2:
fdis(fi, Ck) = sqrt( Σd=1..D (fi,d − Ck,d)² )    Formula 2
where fdis(fi, Ck) denotes the distance between the sampled-point feature fi and the visual entry feature Ck, fi is the feature description information of feature point i, Ck is the feature description information of the k-th visual entry in the visual dictionary, and D is the dimension of the feature vectors.
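Formula 2 and the non-significant-entry weight can be sketched together; β = 1.0 is an assumed empirical value:

```python
import math

def fdis(f, c):
    """Formula 2: Euclidean distance between a sampled-point feature f
    and a visual entry feature c (both D-dimensional sequences)."""
    return math.sqrt(sum((fd - cd) ** 2 for fd, cd in zip(f, c)))

def weight_nonsignificant(f, c, beta=1.0):
    """Weight mapped onto a non-significant (second-class) entry:
    exp(-beta * fdis(f, c))."""
    return math.exp(-beta * fdis(f, c))

# Identical feature and entry -> distance 0 -> weight 1.0:
w = weight_nonsignificant((0.2, 0.4), (0.2, 0.4))
```

The weight is always in (0, 1], reaching 1 only when the feature coincides with the entry.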
In the second case, if some visual entry is a significant category, then in addition to mapping the sampled-point feature onto that visual entry, the other categories associated with it in the spatial structure also need to be mapped, and the mapped weights are related to the association weights between the categories.
Figs. 13A and 13B are schematic diagrams illustrating the feature weight mapping sub-step of the specific example according to an embodiment of the present invention.
As shown in Figs. 13A and 13B, suppose the sampled-point feature fi matches significant category l, and that associations have been established between significant category l and significant categories j and k through the weights wl,j and wl,k. Here, the computation of the weights wl,j and wl,k has been described in detail in the significant category association sub-step S2033 of the structured visual dictionary generation step S203, and its detailed description is therefore omitted.
In this case, when computing the projection of the sampled-point feature fi onto the visual dictionary, the projections may be defined as follows: the weight projected onto significant category l itself is e^(-β1·fdis(fi, Cl)); and, via the associations between significant category l and significant categories j and k, the weight projected onto significant category j is wl,j·e^(-β1·fdis(fi, Cl)), and the weight projected onto significant category k is wl,k·e^(-β1·fdis(fi, Cl)).
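This propagation of a matched feature's weight from significant category l to its associated categories can be sketched as follows (the association weights are assumed values):

```python
import math

def significant_weights(f, c_l, assoc, beta1=1.0):
    """Map feature f matched to significant entry l: l itself receives
    exp(-beta1 * fdis(f, C_l)), and each associated entry j receives
    w_{l,j} times that base weight. assoc: entry id -> w_{l,j}."""
    base = math.exp(-beta1 * math.dist(f, c_l))
    out = {"l": base}
    for j, w_lj in assoc.items():
        out[j] = w_lj * base
    return out

# Entry l associated with entries j and k (association weights assumed);
# the feature exactly matches C_l, so the base weight is 1.0:
weights = significant_weights((0.0, 0.0), (0.0, 0.0),
                              {"j": 0.5, "k": 0.25})
```

Note that the associated entries j and k accumulate weight even though the feature itself did not match them; this is what injects the spatial structure into the final model.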
As shown in Fig. 13B, suppose there are 7 visual entries in total in the generated structured visual dictionary, and each visual entry has sampled-point feature weights mapped onto it, with all these sampled points coming from one image, namely the sample image. Then, in the manner described above, this feature model generation method can, depending on the significant and non-significant categories, map every low-level feature in the sample image onto every visual entry in the visual dictionary, so as to generate the final feature description, such that the mapping operation is performed while the associations between significant categories are fully taken into account.
As described above, in sub-steps S2041 and S2042, the features of the sampled points can be looked up in the structured visual dictionary to find the most similar one or more entries, and the weights with which the sampled points project onto these entries can be computed. These two sub-steps may therefore, following convention, be called the coding step.
In sub-step S2043, the visual-bag-of-words-based feature model of the sample image is generated based on the feature weights mapped onto each visual entry.
Specifically, next, the weights of all sampled-point features in an image to be detected, mapped onto the significant and non-significant classes, can be combined or considered in some way, to obtain the unified feature description of the object to be detected.
Figs. 14A and 14B are schematic diagrams illustrating the feature model generation sub-step of the specific example according to an embodiment of the present invention.
As shown in Fig. 13B above, suppose there are 7 visual entries in total in the generated structured visual dictionary, each visual entry has sampled-point feature weights mapped onto it, and these sampled points all come from one image, namely the sample image, as shown by the square blocks in Fig. 14A. Therefore, by cascading the feature descriptions on the individual visual entries, the unified feature description obtained by mapping this image onto the structured visual dictionary can be obtained, as shown by the feature weight histogram in Fig. 14B, in which the significant category visual entries are spatially correlated. Here, the arrows indicate the associations between significant categories and the weights between them.
Specifically, for the feature model of the target image with spatial information, consider one visual entry among the multiple visual entries, for example category j. One way of computing the magnitude of its feature weight value is shown in formula 3:
Hj = Σi=1..T Σt=1..n1 wi,j·e^(−β1·fdis(ft, Ci)) + Σm=1..n2 e^(−β2·fdis(fm, Cj))    Formula 3
where Hj denotes the mapped feature weight on the j-th visual entry in the visual dictionary model; n1 denotes, when the j-th visual entry is a significant category (first-class) visual entry, the number of feature points mapped onto the j-th visual entry from a visual entry spatially associated with it; T is the total number of visual entries spatially associated with the j-th visual entry; n2 denotes, when the j-th visual entry is a non-significant category (second-class) visual entry, the number of feature points mapped onto the j-th visual entry; ft denotes the description information of the t-th feature point; Ci denotes the description information of the i-th visual entry; fdis(ft, Ci) denotes the feature distance between the t-th feature point and the i-th visual entry; wi,j denotes the spatial association between the i-th visual entry and the j-th visual entry; fm denotes the description information of the m-th feature point; Cj denotes the description information of the j-th visual entry; fdis(fm, Cj) denotes the feature distance between the m-th feature point and the j-th visual entry; and β1 and β2 are predetermined coefficients.
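A sketch of Formula 3, under the assumption (consistent with the text) that an entry j accumulates the directly mapped weights plus, when j is significant, the weights of its T spatially associated entries scaled by wi,j:

```python
import math

def entry_weight(j, entries, assigned, assoc, beta1=1.0, beta2=1.0,
                 significant=True):
    """Formula 3 sketch. entries: id -> entry descriptor;
    assigned: id -> list of feature descriptors mapped to that entry;
    assoc: (i, j) -> spatial association weight w_{i,j}."""
    # Direct term: features mapped onto entry j itself.
    h = sum(math.exp(-beta2 * math.dist(f, entries[j]))
            for f in assigned.get(j, []))
    if significant:
        # Association term: features of spatially associated entries i
        # contribute w_{i,j} * exp(-beta1 * fdis(f, C_i)).
        for (i, jj), w_ij in assoc.items():
            if jj == j:
                h += sum(w_ij * math.exp(-beta1 * math.dist(f, entries[i]))
                         for f in assigned.get(i, []))
    return h

entries = {0: (0.0, 0.0), 1: (1.0, 1.0)}
assigned = {0: [(0.0, 0.0)], 1: [(1.0, 1.0)]}  # one exact match each
assoc = {(0, 1): 0.5}  # entry 0 spatially associated with entry 1
h1 = entry_weight(1, entries, assigned, assoc)  # 1.0 + 0.5*1.0 = 1.5
```

Concatenating H1, H2, … over all visual entries yields the feature weight histogram of Fig. 14B.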
As described above, in sub-step S2043, all the sampled-point features in one sample image can be mapped onto the visual entries in the visual dictionary model to obtain the visual dictionary feature. This sub-step may therefore, following convention, be called the pooling step.
In step S205, a classifier is obtained by training on the feature models collected from all the training sample images.
After the visual dictionary feature models of all sample images (positive and negative) in the training set have been obtained, a classifier F5 can be trained using these visual dictionary features. Thus, from the feature models of visual-entry weights with spatial information of the sample images, the classification criterion distinguishing positive sample images (images containing the target object) from negative sample images (images not containing the target object) is obtained, so that in subsequent processing, for each of M online frames (where M is a natural number), after feature extraction and visual dictionary matching have been completed, the trained classifier can be used to detect or recognize the target object.
Specifically, the classifier is trained on training data and is then used in practical applications to classify the targets to be recognized. The training data are the visual bag-of-words characteristic models of all sample images in the training sample set. The classifier may use an existing algorithm, such as the support vector machine (SVM), the AdaBoost classifier, the Bayes classifier, the BP neural-network classifier, or a decision-tree algorithm, which will not be detailed here. These classification algorithms are all traditional: the characteristic models of the sample images, including positive samples (classification result positive) and negative samples (classification result negative), undergo classification training based on one of the above algorithms to obtain the classifier.
Steps S201 to S205 above establish the characteristic model and obtain the classifier by learning. Steps S206 to S210 below describe how a region to be recognized is input and then recognized or classified with the visual dictionary and the classifier obtained in the steps above.
In step S206, an image to be detected is input.
In step S207, characteristic points are extracted from the image to be detected and the attribute information of each characteristic point is obtained.
In step S208, each characteristic point is matched to one or more vision entries, and through the spatial visual dictionary matching algorithm the description of the image is converted from pixel data into a characteristic model of the vision-entry weights with spatial information.
Since steps S206 to S208 are very similar to steps S201, S202 and S204 described in detail above, differing only in that the target image is now at least one image to be detected whose classification is unknown, rather than a sample image of known classification, their detailed description is omitted here.
In addition, to save processing resources, when the attribute information of a characteristic point is obtained in step S207, the attribute information may include only: 1) the feature description, denoted FD; and 2) the location information, denoted (x, y), without determining the structure member closest to the characteristic point or the distance between the two.
In step S209, the image to be detected is classified using the trained classifier.
Specifically, the trained classifier (for example, an SVM classifier) can judge, based on a specific classification criterion, whether the image to be detected contains the target object, according to the image's characteristic model with spatial information.
In step S210, the detection result is output.
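Steps S206 to S210 can be condensed into one sketch: build the dictionary feature of an unknown image, apply a trained linear decision function, and output the result. The hard-voting histogram and the function names are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def detection_pipeline(descriptors, centers, w, b):
    """S207-S208: map the unknown image's descriptors onto the dictionary;
    S209: classify with a trained linear decision function (w, b);
    S210: return the detection result."""
    hist = np.zeros(len(centers))
    for d in descriptors:
        hist[np.argmin(np.linalg.norm(centers - d, axis=1))] += 1.0
    hist /= max(hist.sum(), 1.0)
    score = hist @ w + b                 # classifier decision value
    return "target" if score > 0 else "background"
```

In a driver-assistance setting, "target" would correspond to a pedestrian or vehicle detection that the control system then acts on.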
For example, the final detection result can serve well for pedestrian detection, vehicle detection and the like in in-vehicle camera images. Clearly, such pedestrian recognition, vehicle detection and the detection and recognition of other targets are essential key functions in driver assistance systems and vehicle automatic navigation systems, because helping the driver avoid hitting surrounding objects is crucial to safe driving.
Needless to say, the manner in which the specific examples of the embodiments of the present invention perform object detection with an improved visual dictionary model can be used not only with in-vehicle cameras but also in many other applications. For example, besides the vehicle-driving field, object detection is a particularly important problem in the surveillance field, such as effective monitoring near subway stations, automatic monitoring of parking lots, or surveillance in supermarkets.
It can be seen that the specific examples of the embodiments of the present invention provide a characteristic model generation method that samples each sample in the training set and, after extracting the basic features of each sampled point, improves the visual dictionary model: when the visual dictionary model is generated, associations between significant classes are established through an intrinsic measure; when low-level features are mapped onto the visual dictionary to generate the final feature description, the mapping still considers the associations between significant classes; and finally, classification training is performed on the obtained feature description, so that a final object detection result utilizing the improved visual dictionary model is obtained.
In addition, it should be noted that, although the present invention refers to images, pictures and the like, it can be understood that in the case of video, the object classification method described above can also be applied by taking one or more frames of the video as the above image or picture.
3. Characteristic model generating means
The embodiments of the present invention can also be implemented by a characteristic model generating means. Hereinafter, the functional configuration block diagram of a characteristic model generating means according to an embodiment of the present invention will be described with reference to Figure 15.
Figure 15 is a functional configuration block diagram illustrating the characteristic model generating means according to an embodiment of the present invention.
As shown in Figure 15, the characteristic model generating means 100 may include a feature extraction unit 101 and a visual dictionary matching unit 103.
The feature extraction unit 101 can be used to obtain at least one characteristic point in the target image and to obtain the location information and description information of each characteristic point; and
the visual dictionary matching unit 103 can be used to: for each characteristic point, search the visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has an association in spatial relationship with at least one other vision entry; for each matching vision entry matching the characteristic point, determine at least one mapping-target vision entry according to the class of the matching vision entry, and calculate, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry; and generate the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model.
In one embodiment, the target image may be at least one sample image of known classification, or an image to be detected of unknown classification. When the target image is a sample image, the feature extraction unit 101 can also be used to obtain, in the sample image, the location information of at least one structure member of the target object marked in advance, and to determine the nearest-component information of each characteristic point according to the location information of the characteristic point and the location information of each structure member, the nearest-component information indicating the structure member of the target object closest to the characteristic point.
In addition, the characteristic model generating means 100 can also include a visual dictionary model generation unit 102.
The visual dictionary model generation unit 102 can be used to generate the visual dictionary model based on the location information, description information and nearest-component information of each characteristic point in the sample image.
In one embodiment, the visual dictionary model generation unit 102 can generate the visual dictionary model from the location information, description information and nearest-component information of each characteristic point in the sample image by: clustering all the characteristic points in the sample image according to their description information, to generate the visual dictionary model including multiple vision entries; dividing the vision entries into the first-class vision entries and the second-class vision entries based on the location information and nearest-component information of the characteristic points in each vision entry, wherein the distribution of the positions and nearest structure members of the characteristic points in a first-class vision entry satisfies a predetermined distribution; and, among the first-class vision entries, establishing the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries.
In a specific example, the visual dictionary model generation unit 102 can establish the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries by: calculating the association in spatial relationship between a first vision entry and a second vision entry based on the intrinsic structure distance, within the target object, between the nearest structure member corresponding to the first vision entry and the nearest structure member corresponding to the second vision entry.
In one embodiment, the visual dictionary matching unit 103 can search the visual dictionary model, based on the description information of a characteristic point, for at least one matching vision entry that matches the characteristic point by: calculating the similarity between the characteristic point and each vision entry according to the description information of the characteristic point and the description information of each vision entry in the visual dictionary model; and determining the vision entries whose similarity is greater than or equal to a predetermined threshold as the matching vision entries.
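A sketch of this similarity-threshold matching follows. Cosine similarity is an assumed measure; the text only requires a similarity greater than or equal to a predetermined threshold.

```python
import numpy as np

def find_matching_entries(fd, centers, threshold=0.8):
    """Return the indices of vision entries whose similarity to the
    characteristic point's description fd meets the predetermined threshold.
    Cosine similarity between fd and each entry center is an assumption."""
    sims = centers @ fd / (np.linalg.norm(centers, axis=1) * np.linalg.norm(fd) + 1e-12)
    return [j for j, s in enumerate(sims) if s >= threshold]
```

Because several entries can clear the threshold, one characteristic point may match multiple vision entries, which is exactly what step S208 relies on.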
In one embodiment, the visual dictionary matching unit 103 can determine at least one mapping-target vision entry according to the class of the matching vision entry by: judging whether the matching vision entry is a first-class vision entry or a second-class vision entry; when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having an association in spatial relationship with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
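The class-dependent choice of mapping-target vision entries can be sketched as below. The `entry_class` and `associations` dictionaries are an assumed representation of the dictionary's metadata, not a structure the patent specifies.

```python
def mapping_targets(j, entry_class, associations):
    """Mapping-target vision entries for a matched entry j:
    a first-class entry (class 1) maps to itself plus its spatially
    associated entries; a second-class entry (class 2) maps only to itself.
    entry_class: index -> 1 or 2; associations: index -> associated indices."""
    if entry_class[j] == 1:
        return [j] + list(associations.get(j, []))
    return [j]
```

This is the branch that injects spatial information into the final model: first-class entries spread a characteristic point's weight to their associates, second-class entries do not.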
In one embodiment, when the mapping-target vision entry is the matching vision entry itself, the visual dictionary matching unit 103 can calculate, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry by: calculating, based on the characteristic distance between the characteristic point and the matching vision entry, the feature weight with which the characteristic point is mapped onto the matching vision entry itself.
Alternatively, when the mapping-target vision entry is one of the other vision entries, the visual dictionary matching unit 103 can calculate the feature weight by: calculating the feature weight with which the characteristic point is mapped onto the other vision entry, based on the characteristic distance between the characteristic point and the matching vision entry and on the association in spatial relationship between the matching vision entry and the other vision entry.
In one embodiment, the visual dictionary matching unit 103 can generate the characteristic model of the target image with spatial information, based on the feature weights mapped onto the vision entries in the visual dictionary model, by: for each vision entry, calculating the first feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a first-class vision entry, calculating the second feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a second-class vision entry, and summing the first feature weights and the second feature weights to generate the overall feature weight mapped onto the vision entry; and concatenating the overall feature weights mapped onto the vision entries in the visual dictionary model, to generate the characteristic model of the target image with spatial information.
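The final aggregation — summing the weights mapped onto each vision entry and concatenating the per-entry totals — can be sketched as below. The per-entry list layout is an assumed representation of the accumulated weights.

```python
import numpy as np

def build_feature_model(weights_per_entry):
    """Sum the first- and second-class feature weights mapped onto every
    vision entry and concatenate the totals into the final characteristic
    model with spatial information. weights_per_entry[j] is the list of
    weights mapped onto entry j (an assumed layout)."""
    return np.array([sum(ws) for ws in weights_per_entry])
```

The resulting vector has one component per vision entry and is exactly what is fed to the classifier in steps S205 and S209.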
The concrete functions and operations of the feature extraction unit 101, the visual dictionary model generation unit 102 and the visual dictionary matching unit 103 have already been described in detail in the characteristic model generation method described with reference to Figures 1 to 14, so their repeated description is omitted here.
It should be noted that the components of the characteristic model generating means 100 can be realized by software programs, for example by the CPU of a general-purpose computer in combination with RAM and ROM and the software code running therein. The software program can be stored on a storage medium such as a flash memory, floppy disk, hard disk or optical disc, and loaded at run time into, for example, a random access memory (RAM) to be executed by the CPU. Besides on a general-purpose computer, the components can also be realized by the cooperation of application-specific integrated circuits and software. The integrated circuits include circuits realized by at least one of, for example, an MPU (micro processing unit), a DSP (digital signal processor), an FPGA (field programmable gate array) and an ASIC (application-specific integrated circuit). Such a general-purpose computer or application-specific integrated circuit can, for example, be mounted at a specific location (for example, in a vehicle) and communicate with an imaging device, such as a camera installed on site, that images objects associated with the road, so that two-dimensional images and stereo images captured by the camera can be obtained as target images to generate the characteristic model with spatial information. In addition, the components of the characteristic model generating means 100 can be realized by dedicated hardware, such as specific field programmable gate arrays or application-specific integrated circuits. The components of the characteristic model generating means 100 can also be realized by a combination of software and hardware.
4. Vehicle control system
The present invention can also be applied to a vehicle control system, for realizing pedestrian detection, vehicle detection and the like in in-vehicle camera images. Hereinafter, the functional structure of a vehicle control system according to an embodiment of the present invention will be described with reference to Figure 16.
Figure 16 is a functional structure chart illustrating the vehicle control system according to an embodiment of the present invention.
Specifically, Figure 16 shows a vehicle control system 300 that realizes automatic control of a vehicle by using the structure-member-based improved visual dictionary model object detection method proposed by the present invention.
The vehicle control system 300 is mounted on a vehicle 1000. The system 300 includes two cameras 310, an image processing module 320 and a vehicle control module 330. The cameras 310 may be mounted near the rear-view mirror of the vehicle to capture the scene in front of the vehicle 1000. The captured images of the scene in front of the vehicle serve as the input of the image processing module 320. The image processing module 320 analyzes the input images: it pre-processes them, extracts sampled-point features, establishes the structural visual dictionary model, and obtains the features mapped onto the improved visual dictionary model. In one embodiment, the image processing module 320 can further perform classification and judgment to obtain a detection result, so as to tell whether a target object is a pedestrian, a vehicle, a roadblock, a signal lamp, or the like. The vehicle control module 330 receives the signal output by the image processing module 320 and, according to the obtained object detection result, generates control signals to control the driving direction and travel speed of the vehicle 1000.
For example, the image processing module 320 can be realized using a configuration in which the characteristic model generating means 100 shown in Figure 15 cooperates with a traditional classifier.
5. Object detection subsystem
Figure 17 illustrates the internal structure of the object detection subsystem that performs object detection on in-vehicle camera images with the improved visual dictionary model.
As shown in Figure 17, the subsystem is a subset of the vehicle control system illustrated in Figure 16; that is, it includes only the camera 310 and the image processing module 320.
Specifically, the camera 310 includes an image sensor 201 and a camera digital signal processor (DSP) 202. The image sensor 201 converts optical signals into electronic signals, converting the captured image of the scene in front of the current vehicle 1000 into an analog image signal, and then passes the result to the camera DSP 202. If necessary, the camera 310 can further include a lens, a filter and so on. For example, the system may include multiple cameras 310 that capture multiple images simultaneously after being registered with one another.
The camera DSP 202 converts the analog image signal into a digital image signal and sends it to the image processing module 320.
In the image processing module 320, the image input interface 203 obtains images at predetermined time intervals. The depth image module 204 converts a pair of input digital images into a depth map using stereo vision or other principles. The depth image is then written into the memory 206 and analyzed and processed by the program 207. The image processing here includes a variety of operations, such as sampled-point feature calculation and visual dictionary feature calculation. The program 207 loaded into the memory (for example, ROM or RAM) 206 performs a series of operations to carry out object detection. In this process, the CPU 205 is responsible for control and arithmetic operations, such as obtaining data through the interface and performing the image processing.
For example, the program 207 can be used to implement the characteristic model generation method according to the embodiments of the present invention described above.
The embodiments of the present invention have been described in detail above. However, it should be appreciated by those skilled in the art that, without departing from the principle and spirit of the present invention, these embodiments can be variously modified, combined or sub-combined, and such modifications shall fall within the scope of the present invention.
Claims (9)
1. A characteristic model generation method, characterized in that the method includes:
obtaining at least one characteristic point in a target image and obtaining the location information and description information of each characteristic point;
for each characteristic point,
searching a visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has an association in spatial relationship with at least one other vision entry;
for each matching vision entry matching the characteristic point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry; and
generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model;
wherein determining at least one mapping-target vision entry according to the class of the matching vision entry includes:
judging whether the matching vision entry is a first-class vision entry or a second-class vision entry;
when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having an association in spatial relationship with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and
when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
2. The method according to claim 1, characterized in that the target image is at least one sample image of known classification or at least one image to be detected of unknown classification, and
when the target image is a sample image, the method further includes:
obtaining, in the sample image, the location information of at least one structure member of the target object marked in advance; and
determining the nearest-component information of each characteristic point according to the location information of the characteristic point and the location information of each structure member, the nearest-component information indicating the structure member of the target object closest to the characteristic point.
3. The method according to claim 2, characterized in that the method further includes:
generating the visual dictionary model based on the location information, description information and nearest-component information of each characteristic point in the sample image.
4. The method according to claim 3, characterized in that generating the visual dictionary model based on the location information, description information and nearest-component information of each characteristic point in the sample image includes:
clustering all the characteristic points in the sample image according to their description information, to generate the visual dictionary model including multiple vision entries;
dividing the vision entries into the first-class vision entries and the second-class vision entries based on the location information and nearest-component information of the characteristic points in each vision entry, wherein the distribution of the positions and nearest structure members of the characteristic points in a first-class vision entry satisfies a predetermined distribution; and
among the first-class vision entries, establishing the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries.
5. The method according to claim 4, characterized in that establishing the association in spatial relationship between a vision entry and at least one other vision entry based on the intrinsic measure between vision entries includes:
calculating the association in spatial relationship between a first vision entry and a second vision entry based on the intrinsic structure distance, within the target object, between the nearest structure member corresponding to the first vision entry and the nearest structure member corresponding to the second vision entry.
6. The method according to claim 1, characterized in that searching a visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point includes:
calculating the similarity between the characteristic point and each vision entry according to the description information of the characteristic point and the description information of each vision entry in the visual dictionary model; and
determining the vision entries whose similarity is greater than or equal to a predetermined threshold as the matching vision entries.
7. The method according to claim 6, characterized in that
when the mapping-target vision entry is the matching vision entry itself, calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry includes: calculating, based on the characteristic distance between the characteristic point and the matching vision entry, the feature weight with which the characteristic point is mapped onto the matching vision entry itself; and
when the mapping-target vision entry is one of the other vision entries, calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry includes: calculating the feature weight with which the characteristic point is mapped onto the other vision entry, based on the characteristic distance between the characteristic point and the matching vision entry and on the association in spatial relationship between the matching vision entry and the other vision entry.
8. The method according to claim 1, characterized in that generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model includes:
for each vision entry, calculating the first feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a first-class vision entry, calculating the second feature weights with which the characteristic points in the target image are mapped onto the vision entry when the vision entry is a second-class vision entry, and summing the first feature weights and the second feature weights to generate the overall feature weight mapped onto the vision entry; and
concatenating the overall feature weights mapped onto the vision entries in the visual dictionary model, to generate the characteristic model of the target image with spatial information.
9. A characteristic model generating means, characterized in that the means includes:
a feature extraction unit, for obtaining at least one characteristic point in a target image and obtaining the location information and description information of each characteristic point; and
a visual dictionary matching unit, for, for each characteristic point,
searching a visual dictionary model, based on the description information of the characteristic point, for at least one matching vision entry that matches the characteristic point, wherein the visual dictionary model includes first-class vision entries and second-class vision entries, and among the first-class vision entries a vision entry has an association in spatial relationship with at least one other vision entry;
for each matching vision entry matching the characteristic point, determining at least one mapping-target vision entry according to the class of the matching vision entry, and calculating, at least based on the description information of the characteristic point and the description information of the matching vision entry, the feature weight with which the characteristic point is mapped onto each mapping-target vision entry; and
generating the characteristic model of the target image with spatial information based on the feature weights mapped onto the vision entries in the visual dictionary model;
wherein determining at least one mapping-target vision entry according to the class of the matching vision entry includes:
judging whether the matching vision entry is a first-class vision entry or a second-class vision entry;
when the matching vision entry is a first-class vision entry, searching the visual dictionary model for at least one other vision entry having an association in spatial relationship with the matching vision entry, and determining the matching vision entry itself and the at least one other vision entry as the mapping-target vision entries; and
when the matching vision entry is a second-class vision entry, determining only the matching vision entry itself as the mapping-target vision entry.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410471391.5A CN105404886B (en) | 2014-09-16 | 2014-09-16 | Characteristic model generation method and characteristic model generating means |
JP2015179850A JP2016062610A (en) | 2014-09-16 | 2015-09-11 | Feature model creation method and feature model creation device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105404886A CN105404886A (en) | 2016-03-16 |
CN105404886B true CN105404886B (en) | 2019-01-18 |
Family
ID=55470362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410471391.5A Expired - Fee Related CN105404886B (en) | 2014-09-16 | 2014-09-16 | Characteristic model generation method and characteristic model generating means |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP2016062610A (en) |
CN (1) | CN105404886B (en) |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106055704B (en) * | 2016-06-22 | 2020-02-04 | 重庆中科云丛科技有限公司 | Image retrieval and matching method and system |
CN106897675B (en) * | 2017-01-24 | 2021-08-17 | 上海交通大学 | Face living body detection method combining binocular vision depth characteristic and apparent characteristic |
JP6760490B2 (en) * | 2017-04-10 | 2020-09-23 | 富士通株式会社 | Recognition device, recognition method and recognition program |
CN110110145B (en) * | 2018-01-29 | 2023-08-22 | 腾讯科技(深圳)有限公司 | Descriptive text generation method and device |
CN112292690A (en) * | 2018-06-14 | 2021-01-29 | 奇跃公司 | Augmented reality deep gesture network |
CN110728711B (en) * | 2018-07-17 | 2021-11-12 | 北京三快在线科技有限公司 | Positioning and mapping method and device, and positioning method, device and system |
CN109961103B (en) * | 2019-04-02 | 2020-10-27 | 北京迈格威科技有限公司 | Training method of feature extraction model, and image feature extraction method and device |
CN110503093B (en) * | 2019-07-24 | 2022-11-04 | 中国航空无线电电子研究所 | Region-of-interest extraction method based on disparity map DBSCAN clustering |
CN112307809B (en) * | 2019-07-26 | 2023-07-25 | 中国科学院沈阳自动化研究所 | Active target identification method based on sparse feature point cloud |
CN110807437B (en) * | 2019-11-08 | 2023-01-03 | 腾讯科技(深圳)有限公司 | Video granularity characteristic determination method and device and computer-readable storage medium |
US20230037499A1 (en) * | 2020-02-17 | 2023-02-09 | Mitsubishi Electric Corporation | Model generation device, in-vehicle device, and model generation method |
CN112116644B (en) * | 2020-08-28 | 2023-05-23 | 辽宁石油化工大学 | Obstacle detection method and device based on vision and obstacle distance calculation method and device |
CN111967542B (en) * | 2020-10-23 | 2021-01-29 | 江西小马机器人有限公司 | Meter identification secondary positioning method based on depth feature points |
CN112668590A (en) * | 2021-01-05 | 2021-04-16 | 瞬联软件科技(南京)有限公司 | Visual phrase construction method and device based on image feature space and airspace space |
CN113048807B (en) * | 2021-03-15 | 2022-07-26 | 太原理工大学 | Air cooling unit backpressure abnormality detection method |
CN114237046B (en) * | 2021-12-03 | 2023-09-26 | 国网山东省电力公司枣庄供电公司 | Partial discharge pattern recognition method based on SIFT data feature extraction algorithm and BP neural network model |
CN114937204B (en) * | 2022-04-29 | 2023-07-25 | 南京信息工程大学 | Neural network remote sensing change detection method for lightweight multi-feature aggregation |
TWI798094B (en) * | 2022-05-24 | 2023-04-01 | 鴻海精密工業股份有限公司 | Method and equipment for training depth estimation model and depth estimation |
CN115492493A (en) * | 2022-07-28 | 2022-12-20 | 重庆长安汽车股份有限公司 | Tail gate control method, device, equipment and medium |
CN115563654B (en) * | 2022-11-23 | 2023-03-31 | 山东智豆数字科技有限公司 | Digital marketing big data processing method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254015A (en) * | 2011-07-21 | 2011-11-23 | 上海交通大学 | Image retrieval method based on visual phrases |
CN102708380A (en) * | 2012-05-08 | 2012-10-03 | 东南大学 | Indoor common object identification method based on machine vision |
CN103440508A (en) * | 2013-08-26 | 2013-12-11 | 河海大学 | Remote sensing image target recognition method based on visual word bag model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130132377A1 (en) * | 2010-08-26 | 2013-05-23 | Zhe Lin | Systems and Methods for Localized Bag-of-Features Retrieval |
US8849030B2 (en) * | 2011-04-22 | 2014-09-30 | Microsoft Corporation | Image retrieval using spatial bag-of-features |
- 2014-09-16: Application filed in China as CN 201410471391.5 A; granted as CN105404886B; status: not active (Expired - Fee Related)
- 2015-09-11: Corresponding application filed in Japan as JP 2015179850 A; published as JP2016062610A; status: active (Pending)
Non-Patent Citations (1)
Title |
---|
"Spatial Bag-of-Visual-Words Model for Image Scene Classification" (用于图像场景分类的空间视觉词袋模型); Wang Yuxin et al.; Computer Science (计算机科学); 2011-08-31; Vol. 38, No. 8; pp. 265-268 |
Also Published As
Publication number | Publication date |
---|---|
JP2016062610A (en) | 2016-04-25 |
CN105404886A (en) | 2016-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105404886B (en) | Characteristic model generation method and characteristic model generating means | |
CN112380952B (en) | Power equipment infrared image real-time detection and identification method based on artificial intelligence | |
US9846946B2 (en) | Objection recognition in a 3D scene | |
CN105095905B (en) | Target identification method and Target Identification Unit | |
CN104166841B (en) | Fast detection and recognition method for specified pedestrians or vehicles in a video surveillance network | |
CN108875600A (en) | YOLO-based vehicle information detection and tracking method, apparatus, and computer storage medium | |
CN107273832B (en) | License plate recognition method and system based on integral channel characteristics and convolutional neural network | |
CN107305635A (en) | Object recognition method, object recognition device, and classifier training method | |
CN105512683A (en) | Target positioning method and device based on a convolutional neural network | |
CN105718866B (en) | Visual target detection and recognition method | |
CN104915673B (en) | Object classification method and system based on a visual bag-of-words model | |
CN108257151B (en) | PCANet image change detection method based on significance analysis | |
CN107610177B (en) | Method and apparatus for determining feature points in simultaneous localization and mapping | |
Momin et al. | Vehicle detection and attribute based search of vehicles in video surveillance system | |
CN104036284A (en) | Adaboost algorithm based multi-scale pedestrian detection method | |
Holzer et al. | Learning to efficiently detect repeatable interest points in depth data | |
CN108734200B (en) | Human target visual detection method and device based on BING (binarized normed gradients) features | |
CN110659550A (en) | Traffic sign recognition method, traffic sign recognition device, computer equipment and storage medium | |
Swadzba et al. | Indoor scene classification using combined 3D and gist features | |
CN108073940B (en) | Method for detecting 3D target example object in unstructured environment | |
CN114049572A (en) | Detection method for identifying small target | |
CN110599463A (en) | Tongue image detection and positioning algorithm based on lightweight cascade neural network | |
CN113033385A (en) | Deep learning-based violation building remote sensing identification method and system | |
CN103093243A (en) | High resolution panchromatic remote sensing image cloud discriminating method | |
CN115620090A (en) | Model training method, low-illumination target re-recognition method and device and terminal equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2019-01-18; Termination date: 2021-09-16 |