CN106529583A - Bag-of-visual-word-model-based indoor scene cognitive method - Google Patents
- Publication number
- CN106529583A (application CN201610933785.7A)
- Authority
- CN
- China
- Prior art keywords
- scene
- bag
- orb
- image
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention, which belongs to the field of mobile robot environment perception, relates to an indoor scene recognition method based on a visual bag-of-words model. The method comprises an offline part and an online part. In the offline part, scene categories are determined according to the application requirements; the robot scans every scene with its onboard RGB-D sensor to obtain enough scene images to form an image training set; and 256-dimensional ORB descriptors are generated for each image in the training set with the ORB algorithm, each image usually containing hundreds to thousands of ORB vectors. In the online part, the robot receives a query instruction for the current scene category, and the system is initialized and prepared for the scene query. Because the ORB algorithm performs the image preprocessing steps of feature extraction and matching, the method stays fast; a KNN classifier raises the scene recognition rate, so the method meets the needs of common indoor scene query applications for mobile robots.
Description
Technical field
The invention belongs to the field of mobile robot environment perception, and more particularly relates to an indoor scene recognition method based on a visual bag-of-words model.
Background technology
Under normal circumstances an occupancy grid map meets the robot's basic needs for navigation and obstacle avoidance, but higher-level tasks such as human-robot interaction and mission planning also require cognitive semantic information about the scene, i.e. a cognition-oriented semantic map. If a mobile robot moving through an indoor environment does not know whether its current position belongs to the living room, the kitchen or the bedroom, it cannot complete higher-level intelligent tasks such as going to the kitchen and fetching a bottle of mineral water from the refrigerator for a person.
Summary of the invention
The object of the invention is to provide an indoor scene recognition method based on a visual bag-of-words model.
The object of the invention is achieved as follows:
The invention comprises an offline part and an online part, with the following steps:
Offline part:
(1) Determine the scene categories according to the application requirements; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set;
(2) Generate 256-dimensional ORB descriptors for every image in the training set with the ORB algorithm; each image usually contains hundreds to thousands of ORB vectors;
(3) Train on the ORB feature points of the training set with the K-means clustering algorithm, generate K cluster centers as the visual vocabulary, and construct the visual dictionary;
(4) For the ORB features of all images, compute the term frequency and inverse document frequency of every visual word, add TF-IDF weights to the frequency table, and generate the weighted visual bag-of-words model of each training image; saving the visual dictionary and the training-set bag-of-words models yields the new-model offline semantic map;
Online part:
(5) The robot receives a query instruction for the current scene category; the system is initialized and prepared for the scene query;
(6) The robot obtains an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm;
(7) Query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image;
(8) Use a KNN classifier to compare the bag-of-words model of the current scene image with the training-set bag-of-words models of the semantic map database, determine the current scene category, and return the query result.
Step (3) comprises the following sub-steps:
(3.1) Randomly choose k sample points from the feature point set X as the initial cluster centers $\{m_1, m_2, \dots, m_k\}$;
(3.2) Compute the distance $\|x_i - m_j\|$ from each feature point $x_i$ $(i = 1, 2, \dots, n)$ to every cluster center and assign $x_i$ to the nearest class $m_j$;
(3.3) Recompute the cluster center of each class, $m_j = \frac{1}{n_j}\sum_{x_i \in m_j} x_i$, $j = 1, 2, \dots, k$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ and compare it with the previous value; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps (3.2) and (3.3); otherwise stop the iteration; take the k resulting cluster centers as visual words and store the list of all visual words as the visual dictionary.
The visual dictionary word capacity parameter K of step (3) is set to 900.
In step (8), the KNN classifier parameter K is set to 1. A minimal end-to-end sketch of the whole pipeline is given below.
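The following is a minimal sketch of the offline dictionary construction and the online scene query, assuming OpenCV, NumPy and scikit-learn are available. All function names (extract_orb, build_dictionary, bow_histogram, classify_scene) are illustrative rather than taken from the patent, and the TF-IDF weighting of step (4) is omitted here and shown in a later example.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create()

def extract_orb(image_bgr):
    """Steps (2)/(6): detect ORB keypoints and return descriptors as float rows."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    return np.zeros((0, 32), np.float32) if desc is None else desc.astype(np.float32)

def build_dictionary(train_images, k=900):
    """Step (3): pool all training-set ORB descriptors and cluster them into k visual words."""
    pooled = np.vstack([extract_orb(img) for img in train_images])
    return KMeans(n_clusters=k, n_init=4).fit(pooled).cluster_centers_

def bow_histogram(descriptors, dictionary):
    """Assign every descriptor to its nearest visual word and count occurrences per word."""
    d2 = ((descriptors ** 2).sum(axis=1, keepdims=True)
          - 2.0 * descriptors @ dictionary.T
          + (dictionary ** 2).sum(axis=1))
    return np.bincount(d2.argmin(axis=1), minlength=len(dictionary)).astype(np.float32)

def classify_scene(query_image, dictionary, train_hists, train_labels):
    """Step (8) with K = 1: return the label of the most similar training histogram."""
    q = bow_histogram(extract_orb(query_image), dictionary)
    nearest = np.linalg.norm(train_hists - q, axis=1).argmin()
    return train_labels[nearest]
```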
The beneficial effects of the invention are:
The invention uses the ORB algorithm to complete the image preprocessing steps of feature extraction and matching, which keeps the algorithm fast, and uses a KNN classifier to improve the scene recognition rate, so the method can meet the needs of common indoor scene query applications for mobile robots.
Description of the drawings
Fig. 1 is a schematic flow chart of the indoor scene recognition algorithm based on the visual bag-of-words model.
Specific embodiment
The invention is described further below with reference to the accompanying drawing.
The invention discloses an indoor scene recognition method based on a visual bag-of-words model. The method comprises two parts: offline map generation and online map query. The offline map generation part comprises: scanning the scenes to obtain the scene training set; ORB feature detection and description; K-means clustering to extract cluster centers and construct the visual dictionary; and TF-IDF weighting to generate the training-set visual bag-of-words database. The online map query part comprises: receiving the scene query instruction; acquiring an RGB image of the current scene and extracting ORB features; querying the visual dictionary of the map database and generating the visual bag-of-words model of the current scene image; and comparing the current scene bag-of-words model with the map database training set with a KNN classifier to judge the current scene category. In this way, the invention helps a mobile robot complete indoor scene recognition quickly and accurately, so that it can interact better with humans.
To solve the above problems, the invention proposes an indoor scene recognition method based on a visual bag-of-words model, so as to build a visual dictionary of common indoor scenes and establish a new kind of semantic map oriented toward indoor scene cognition, which is subsequently used for robot indoor scene category queries.
To achieve the above purpose, the technical scheme comprises the following main points:
Offline part:
Step 1. Scan the scenes to obtain the scene training set;
Step 2. ORB feature detection and description;
Step 3. K-means clustering to extract cluster centers and construct the visual dictionary;
Step 4. TF-IDF weighting to generate the training-set visual bag-of-words database;
Online part:
Step 1. Acquire an RGB image of the current scene and extract ORB features;
Step 2. Query the visual dictionary of the map database and generate the visual bag-of-words model of the current scene image;
Step 3. Compare the map database training set with the current scene bag-of-words model using the KNN classifier and judge the current scene category.
The algorithm flow of the indoor scene recognition method based on the visual bag-of-words model is shown in Fig. 1. It can be divided into an offline part and an online part; the specific implementation steps are as follows:
(1) Offline map generation:
Step 1. Determine the scene categories according to the application requirements; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set.
Step 2. Generate 256-dimensional ORB descriptors for every image in the training set with the ORB algorithm; each image usually contains hundreds to thousands of ORB vectors, as the short example below illustrates.
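A short check of the descriptor layout described in Step 2, assuming OpenCV; the keypoint cap and the image file name are illustrative. OpenCV's default ORB produces 32-byte (256-bit) binary descriptors, matching the 256-dimensional description above.

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)   # keypoint cap of 2000 is an assumption, not from the patent
img = cv2.imread("scene_0001.png", cv2.IMREAD_GRAYSCALE)   # hypothetical training image
keypoints, descriptors = orb.detectAndCompute(img, None)

print(len(keypoints))       # typically hundreds to a few thousand keypoints for an indoor image
print(descriptors.shape)    # (num_keypoints, 32): 32 bytes = 256 bits per descriptor
print(descriptors.dtype)    # uint8
```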
Step 3. Train on the ORB feature points of the training set with the K-means clustering algorithm, generate K cluster centers as the visual vocabulary, and construct the visual dictionary. For roughly 10 indoor scenes, K = 900 gives a scene recognition rate of about 80% while keeping the algorithm fast, so the invention chooses K = 900.
K-means is an unsupervised, self-adaptive clustering algorithm with high efficiency that is well suited to large-scale data. Its core idea is to obtain k cluster centers $\{m_1, m_2, \dots, m_k\}$ from the feature point set $X = \{x_1, x_2, \dots, x_n\}$ such that the sum of squared distances from the feature points to their assigned cluster centers is minimized; the objective function is

$$W_n = \sum_{j=1}^{k} \sum_{x_i \in m_j} \| x_i - m_j \|^2 \qquad (1)$$
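A NumPy sketch of this clustering step, following the sub-steps (3.1)-(3.3) that are spelled out formally below; the ORB descriptors are treated as real-valued vectors, as the description implies, and the function and variable names are illustrative.

```python
import numpy as np

def kmeans_visual_dictionary(X, k=900, max_iter=100, seed=0):
    """X: (n, d) matrix of pooled ORB descriptors; returns a (k, d) matrix of visual words."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # sub-step 3.1: random initial centers
    prev_obj = np.inf
    for _ in range(max_iter):
        # sub-step 3.2: distance from every feature point to every center, assign to the nearest class
        d2 = ((X ** 2).sum(axis=1, keepdims=True) - 2.0 * X @ centers.T + (centers ** 2).sum(axis=1))
        labels = d2.argmin(axis=1)
        # sub-step 3.3: recompute each cluster center as the mean of its assigned feature points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        obj = ((X - centers[labels]) ** 2).sum()              # objective W_n(t) of formula (1)
        if obj - prev_obj >= 0:                               # stop once W_n(t) - W_n(t-1) >= 0
            break
        prev_obj = obj
    return centers                                            # the k visual words of the dictionary
```

On very large training sets, scikit-learn's MiniBatchKMeans can stand in for this loop with the same K.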
Step 3 comprises the following sub-steps:
Step 3.1. Randomly choose k sample points from the feature point set X as the initial cluster centers $\{m_1, m_2, \dots, m_k\}$;
Step 3.2. Compute the distance $\|x_i - m_j\|$ from each feature point $x_i$ $(i = 1, 2, \dots, n)$ to every cluster center and assign $x_i$ to the nearest class $m_j$;
Step 3.3. Recompute the cluster center of each class, $m_j = \frac{1}{n_j}\sum_{x_i \in m_j} x_i$, $j = 1, 2, \dots, k$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ according to formula (1) and compare it with the previous value; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps 3.2 and 3.3; otherwise stop the iteration. Take the k resulting cluster centers as visual words and store the list of all visual words as the visual dictionary.
Step 4. For the ORB features of all images, compute the term frequency (TF) and inverse document frequency (IDF) of each visual word, add TF-IDF weights to the frequency table, and generate the weighted visual bag-of-words model of each training image. Saving the visual dictionary and the training-set bag-of-words models yields the new-model offline semantic map.
Once the visual dictionary is available, the visual-word frequency histogram description of an image can be obtained by counting against the dictionary. For each training image and test image, the many low-level features extracted from it are matched against the words in the visual dictionary, each feature is replaced by the closest word, and the number of occurrences of every word is counted; this yields the frequency-histogram bag-of-words representation of the image.
Let the visual dictionary be $\{m_1, m_2, \dots, m_k\}$. The Euclidean distance between each low-level ORB feature and every visual word is computed with the nearest neighbor algorithm, so that each feature $v_i$ is replaced by its nearest visual word, as in formula (2):

$$w(v_i) = \arg\min_{m_j \in \{m_1, \dots, m_k\}} \| v_i - m_j \| \qquad (2)$$

A sketch of this word assignment and the TF-IDF weighting follows.
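A NumPy sketch of Step 4: nearest-word assignment according to formula (2), word counting, and TF-IDF weighting of the histograms. The exact TF-IDF variant is an assumption (the common tf · log(N/df) form); function names are illustrative.

```python
import numpy as np

def nearest_words(descriptors, dictionary):
    """Formula (2): index of the nearest visual word for every ORB descriptor."""
    d2 = ((descriptors ** 2).sum(axis=1, keepdims=True)
          - 2.0 * descriptors @ dictionary.T
          + (dictionary ** 2).sum(axis=1))
    return d2.argmin(axis=1)

def tf_idf_bow(per_image_descriptors, dictionary):
    """per_image_descriptors: list of (n_i, d) arrays, one per training image.
    Returns the (num_images, dict_size) matrix of TF-IDF weighted bag-of-words vectors."""
    k, n_images = len(dictionary), len(per_image_descriptors)
    counts = np.zeros((n_images, k))
    for i, desc in enumerate(per_image_descriptors):
        counts[i] = np.bincount(nearest_words(desc, dictionary), minlength=k)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)   # term frequency per image
    df = (counts > 0).sum(axis=0)                                    # images containing each word
    idf = np.log(n_images / np.maximum(df, 1))                       # inverse document frequency
    return tf * idf                                                  # weighted bag-of-words vectors
```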
(2) Online map query:
Step 1. The robot receives a query instruction for the current scene category; the system is initialized and prepared for the scene query.
Step 2. The robot obtains an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm.
Step 3. Query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image.
Step 4. Use a KNN classifier to compare the bag-of-words model of the current scene image with the training-set bag-of-words models of the semantic map database, determine the current scene category, and return the query result.
The basic idea of the KNN algorithm is as follows: compute the similarity between the bag-of-words model of the current scene and every training-set bag-of-words model, find the K most similar samples, and decide the current scene category by a vote over the categories of these K samples. Euclidean distance is used as the similarity measure here; for two n-dimensional vectors $a = (x_{11}, x_{12}, \dots, x_{1n})$ and $b = (x_{21}, x_{22}, \dots, x_{2n})$ the Euclidean distance is

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$

or, expressed in vector form,

$$d(a, b) = \sqrt{(a - b)(a - b)^T}.$$

Experiments show that K = 1 or K = 3 gives a higher scene recognition rate; the invention sets the KNN parameter K to 1. A minimal sketch of this comparison is given below.
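A minimal sketch of the online KNN comparison with the Euclidean distance above and K = 1 as chosen in the description; the majority vote is kept general so K = 3 also works. Names are illustrative.

```python
import numpy as np
from collections import Counter

def knn_scene_category(query_bow, train_bows, train_labels, k=1):
    """query_bow: weighted bag-of-words vector of the current image;
    train_bows: (N, dict_size) matrix of training vectors; train_labels: list of N scene categories."""
    dists = np.sqrt(((train_bows - query_bow) ** 2).sum(axis=1))   # Euclidean distance d(a, b)
    nearest = np.argsort(dists)[:k]                                 # indices of the K most similar samples
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]                               # category with the most votes
```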
Claims (4)
1. An indoor scene recognition method based on a visual bag-of-words model, characterized by comprising an offline part and an online part with the following steps:
Offline part:
(1) determine the scene categories according to the application requirements; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set;
(2) generate 256-dimensional ORB descriptors for every image in the training set with the ORB algorithm, each image usually containing hundreds to thousands of ORB vectors;
(3) train on the ORB feature points of the training set with the K-means clustering algorithm, generate K cluster centers as the visual vocabulary, and construct the visual dictionary;
(4) for the ORB features of all images, compute the term frequency and inverse document frequency of every visual word, add TF-IDF weights to the frequency table, and generate the weighted visual bag-of-words model of each training image; saving the visual dictionary and the training-set bag-of-words models yields the new-model offline semantic map;
Online part:
(5) the robot receives a query instruction for the current scene category; the system is initialized and prepared for the scene query;
(6) the robot obtains an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm;
(7) query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image;
(8) use a KNN classifier to compare the bag-of-words model of the current scene image with the training-set bag-of-words models of the semantic map database, determine the current scene category, and return the query result.
2. The indoor scene recognition method based on a visual bag-of-words model according to claim 1, characterized in that step (3) comprises the following sub-steps:
(3.1) randomly choose k sample points from the feature point set X as the initial cluster centers $\{m_1, m_2, \dots, m_k\}$;
(3.2) compute the distance $\|x_i - m_j\|$ from each feature point $x_i$ $(i = 1, 2, \dots, n)$ to every cluster center and assign $x_i$ to the nearest class $m_j$;
(3.3) recompute the cluster center of each class, $m_j = \frac{1}{n_j}\sum_{x_i \in m_j} x_i$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ and compare it with the previous value; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps (3.2) and (3.3); otherwise stop the iteration; take the k resulting cluster centers as visual words and store the list of all visual words as the visual dictionary.
3. The indoor scene recognition method based on a visual bag-of-words model according to claim 1, characterized in that the visual dictionary word capacity parameter K of step (3) is set to 900.
4. The indoor scene recognition method based on a visual bag-of-words model according to claim 1, characterized in that the KNN classifier parameter K in step (8) is set to 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610933785.7A CN106529583A (en) | 2016-11-01 | 2016-11-01 | Bag-of-visual-word-model-based indoor scene cognitive method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610933785.7A CN106529583A (en) | 2016-11-01 | 2016-11-01 | Bag-of-visual-word-model-based indoor scene cognitive method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106529583A true CN106529583A (en) | 2017-03-22 |
Family
ID=58291890
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610933785.7A Pending CN106529583A (en) | 2016-11-01 | 2016-11-01 | Bag-of-visual-word-model-based indoor scene cognitive method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529583A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622607A (en) * | 2012-02-24 | 2012-08-01 | 河海大学 | Remote sensing image classification method based on multi-feature fusion |
KR20140006566A (en) * | 2012-07-06 | 2014-01-16 | 한국과학기술원 | Method and apparatus for extraction of video signature using inclined video tomography |
CN103413142A (en) * | 2013-07-22 | 2013-11-27 | 中国科学院遥感与数字地球研究所 | Remote sensing image land utilization scene classification method based on two-dimension wavelet decomposition and visual sense bag-of-word model |
CN103559191A (en) * | 2013-09-10 | 2014-02-05 | 浙江大学 | Cross-media sorting method based on hidden space learning and two-way sorting learning |
CN104915673A (en) * | 2014-03-11 | 2015-09-16 | 株式会社理光 | Object classification method and system based on bag of visual word model |
CN104239897A (en) * | 2014-09-04 | 2014-12-24 | 天津大学 | Visual feature representing method based on autoencoder word bag |
CN105843223A (en) * | 2016-03-23 | 2016-08-10 | 东南大学 | Mobile robot three-dimensional mapping and obstacle avoidance method based on space bag of words model |
Non-Patent Citations (2)
Title |
---|
曹宁等 [Cao Ning et al.], "基于视觉词袋模型的图像分类改进方法" [An improved image classification method based on the visual bag-of-words model], 《电子设计工程》 [Electronic Design Engineering] *
许宏科等 [Xu Hongke et al.], "基于改进ORB的图像特征点匹配" [Image feature point matching based on improved ORB], 《科学技术与工程》 [Science Technology and Engineering] *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107220932A (en) * | 2017-04-18 | 2017-09-29 | 天津大学 | Panorama Mosaic method based on bag of words |
CN107220932B (en) * | 2017-04-18 | 2020-03-20 | 天津大学 | Panoramic image splicing method based on bag-of-words model |
CN107167144A (en) * | 2017-07-07 | 2017-09-15 | 武汉科技大学 | A kind of mobile robot indoor environment recognition positioning method of view-based access control model |
CN108256463A (en) * | 2018-01-10 | 2018-07-06 | 南开大学 | Mobile robot scene recognition method based on ESN neural networks |
CN108256463B (en) * | 2018-01-10 | 2022-01-04 | 南开大学 | Mobile robot scene recognition method based on ESN neural network |
CN109242899A (en) * | 2018-09-03 | 2019-01-18 | 北京维盛泰科科技有限公司 | A kind of real-time positioning and map constructing method based on online visual dictionary |
CN109242899B (en) * | 2018-09-03 | 2022-04-19 | 北京维盛泰科科技有限公司 | Real-time positioning and map building method based on online visual dictionary |
CN110334763A (en) * | 2019-07-04 | 2019-10-15 | 北京字节跳动网络技术有限公司 | Model data file generation, image-recognizing method, device, equipment and medium |
CN110569913A (en) * | 2019-09-11 | 2019-12-13 | 北京云迹科技有限公司 | Scene classifier training method and device, scene recognition method and robot |
CN112905798A (en) * | 2021-03-26 | 2021-06-04 | 深圳市阿丹能量信息技术有限公司 | Indoor visual positioning method based on character identification |
CN112905798B (en) * | 2021-03-26 | 2023-03-10 | 深圳市阿丹能量信息技术有限公司 | Indoor visual positioning method based on character identification |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | C06 | Publication | 
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170322