CN106529583A - Bag-of-visual-word-model-based indoor scene cognitive method - Google Patents


Info

Publication number
CN106529583A
CN106529583A (application CN201610933785.7A)
Authority
CN
China
Prior art keywords
scene
bag
orb
image
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610933785.7A
Other languages
Chinese (zh)
Inventor
赵玉新
李亚宾
刘厂
雷宇宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN201610933785.7A
Publication of CN106529583A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of mobile robot environment perception, and in particular relates to an indoor scene cognition method based on the visual bag-of-words model. The method comprises an offline part and an online part. In the offline part, scene categories are determined according to application demand; the robot scans every scene with its onboard RGB-D sensor to obtain enough scene images to form an image training set; and 256-dimensional ORB descriptors are generated for each image in the training set with the ORB algorithm, each image typically containing hundreds to thousands of ORB vectors. In the online part, the robot receives a current-scene category query instruction, and the system initializes and prepares for the scene query. The ORB algorithm completes the image preprocessing process of feature extraction and matching, guaranteeing the speed of the algorithm; a KNN classifier algorithm improves the scene recognition rate, so the method can satisfy the needs of common indoor scene query applications for mobile robots.

Description

An indoor scene cognition method based on the visual bag-of-words model
Technical field
The invention belongs to the field of mobile robot environment perception, and particularly relates to an indoor scene cognition method based on the visual bag-of-words model.
Background technology
Under normal circumstances, a grid map can satisfy a robot's low-level needs for navigation and obstacle-avoidance tasks, but completing high-level tasks such as human-robot interaction and mission planning also requires semantic information for scene cognition, i.e., creating a cognition-oriented semantic map. If a mobile robot moving through an indoor scene does not know whether its own position belongs to the living room, the kitchen, or the bedroom, it cannot complete human-like high-level intelligent tasks such as going to the kitchen to fetch a bottle of mineral water from the refrigerator.
Summary of the invention
It is an object of the invention to propose an indoor scene cognition method based on the visual bag-of-words model.
The object of the present invention is achieved as follows:
The present invention includes an offline part and an online part, with the following specific steps:
Offline part:
(1) Determine the scene categories according to application demand; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set;
(2) Generate 256-dimensional ORB descriptors for each image in the training set with the ORB algorithm; each image typically contains hundreds to thousands of ORB vectors;
(3) Train on the ORB feature points of the training set with the K-means clustering algorithm to generate K cluster centers forming the visual vocabulary, and construct the visual dictionary;
(4) For the ORB features of all images, compute the frequency (TF) and inverse document frequency (IDF) of each visual word, add weights to the frequency table via TF-IDF, and generate the weighted visual bag-of-words model of each training-set image; saving the visual dictionary and the training-set visual bag-of-words models yields the new-style offline semantic map;
Online part:
(5) The robot receives a current-scene category query instruction, and the system initializes and prepares for the scene query;
(6) The robot captures an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm;
(7) Query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image;
(8) Use a KNN classifier to compare the visual bag-of-words model of the current scene image with the training-set visual bag-of-words models in the semantic map database, finally determine the current scene category, and return the query result.
Step (3) includes the following sub-steps:
(3.1) Randomly choose k sample points in the feature point set X as the initial cluster centers $\{m_1, m_2, \ldots, m_k\}$;
(3.2) Compute the distance $d_{ij} = \lVert x_i - m_j \rVert$ from each feature point $x_i$ ($i = 1, 2, \ldots, n$) in the feature point set to every cluster center, and assign $x_i$ to the nearest class $m_j$;
(3.3) Compute the cluster center of each class, $m_j = \frac{1}{n_j} \sum_{x_i \in m_j} x_i$, $j = 1, 2, \ldots, k$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ and take its difference from the previous result; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps (3.2) and (3.3); otherwise, exit the iteration and finish. Take the k resulting cluster centers as visual words and store all visual words in a list to obtain the visual dictionary;
The visual dictionary word capacity parameter K in step (3) is set to 900.
In step (8), the KNN classifier parameter K is set to 1.
The beneficial effects of the present invention are:
The present invention uses the ORB algorithm to complete the image preprocessing process of feature extraction and matching, guaranteeing the speed of the algorithm; it improves the scene recognition rate with a KNN classifier algorithm, and can satisfy the needs of common indoor scene query applications for mobile robots.
Description of the drawings
Fig. 1 is a schematic flow chart of the algorithm of the indoor scene cognition method based on the visual bag-of-words model.
Specific embodiment
The present invention is described further below in conjunction with the accompanying drawings.
The invention discloses an indoor scene cognition method based on the visual bag-of-words model. The method includes two parts: offline map generation and online map query. The offline map generation part includes: scanning the scenes to obtain a scene training set; ORB feature detection and description; extracting cluster centers with K-means clustering to construct the visual dictionary; and adding TF-IDF weights to generate the training-set visual bag-of-words database. The online map query part includes: receiving a scene query instruction; obtaining an RGB image of the current scene and extracting ORB features; querying the map database's visual dictionary to generate the current scene image's visual bag-of-words model; and comparing the map database training set with the current scene's bag-of-words model using a KNN classifier to judge the current scene category. In this way, the present invention can quickly and accurately help a mobile robot complete indoor scene cognition, and thereby interact better with humans.
To solve the above problems, the present invention proposes an indoor scene cognition method based on the visual bag-of-words model, so as to build a visual dictionary of common indoor scenes and establish a new form of semantic map oriented toward indoor scene cognition, which is subsequently used for robot indoor scene category queries.
To achieve the above purpose, the technical scheme of the present invention includes the following main points:
Offline part:
Step 1. Scan the scenes to obtain the scene training set;
Step 2. ORB feature detection and description;
Step 3. Extract cluster centers with K-means clustering to construct the visual dictionary;
Step 4. Add TF-IDF weights to generate the training-set visual bag-of-words database;
Online part:
Step 1. Obtain an RGB image of the current scene and extract ORB features;
Step 2. Query the map database's visual dictionary and generate the current scene image's visual bag-of-words model;
Step 3. Compare the map database training set with the current scene's bag-of-words model using the KNN classifier, and judge the current scene category.
The algorithm flow of the indoor scene cognition method based on the visual bag-of-words model is shown in Fig. 1. It can be divided into an offline part and an online part; the specific implementation steps are as follows:
(1) Offline map generation:
Step 1. Determine the scene categories according to application demand; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form the image training set.
Step 2. Generate 256-dimensional ORB descriptors for each image in the training set with the ORB algorithm; each image typically contains hundreds to thousands of ORB vectors.
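As a minimal illustration only (not part of the patent text), descriptors of this kind can be extracted with OpenCV's ORB implementation; the file name and keypoint cap below are assumptions:

```python
import cv2

# Hypothetical training image; any indoor scene photo works here.
image = cv2.imread("scene_kitchen_001.png", cv2.IMREAD_GRAYSCALE)

# ORB detector; nfeatures caps the number of keypoints per image.
orb = cv2.ORB_create(nfeatures=1000)

# keypoints: detected feature points; descriptors: one 32-byte
# (256-bit) binary vector per keypoint, matching the 256-dimensional
# ORB descriptors described above.
keypoints, descriptors = orb.detectAndCompute(image, None)
print(descriptors.shape)  # e.g. (1000, 32)
```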
Step 3. Train on the ORB feature points of the training set with the K-means clustering algorithm, generate K cluster centers forming the visual vocabulary, and construct the visual dictionary. For about 10 indoor scene categories, K = 900 achieves a scene recognition accuracy of about 80% while the algorithm remains fast, so the present invention chooses 900 for the parameter K.
The K-means algorithm is an unsupervised adaptive clustering analysis algorithm with the advantages of high efficiency and suitability for large-scale data processing. Its core idea is to obtain k cluster centers $\{m_1, m_2, \ldots, m_k\}$ in the feature point set $X = \{x_1, x_2, \ldots, x_n\}$ such that the sum of squared distances from the feature points in the set to their assigned cluster centers is minimized; its objective function is:

$$W_n = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - m_j \rVert^2 \qquad (1)$$
Step 3 specifically includes the following sub-steps:
Step 3.1. Randomly choose k sample points in the feature point set X as the initial cluster centers $\{m_1, m_2, \ldots, m_k\}$;
Step 3.2. Compute the distance $d_{ij} = \lVert x_i - m_j \rVert$ from each feature point $x_i$ ($i = 1, 2, \ldots, n$) in the feature point set to every cluster center, and assign $x_i$ to the nearest class $m_j$;
Step 3.3. Compute the cluster center of each class, $m_j = \frac{1}{n_j} \sum_{x_i \in m_j} x_i$, $j = 1, 2, \ldots, k$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ according to formula (1) and take its difference from the previous result; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps 3.2 and 3.3; otherwise, exit the iteration and finish. Take the k resulting cluster centers as visual words, and store all visual words in a list to obtain the visual dictionary.
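A compact sketch of this dictionary-building step, as an assumption-laden stand-in for the iterative procedure above rather than the patent's own code: scikit-learn's KMeans minimizes the same objective (1), with the binary ORB descriptors unpacked to 256 float dimensions so that Euclidean distance applies:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_visual_dictionary(all_descriptors: np.ndarray, k: int = 900) -> np.ndarray:
    """Cluster ORB descriptors into k visual words {m_1, ..., m_k}.

    all_descriptors: (N, 32) uint8 array stacking the descriptors of
    every training image; unpacked to (N, 256) bits so the Euclidean
    objective of formula (1) is meaningful.
    """
    bits = np.unpackbits(all_descriptors, axis=1).astype(np.float32)  # (N, 256)
    kmeans = KMeans(n_clusters=k, n_init=3, random_state=0).fit(bits)
    return kmeans.cluster_centers_  # one 256-dim center per visual word
```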
Step 4. For the ORB features of all images, compute the frequency (TF) and inverse document frequency (IDF) of each visual word, add weights to the frequency table via TF-IDF, and generate the weighted visual bag-of-words model of each training-set image. Saving the visual dictionary and the training-set visual bag-of-words models yields the new-style offline semantic map.
Once the visual dictionary is obtained, the visual-word frequency histogram description of an image can be produced by statistics over the visual dictionary. For each training image and test image, the many extracted low-level features are matched against the words in the visual dictionary, each feature's description being replaced by the closest word; finally, the number of occurrences of each word is counted, yielding the image's frequency-histogram-based visual bag-of-words representation.
Assuming the visual dictionary is $\{m_1, m_2, \ldots, m_k\}$, the Euclidean distance between each ORB low-level feature and each visual word is computed with the nearest-neighbor algorithm, so that each feature $v_i$ is replaced in the description by its nearest visual word, as shown in formula (2):

$$w(v_i) = \arg\min_{1 \le j \le k} \lVert v_i - m_j \rVert \qquad (2)$$
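An illustrative sketch of the histogram construction and TF-IDF weighting just described; the helper names are mine, and the exact TF/IDF normalization is an assumption, since the patent does not spell it out:

```python
import numpy as np
from scipy.spatial.distance import cdist

def bow_histogram(descriptors: np.ndarray, dictionary: np.ndarray) -> np.ndarray:
    """Assign each descriptor to its nearest visual word (formula (2))
    and count how often each word occurs in the image."""
    bits = np.unpackbits(descriptors, axis=1).astype(np.float32)  # (n, 256)
    d = cdist(bits, dictionary)        # Euclidean distance, feature x word
    words = d.argmin(axis=1)           # nearest word index per feature
    return np.bincount(words, minlength=dictionary.shape[0]).astype(np.float32)

def tfidf_weight(histograms: np.ndarray) -> np.ndarray:
    """Weight raw counts: TF = count / words in the image,
    IDF = log(total images / images containing the word)."""
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1.0)
    df = np.maximum((histograms > 0).sum(axis=0), 1)  # document frequency
    idf = np.log(histograms.shape[0] / df)
    return tf * idf
```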
(2) Online map query:
Step 1. The robot receives a current-scene category query instruction, and the system initializes and prepares for the scene query.
Step 2. The robot captures an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm.
Step 3. Query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image.
Step 4. Use the KNN classifier to compare the visual bag-of-words model of the current scene image with the training-set visual bag-of-words models in the semantic map database, finally determine the current scene category, and return the query result.
The basic idea of the KNN algorithm can be expressed as follows: compute the similarity between the visual bag-of-words model of the current scene to be classified and each training-set visual bag-of-words model, find the K most similar samples, and determine the current scene's category from the category votes of these K samples. Here the similarity measure is the Euclidean distance; for two n-dimensional vectors $a = (x_{11}, x_{12}, \ldots, x_{1n})$ and $b = (x_{21}, x_{22}, \ldots, x_{2n})$, the Euclidean distance is:

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$

Expressed in vector-operation form:

$$d(a, b) = \sqrt{(a - b)(a - b)^T}$$

Experiments show that setting the KNN parameter K to 1 or 3 gives a higher scene recognition accuracy; the present invention sets the KNN parameter K to 1.
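A minimal sketch of this classification step, using scikit-learn's KNeighborsClassifier as a stand-in and invented placeholder data; real inputs would be the TF-IDF weighted histograms produced by the offline part:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Placeholder stand-ins for the offline outputs: 900-bin TF-IDF
# weighted histograms and one scene label per training image.
train_bows = rng.random((30, 900)).astype(np.float32)
train_labels = np.repeat(["kitchen", "bedroom", "living_room"], 10)

# K = 1 under Euclidean distance, as chosen in the text above.
knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
knn.fit(train_bows, train_labels)

query_bow = rng.random((1, 900)).astype(np.float32)  # current scene histogram
print("Current scene:", knn.predict(query_bow)[0])
```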

Claims (4)

1. An indoor scene cognition method based on the visual bag-of-words model, characterized by including an offline part and an online part, with the following specific steps:
Offline part:
(1) Determine the scene categories according to application demand; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set;
(2) Generate 256-dimensional ORB descriptors for each image in the training set with the ORB algorithm; each image typically contains hundreds to thousands of ORB vectors;
(3) Train on the ORB feature points of the training set with the K-means clustering algorithm to generate K cluster centers forming the visual vocabulary, and construct the visual dictionary;
(4) For the ORB features of all images, compute the frequency and inverse document frequency of each visual word, add weights to the frequency table via TF-IDF, and generate the weighted visual bag-of-words model of each training-set image; saving the visual dictionary and the training-set visual bag-of-words models yields the new-style offline semantic map;
Online part:
(5) The robot receives a current-scene category query instruction, and the system initializes and prepares for the scene query;
(6) The robot captures an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm;
(7) Query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image;
(8) Use a KNN classifier to compare the visual bag-of-words model of the current scene image with the training-set visual bag-of-words models in the semantic map database, finally determine the current scene category, and return the query result.
2. The indoor scene cognition method based on the visual bag-of-words model according to claim 1, characterized in that step (3) includes the following sub-steps:
(3.1) Randomly choose k sample points in the feature point set X as the initial cluster centers $\{m_1, m_2, \ldots, m_k\}$;
(3.2) Compute the distance $d_{ij} = \lVert x_i - m_j \rVert$ from each feature point $x_i$ ($i = 1, 2, \ldots, n$) in the feature point set to every cluster center, and assign $x_i$ to the nearest class $m_j$;
(3.3) Compute the cluster center of each class, $m_j = \frac{1}{n_j} \sum_{x_i \in m_j} x_i$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function

$$W_n = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - m_j \rVert^2$$

and take its difference from the previous result; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps (3.2) and (3.3); otherwise, exit the iteration and finish; take the k resulting cluster centers as visual words, and store all visual words in a list to obtain the visual dictionary.
3. The indoor scene cognition method based on the visual bag-of-words model according to claim 1, characterized in that the visual dictionary word capacity parameter K in step (3) is set to 900.
4. The indoor scene cognition method based on the visual bag-of-words model according to claim 1, characterized in that in step (8) the KNN classifier parameter K is set to 1.
CN201610933785.7A 2016-11-01 2016-11-01 Bag-of-visual-word-model-based indoor scene cognitive method Pending CN106529583A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610933785.7A CN106529583A (en) 2016-11-01 2016-11-01 Bag-of-visual-word-model-based indoor scene cognitive method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610933785.7A CN106529583A (en) 2016-11-01 2016-11-01 Bag-of-visual-word-model-based indoor scene cognitive method

Publications (1)

Publication Number Publication Date
CN106529583A true CN106529583A (en) 2017-03-22

Family

ID=58291890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610933785.7A Pending CN106529583A (en) 2016-11-01 2016-11-01 Bag-of-visual-word-model-based indoor scene cognitive method

Country Status (1)

Country Link
CN (1) CN106529583A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622607A (en) * 2012-02-24 2012-08-01 河海大学 Remote sensing image classification method based on multi-feature fusion
KR20140006566A (en) * 2012-07-06 2014-01-16 한국과학기술원 Method and apparatus for extraction of video signature using inclined video tomography
CN103413142A (en) * 2013-07-22 2013-11-27 中国科学院遥感与数字地球研究所 Remote sensing image land-use scene classification method based on two-dimensional wavelet decomposition and the visual bag-of-words model
CN103559191A (en) * 2013-09-10 2014-02-05 浙江大学 Cross-media ranking method based on latent space learning and bidirectional ranking learning
CN104915673A (en) * 2014-03-11 2015-09-16 株式会社理光 Object classification method and system based on bag of visual word model
CN104239897A (en) * 2014-09-04 2014-12-24 天津大学 Visual feature representation method based on an autoencoder bag-of-words
CN105843223A (en) * 2016-03-23 2016-08-10 东南大学 Mobile robot three-dimensional mapping and obstacle avoidance method based on a spatial bag-of-words model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曹宁 (Cao Ning) et al., "Improved image classification method based on the visual bag-of-words model", 《电子设计工程》 (Electronic Design Engineering) *
许宏科 (Xu Hongke) et al., "Image feature point matching based on improved ORB", 《科学技术与工程》 (Science Technology and Engineering) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220932A (en) * 2017-04-18 2017-09-29 天津大学 Panoramic image stitching method based on the bag-of-words model
CN107220932B (en) * 2017-04-18 2020-03-20 天津大学 Panoramic image stitching method based on the bag-of-words model
CN107167144A (en) * 2017-07-07 2017-09-15 武汉科技大学 A vision-based indoor environment recognition and positioning method for mobile robots
CN108256463A (en) * 2018-01-10 2018-07-06 南开大学 Mobile robot scene recognition method based on ESN neural networks
CN108256463B (en) * 2018-01-10 2022-01-04 南开大学 Mobile robot scene recognition method based on ESN neural network
CN109242899A (en) * 2018-09-03 2019-01-18 北京维盛泰科科技有限公司 A real-time localization and map construction method based on an online visual dictionary
CN109242899B (en) * 2018-09-03 2022-04-19 北京维盛泰科科技有限公司 Real-time positioning and map building method based on online visual dictionary
CN110334763A (en) * 2019-07-04 2019-10-15 北京字节跳动网络技术有限公司 Model data file generation, image-recognizing method, device, equipment and medium
CN110569913A (en) * 2019-09-11 2019-12-13 北京云迹科技有限公司 Scene classifier training method and device, scene recognition method and robot
CN112905798A (en) * 2021-03-26 2021-06-04 深圳市阿丹能量信息技术有限公司 Indoor visual positioning method based on character identification
CN112905798B (en) * 2021-03-26 2023-03-10 深圳市阿丹能量信息技术有限公司 Indoor visual positioning method based on character identification

Similar Documents

Publication Publication Date Title
CN106529583A (en) Bag-of-visual-word-model-based indoor scene cognitive method
US10929649B2 (en) Multi-pose face feature point detection method based on cascade regression
Zhang et al. Chinese sign language recognition with adaptive HMM
CN105046195B (en) Human behavior recognition method based on an asymmetric generalized Gaussian model
Kawewong et al. Online and incremental appearance-based SLAM in highly dynamic environments
Duan et al. Detecting small objects using a channel-aware deconvolutional network
CN107832672A (en) A pedestrian re-identification method using pose information to design multiple loss functions
CN107122375A (en) Image subject recognition method based on image features
CN106407958B (en) Face feature detection method based on double-layer cascade
CN103279768B (en) A video face recognition method based on incremental learning of face-block visual features
CN110555387B (en) Behavior identification method based on space-time volume of local joint point track in skeleton sequence
CN109993102A (en) Similar face retrieval method, apparatus and storage medium
Ren et al. Facial expression recognition based on AAM–SIFT and adaptive regional weighting
Wang et al. Head pose estimation with combined 2D SIFT and 3D HOG features
CN113963032A (en) Twin network structure target tracking method fusing target re-identification
CN106096560A (en) A face alignment method
CN105868706A (en) Method for identifying 3D model based on sparse coding
CN104966052A (en) Attributive characteristic representation-based group behavior identification method
CN113808166B (en) Single-target tracking method based on clustering difference and depth twin convolutional neural network
CN104462818B (en) An embedded manifold regression model based on the Fisher criterion
CN111881716A (en) Pedestrian re-identification method based on multi-view-angle generation countermeasure network
CN110516533A (en) A pedestrian re-identification method based on deep metric learning
Qin et al. A new improved convolutional neural network flower image recognition model
Zhang et al. Part-Aware Correlation Networks for Few-shot Learning
Asif et al. Simultaneous dense scene reconstruction and object labeling

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170322