CN106529583A - Bag-of-visual-word-model-based indoor scene cognitive method - Google Patents
- Publication number
- CN106529583A (application CN201610933785.7A)
- Authority
- CN
- China
- Prior art keywords
- scene
- bag
- orb
- image
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
The invention, which belongs to the field of mobile robot environment perception, relates to an indoor scene recognition method based on a visual bag-of-words model. The method comprises an offline part and an online part. In the offline part, scene categories are determined according to the application requirements; the robot scans every scene with its onboard RGB-D sensor to obtain enough scene images to form an image training set; and 256-dimensional ORB descriptors are generated for each image in the training set with the ORB algorithm, each image usually containing hundreds to thousands of ORB vectors. In the online part, the robot receives a query instruction for the current scene category, and the system is initialized and prepared for the scene query. Because the ORB algorithm performs the image preprocessing steps of feature extraction and matching, the method stays fast; a KNN classifier raises the scene recognition rate, so the method meets the needs of common indoor scene query applications for mobile robots.
Description
Technical field
The invention belongs to the field of mobile robot environment perception, and more particularly relates to an indoor scene recognition method based on a visual bag-of-words model.
Background technology
Under normal circumstances an occupancy grid map meets the robot's basic needs for navigation and obstacle avoidance, but higher-level tasks such as human-robot interaction and mission planning also require cognitive semantic information about the scene, i.e. a cognition-oriented semantic map. If a mobile robot moving through an indoor environment does not know whether its current position belongs to the living room, the kitchen or the bedroom, it cannot complete higher-level intelligent tasks such as going to the kitchen and fetching a bottle of mineral water from the refrigerator for a person.
Summary of the invention
The object of the invention is to provide an indoor scene recognition method based on a visual bag-of-words model.
The object of the invention is achieved as follows:
The invention comprises an offline part and an online part, with the following steps:
Offline part:
(1) Determine the scene categories according to the application requirements; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set;
(2) Generate 256-dimensional ORB descriptors for every image in the training set with the ORB algorithm; each image usually contains hundreds to thousands of ORB vectors;
(3) Train on the ORB feature points of the training set with the K-means clustering algorithm, generate K cluster centers as the visual vocabulary, and construct the visual dictionary;
(4) For the ORB features of all images, compute the term frequency and inverse document frequency of every visual word, add TF-IDF weights to the frequency table, and generate the weighted visual bag-of-words model of each training image; saving the visual dictionary and the training-set bag-of-words models yields the new-model offline semantic map;
Online part:
(5) The robot receives a query instruction for the current scene category; the system is initialized and prepared for the scene query;
(6) The robot obtains an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm;
(7) Query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image;
(8) Use a KNN classifier to compare the bag-of-words model of the current scene image with the training-set bag-of-words models of the semantic map database, determine the current scene category, and return the query result.
Step (3) comprises the following sub-steps:
(3.1) Randomly choose k sample points from the feature point set X as the initial cluster centers $\{m_1, m_2, \dots, m_k\}$;
(3.2) Compute the distance $\|x_i - m_j\|$ from each feature point $x_i$ $(i = 1, 2, \dots, n)$ to every cluster center and assign $x_i$ to the nearest class $m_j$;
(3.3) Recompute the cluster center of each class, $m_j = \frac{1}{n_j}\sum_{x_i \in m_j} x_i$, $j = 1, 2, \dots, k$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ and compare it with the previous value; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps (3.2) and (3.3); otherwise stop the iteration; take the k resulting cluster centers as visual words and store the list of all visual words as the visual dictionary.
The visual dictionary word capacity parameter K of step (3) is set to 900.
In step (8), the KNN classifier parameter K is set to 1. A minimal end-to-end sketch of the whole pipeline is given below.
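The following is a minimal sketch of the offline dictionary construction and the online scene query, assuming OpenCV, NumPy and scikit-learn are available. All function names (extract_orb, build_dictionary, bow_histogram, classify_scene) are illustrative rather than taken from the patent, and the TF-IDF weighting of step (4) is omitted here and shown in a later example.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create()

def extract_orb(image_bgr):
    """Steps (2)/(6): detect ORB keypoints and return descriptors as float rows."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    return np.zeros((0, 32), np.float32) if desc is None else desc.astype(np.float32)

def build_dictionary(train_images, k=900):
    """Step (3): pool all training-set ORB descriptors and cluster them into k visual words."""
    pooled = np.vstack([extract_orb(img) for img in train_images])
    return KMeans(n_clusters=k, n_init=4).fit(pooled).cluster_centers_

def bow_histogram(descriptors, dictionary):
    """Assign every descriptor to its nearest visual word and count occurrences per word."""
    d2 = ((descriptors ** 2).sum(axis=1, keepdims=True)
          - 2.0 * descriptors @ dictionary.T
          + (dictionary ** 2).sum(axis=1))
    return np.bincount(d2.argmin(axis=1), minlength=len(dictionary)).astype(np.float32)

def classify_scene(query_image, dictionary, train_hists, train_labels):
    """Step (8) with K = 1: return the label of the most similar training histogram."""
    q = bow_histogram(extract_orb(query_image), dictionary)
    nearest = np.linalg.norm(train_hists - q, axis=1).argmin()
    return train_labels[nearest]
```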
The beneficial effects of the invention are:
The invention uses the ORB algorithm to complete the image preprocessing steps of feature extraction and matching, which keeps the algorithm fast, and uses a KNN classifier to improve the scene recognition rate, so the method can meet the needs of common indoor scene query applications for mobile robots.
Description of the drawings
Fig. 1 is a schematic flow chart of the indoor scene recognition algorithm based on the visual bag-of-words model.
Specific embodiment
The invention is described further below with reference to the accompanying drawing.
The invention discloses an indoor scene recognition method based on a visual bag-of-words model. The method comprises two parts: offline map generation and online map query. The offline map generation part comprises: scanning the scenes to obtain the scene training set; ORB feature detection and description; K-means clustering to extract cluster centers and construct the visual dictionary; and TF-IDF weighting to generate the training-set visual bag-of-words database. The online map query part comprises: receiving the scene query instruction; acquiring an RGB image of the current scene and extracting ORB features; querying the visual dictionary of the map database and generating the visual bag-of-words model of the current scene image; and comparing the current scene bag-of-words model with the map database training set with a KNN classifier to judge the current scene category. In this way, the invention helps a mobile robot complete indoor scene recognition quickly and accurately, so that it can interact better with humans.
To solve the above problems, the invention proposes an indoor scene recognition method based on a visual bag-of-words model, so as to build a visual dictionary of common indoor scenes and establish a new kind of semantic map oriented toward indoor scene cognition, which is subsequently used for robot indoor scene category queries.
To achieve the above purpose, the technical scheme comprises the following main points:
Offline part:
Step 1. Scan the scenes to obtain the scene training set;
Step 2. ORB feature detection and description;
Step 3. K-means clustering to extract cluster centers and construct the visual dictionary;
Step 4. TF-IDF weighting to generate the training-set visual bag-of-words database;
Online part:
Step 1. Acquire an RGB image of the current scene and extract ORB features;
Step 2. Query the visual dictionary of the map database and generate the visual bag-of-words model of the current scene image;
Step 3. Compare the map database training set with the current scene bag-of-words model using the KNN classifier and judge the current scene category.
The algorithm flow of the indoor scene recognition method based on the visual bag-of-words model is shown in Fig. 1. It can be divided into an offline part and an online part; the specific implementation steps are as follows:
(1) Offline map generation:
Step 1. Determine the scene categories according to the application requirements; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set.
Step 2. Generate 256-dimensional ORB descriptors for every image in the training set with the ORB algorithm; each image usually contains hundreds to thousands of ORB vectors, as the short example below illustrates.
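A short check of the descriptor layout described in Step 2, assuming OpenCV; the keypoint cap and the image file name are illustrative. OpenCV's default ORB produces 32-byte (256-bit) binary descriptors, matching the 256-dimensional description above.

```python
import cv2

orb = cv2.ORB_create(nfeatures=2000)   # keypoint cap of 2000 is an assumption, not from the patent
img = cv2.imread("scene_0001.png", cv2.IMREAD_GRAYSCALE)   # hypothetical training image
keypoints, descriptors = orb.detectAndCompute(img, None)

print(len(keypoints))       # typically hundreds to a few thousand keypoints for an indoor image
print(descriptors.shape)    # (num_keypoints, 32): 32 bytes = 256 bits per descriptor
print(descriptors.dtype)    # uint8
```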
Step 3. Train on the ORB feature points of the training set with the K-means clustering algorithm, generate K cluster centers as the visual vocabulary, and construct the visual dictionary. For roughly 10 indoor scenes, K = 900 gives a scene recognition rate of about 80% while keeping the algorithm fast, so the invention chooses K = 900.
K-means is an unsupervised, self-adaptive clustering algorithm with high efficiency that is well suited to large-scale data. Its core idea is to obtain k cluster centers $\{m_1, m_2, \dots, m_k\}$ from the feature point set $X = \{x_1, x_2, \dots, x_n\}$ such that the sum of squared distances from the feature points to their assigned cluster centers is minimized; the objective function is

$$W_n = \sum_{j=1}^{k} \sum_{x_i \in m_j} \| x_i - m_j \|^2 \qquad (1)$$
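A NumPy sketch of this clustering step, following the sub-steps (3.1)-(3.3) that are spelled out formally below; the ORB descriptors are treated as real-valued vectors, as the description implies, and the function and variable names are illustrative.

```python
import numpy as np

def kmeans_visual_dictionary(X, k=900, max_iter=100, seed=0):
    """X: (n, d) matrix of pooled ORB descriptors; returns a (k, d) matrix of visual words."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # sub-step 3.1: random initial centers
    prev_obj = np.inf
    for _ in range(max_iter):
        # sub-step 3.2: distance from every feature point to every center, assign to the nearest class
        d2 = ((X ** 2).sum(axis=1, keepdims=True) - 2.0 * X @ centers.T + (centers ** 2).sum(axis=1))
        labels = d2.argmin(axis=1)
        # sub-step 3.3: recompute each cluster center as the mean of its assigned feature points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
        obj = ((X - centers[labels]) ** 2).sum()              # objective W_n(t) of formula (1)
        if obj - prev_obj >= 0:                               # stop once W_n(t) - W_n(t-1) >= 0
            break
        prev_obj = obj
    return centers                                            # the k visual words of the dictionary
```

On very large training sets, scikit-learn's MiniBatchKMeans can stand in for this loop with the same K.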
Step 3 comprises the following sub-steps:
Step 3.1. Randomly choose k sample points from the feature point set X as the initial cluster centers $\{m_1, m_2, \dots, m_k\}$;
Step 3.2. Compute the distance $\|x_i - m_j\|$ from each feature point $x_i$ $(i = 1, 2, \dots, n)$ to every cluster center and assign $x_i$ to the nearest class $m_j$;
Step 3.3. Recompute the cluster center of each class, $m_j = \frac{1}{n_j}\sum_{x_i \in m_j} x_i$, $j = 1, 2, \dots, k$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ according to formula (1) and compare it with the previous value; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps 3.2 and 3.3; otherwise stop the iteration. Take the k resulting cluster centers as visual words and store the list of all visual words as the visual dictionary.
Step 4. For the ORB features of all images, compute the term frequency (TF) and inverse document frequency (IDF) of each visual word, add TF-IDF weights to the frequency table, and generate the weighted visual bag-of-words model of each training image. Saving the visual dictionary and the training-set bag-of-words models yields the new-model offline semantic map.
Once the visual dictionary is available, the visual-word frequency histogram description of an image can be obtained by counting against the dictionary. For each training image and test image, the many low-level features extracted from it are matched against the words in the visual dictionary, each feature is replaced by the closest word, and the number of occurrences of every word is counted; this yields the frequency-histogram bag-of-words representation of the image.
Let the visual dictionary be $\{m_1, m_2, \dots, m_k\}$. The Euclidean distance between each low-level ORB feature and every visual word is computed with the nearest neighbor algorithm, so that each feature $v_i$ is replaced by its nearest visual word, as in formula (2):

$$w(v_i) = \arg\min_{m_j \in \{m_1, \dots, m_k\}} \| v_i - m_j \| \qquad (2)$$

A sketch of this word assignment and the TF-IDF weighting follows.
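A NumPy sketch of Step 4: nearest-word assignment according to formula (2), word counting, and TF-IDF weighting of the histograms. The exact TF-IDF variant is an assumption (the common tf · log(N/df) form); function names are illustrative.

```python
import numpy as np

def nearest_words(descriptors, dictionary):
    """Formula (2): index of the nearest visual word for every ORB descriptor."""
    d2 = ((descriptors ** 2).sum(axis=1, keepdims=True)
          - 2.0 * descriptors @ dictionary.T
          + (dictionary ** 2).sum(axis=1))
    return d2.argmin(axis=1)

def tf_idf_bow(per_image_descriptors, dictionary):
    """per_image_descriptors: list of (n_i, d) arrays, one per training image.
    Returns the (num_images, dict_size) matrix of TF-IDF weighted bag-of-words vectors."""
    k, n_images = len(dictionary), len(per_image_descriptors)
    counts = np.zeros((n_images, k))
    for i, desc in enumerate(per_image_descriptors):
        counts[i] = np.bincount(nearest_words(desc, dictionary), minlength=k)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)   # term frequency per image
    df = (counts > 0).sum(axis=0)                                    # images containing each word
    idf = np.log(n_images / np.maximum(df, 1))                       # inverse document frequency
    return tf * idf                                                  # weighted bag-of-words vectors
```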
(2) Online map query:
Step 1. The robot receives a query instruction for the current scene category; the system is initialized and prepared for the scene query.
Step 2. The robot obtains an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm.
Step 3. Query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image.
Step 4. Use a KNN classifier to compare the bag-of-words model of the current scene image with the training-set bag-of-words models of the semantic map database, determine the current scene category, and return the query result.
The basic idea of the KNN algorithm is as follows: compute the similarity between the bag-of-words model of the current scene and every training-set bag-of-words model, find the K most similar samples, and decide the current scene category by a vote over the categories of these K samples. Euclidean distance is used as the similarity measure here; for two n-dimensional vectors $a = (x_{11}, x_{12}, \dots, x_{1n})$ and $b = (x_{21}, x_{22}, \dots, x_{2n})$ the Euclidean distance is

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$

or, expressed in vector form,

$$d(a, b) = \sqrt{(a - b)(a - b)^T}.$$

Experiments show that K = 1 or K = 3 gives a higher scene recognition rate; the invention sets the KNN parameter K to 1. A minimal sketch of this comparison is given below.
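A minimal sketch of the online KNN comparison with the Euclidean distance above and K = 1 as chosen in the description; the majority vote is kept general so K = 3 also works. Names are illustrative.

```python
import numpy as np
from collections import Counter

def knn_scene_category(query_bow, train_bows, train_labels, k=1):
    """query_bow: weighted bag-of-words vector of the current image;
    train_bows: (N, dict_size) matrix of training vectors; train_labels: list of N scene categories."""
    dists = np.sqrt(((train_bows - query_bow) ** 2).sum(axis=1))   # Euclidean distance d(a, b)
    nearest = np.argsort(dists)[:k]                                 # indices of the K most similar samples
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]                               # category with the most votes
```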
Claims (4)
1. An indoor scene recognition method based on a visual bag-of-words model, characterized by comprising an offline part and an online part with the following steps:
Offline part:
(1) determine the scene categories according to the application requirements; the robot scans each scene with its onboard RGB-D sensor and obtains enough scene images to form an image training set;
(2) generate 256-dimensional ORB descriptors for every image in the training set with the ORB algorithm, each image usually containing hundreds to thousands of ORB vectors;
(3) train on the ORB feature points of the training set with the K-means clustering algorithm, generate K cluster centers as the visual vocabulary, and construct the visual dictionary;
(4) for the ORB features of all images, compute the term frequency and inverse document frequency of every visual word, add TF-IDF weights to the frequency table, and generate the weighted visual bag-of-words model of each training image; saving the visual dictionary and the training-set bag-of-words models yields the new-model offline semantic map;
Online part:
(5) the robot receives a query instruction for the current scene category; the system is initialized and prepared for the scene query;
(6) the robot obtains an RGB image of the current scene with its onboard camera, and detects and extracts the feature point set with the ORB algorithm;
(7) query the semantic map database, look up the visual dictionary, and generate the weighted visual bag-of-words model of the current scene image;
(8) use a KNN classifier to compare the bag-of-words model of the current scene image with the training-set bag-of-words models of the semantic map database, determine the current scene category, and return the query result.
2. The indoor scene recognition method based on a visual bag-of-words model according to claim 1, characterized in that step (3) comprises the following sub-steps:
(3.1) randomly choose k sample points from the feature point set X as the initial cluster centers $\{m_1, m_2, \dots, m_k\}$;
(3.2) compute the distance $\|x_i - m_j\|$ from each feature point $x_i$ $(i = 1, 2, \dots, n)$ to every cluster center and assign $x_i$ to the nearest class $m_j$;
(3.3) recompute the cluster center of each class, $m_j = \frac{1}{n_j}\sum_{x_i \in m_j} x_i$, where $n_j$ is the number of feature points assigned to cluster $m_j$; compute the objective function $W_n(t)$ and compare it with the previous value; if $W_n(t) - W_n(t-1) < 0$, continue iterating steps (3.2) and (3.3); otherwise stop the iteration; take the k resulting cluster centers as visual words and store the list of all visual words as the visual dictionary.
3. The indoor scene recognition method based on a visual bag-of-words model according to claim 1, characterized in that the visual dictionary word capacity parameter K of step (3) is set to 900.
4. The indoor scene recognition method based on a visual bag-of-words model according to claim 1, characterized in that the KNN classifier parameter K in step (8) is set to 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610933785.7A CN106529583A (en) | 2016-11-01 | 2016-11-01 | Bag-of-visual-word-model-based indoor scene cognitive method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610933785.7A CN106529583A (en) | 2016-11-01 | 2016-11-01 | Bag-of-visual-word-model-based indoor scene cognitive method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106529583A true CN106529583A (en) | 2017-03-22 |
Family
ID=58291890
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610933785.7A Pending CN106529583A (en) | 2016-11-01 | 2016-11-01 | Bag-of-visual-word-model-based indoor scene cognitive method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529583A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622607A (en) * | 2012-02-24 | 2012-08-01 | 河海大学 | Remote sensing image classification method based on multi-feature fusion |
KR20140006566A (en) * | 2012-07-06 | 2014-01-16 | 한국과학기술원 | Method and apparatus for extraction of video signature using inclined video tomography |
CN103413142A (en) * | 2013-07-22 | 2013-11-27 | 中国科学院遥感与数字地球研究所 | Remote sensing image land utilization scene classification method based on two-dimension wavelet decomposition and visual sense bag-of-word model |
CN103559191A (en) * | 2013-09-10 | 2014-02-05 | 浙江大学 | Cross-media sorting method based on hidden space learning and two-way sorting learning |
CN104915673A (en) * | 2014-03-11 | 2015-09-16 | 株式会社理光 | Object classification method and system based on bag of visual word model |
CN104239897A (en) * | 2014-09-04 | 2014-12-24 | 天津大学 | Visual feature representing method based on autoencoder word bag |
CN105843223A (en) * | 2016-03-23 | 2016-08-10 | 东南大学 | Mobile robot three-dimensional mapping and obstacle avoidance method based on space bag of words model |
Non-Patent Citations (2)
Title |
---|
曹宁等 [Cao Ning et al.], "基于视觉词袋模型的图像分类改进方法" [An improved image classification method based on the visual bag-of-words model], 《电子设计工程》 [Electronic Design Engineering] *
许宏科等 [Xu Hongke et al.], "基于改进ORB的图像特征点匹配" [Image feature point matching based on improved ORB], 《科学技术与工程》 [Science Technology and Engineering] *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107220932A (en) * | 2017-04-18 | 2017-09-29 | 天津大学 | Panorama Mosaic method based on bag of words |
CN107220932B (en) * | 2017-04-18 | 2020-03-20 | 天津大学 | Panoramic image splicing method based on bag-of-words model |
CN107167144A (en) * | 2017-07-07 | 2017-09-15 | 武汉科技大学 | A kind of mobile robot indoor environment recognition positioning method of view-based access control model |
CN108256463A (en) * | 2018-01-10 | 2018-07-06 | 南开大学 | Mobile robot scene recognition method based on ESN neural networks |
CN108256463B (en) * | 2018-01-10 | 2022-01-04 | 南开大学 | Mobile robot scene recognition method based on ESN neural network |
CN109242899A (en) * | 2018-09-03 | 2019-01-18 | 北京维盛泰科科技有限公司 | A kind of real-time positioning and map constructing method based on online visual dictionary |
CN109242899B (en) * | 2018-09-03 | 2022-04-19 | 北京维盛泰科科技有限公司 | Real-time positioning and map building method based on online visual dictionary |
CN110334763A (en) * | 2019-07-04 | 2019-10-15 | 北京字节跳动网络技术有限公司 | Model data file generation, image-recognizing method, device, equipment and medium |
CN110569913A (en) * | 2019-09-11 | 2019-12-13 | 北京云迹科技有限公司 | Scene classifier training method and device, scene recognition method and robot |
CN112905798A (en) * | 2021-03-26 | 2021-06-04 | 深圳市阿丹能量信息技术有限公司 | Indoor visual positioning method based on character identification |
CN112905798B (en) * | 2021-03-26 | 2023-03-10 | 深圳市阿丹能量信息技术有限公司 | Indoor visual positioning method based on character identification |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | C06 | Publication | 
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170322