CN109284687A - Scene recognition method and device based on indoor opportunistic signal enhancement - Google Patents

Scene recognition method and device based on indoor opportunistic signal enhancement

Info

Publication number
CN109284687A
CN109284687A
Authority
CN
China
Prior art keywords
scene
anchor point
feature vector
module
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810972177.6A
Other languages
Chinese (zh)
Other versions
CN109284687B (en)
Inventor
呙维
吴然
陈艳华
朱欣焰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201810972177.6A
Publication of CN109284687A
Application granted
Publication of CN109284687B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a scene recognition method and device based on indoor opportunistic signal enhancement. A convolutional neural network is trained by transfer learning to obtain scene-level image features at indoor anchor points; the positioning feature of each anchor point is described from its positioning information, the scene base map, and the positioning error; the image feature is then augmented with this positioning feature, and the image feature and the positioning feature are fused by a deep-learning method to carry out scene recognition prediction, thereby achieving the technical effect of higher scene recognition accuracy.

Description

Scene recognition method and device based on indoor opportunistic signal enhancement
Technical field
The present invention relates to the technical field of indoor scene recognition, and in particular to a scene recognition method and device based on indoor opportunistic signal enhancement.
Background art
Scene recognition is a challenging problem in the field of computer vision and is applied in many fields such as autonomous driving and robotics.
Existing scene recognition methods usually identify and classify scenes by analyzing image features. For example, traditional image processing and pattern recognition research performs classification and labeling of scene images, but this requires a large amount of manual work and the algorithms are relatively complex. Other approaches apply deep learning on large-scale datasets; however, indoor environments contain complicated decoration and layout, so recognition accuracy remains low when indoor scene recognition is solved with deep learning alone.
It can be seen from the above that scene recognition methods in the prior art suffer from low recognition accuracy.
Summary of the invention
The present invention provides a scene recognition method and device based on indoor opportunistic signal enhancement, which solves, or at least partly solves, the technical problem that scene recognition methods in the prior art have low recognition accuracy.
A first aspect of the present invention provides a scene recognition method based on indoor opportunistic signal enhancement, comprising:
Step S1: for a preset application scenario, acquiring a typical scene image set;
Step S2: in combination with a scene base map corresponding to the scene to be studied, acquiring the positioning information and image information of a mobile device on the main roads of the scene to be studied, wherein the positioning information and image information correspond to anchor points;
Step S3: inputting the scene image set into a preset feature-fusion neural network, performing model fine-tuning on the convolutional neural network module of the preset feature-fusion neural network, and obtaining the convolutional neural network model migrated to the scene to be studied, wherein the preset feature-fusion neural network comprises an image feature extraction module and a feature-fusion decision module, and the image feature extraction module is realized by the convolutional neural network module and converts the image information corresponding to an anchor point into an image feature vector;
Step S4: superimposing the anchor point on the scene base map to obtain the position feature vector describing which scene the anchor point lies in;
Step S5: superimposing the circle of uncertainty, centered on the anchor point with the preset positioning error as radius, on the scene base map, obtaining the intersection area of the circle of uncertainty with each scene, and, according to the intersection areas, obtaining the relationship feature vector between the anchor point and each scene;
Step S6: splicing the position feature vector of the anchor point and the relationship feature vector between the anchor point and each scene into a positioning feature vector, then splicing the positioning feature vector with the image feature vector corresponding to the anchor point, inputting the result into the feature-fusion decision module, and training the parameters of the feature-fusion decision module;
Step S7: fixing the parameters of the image feature extraction module obtained after model fine-tuning and of the trained feature-fusion decision module to obtain the merged feature-fusion neural network;
Step S8: inputting the image information corresponding to the anchor point and the relationship feature vector between the anchor point and each scene into the merged feature-fusion neural network, obtaining the scene prediction feature vector, and taking the scene type corresponding to the highest-probability item of the prediction feature vector as the scene recognition result.
Further, step S3 specifically comprises:
Step S3.1: inputting the scene image set into the preset feature-fusion neural network, training the convolutional neural network module, updating the parameters of its fully connected layer while retaining the parameters of the convolutional layers, and obtaining the fine-tuned convolutional neural network module as the fine-tuned image feature extraction module;
Step S3.2: inputting the image information corresponding to an anchor point into the fine-tuned convolutional neural network module and obtaining the output tensor as the image feature vector, the image feature vector corresponding to the anchor point.
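As an illustration only, the fine-tuning of step S3.1 can be sketched as follows; this is a minimal sketch assuming a PyTorch/torchvision environment, and the loader name scene_image_loader and the class count NUM_CATEGORIES are placeholders rather than part of the patent:

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CATEGORIES = 8  # assumed scene class count (N_category)

    # Load a pretrained Inception V3 backbone and replace its final fully connected
    # layer so that the output tensor size becomes [1, N_category].
    backbone = models.inception_v3(weights="DEFAULT", aux_logits=True)
    backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CATEGORIES)

    # Retain (freeze) the convolutional-layer parameters and update only the new
    # fully connected layer, as described in step S3.1.
    for name, param in backbone.named_parameters():
        param.requires_grad = name.startswith("fc.")

    optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    def fine_tune_one_epoch(scene_image_loader):
        """scene_image_loader yields (batch of 299x299 images, scene-class labels)."""
        backbone.train()
        for images, labels in scene_image_loader:
            optimizer.zero_grad()
            logits, _aux = backbone(images)   # Inception V3 returns (main, aux) logits in train mode
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()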
Further, step S4 specifically comprises:
Step S4.1: initializing the position feature vector and assigning it the value {0}^(Ncategory+1); the position feature vector has Ncategory+1 elements, where Ncategory is the number of scene classes, and the first element expresses the case in which the anchor point lies outside all scenes;
Step S4.2: judging the relationship between the anchor point and each scene; if the anchor point falls into the k-th scene, the (k+1)-th element of the position feature vector is assigned the value 1; if the anchor point does not lie in any scene, the first element of the position feature vector is assigned the value 1;
Step S4.3: saving the position feature vector and associating it with the specific anchor point.
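A minimal sketch of steps S4.1 to S4.3 is given below; it assumes that the scene base map is available as polygons and uses the shapely library, and the helper names are illustrative only:

    import numpy as np
    from shapely.geometry import Point

    def position_feature(anchor_xy, scene_polygons, n_category):
        """scene_polygons: list of (class index k in 0..n_category-1, shapely Polygon) from the scene base map."""
        v_location = np.zeros(n_category + 1)        # step S4.1: initialize to {0}^(N_category+1)
        for k, polygon in scene_polygons:
            if polygon.contains(Point(anchor_xy)):   # step S4.2: the anchor point falls into the k-th scene
                v_location[k + 1] = 1.0
                return v_location
        v_location[0] = 1.0                          # the anchor point lies outside all scenes
        return v_location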
Further, step S5 specifically comprises:
Step S5.1: initializing the relationship feature vector between the anchor point and each scene; the feature vector has Ncategory elements, each corresponding to the degree of association between the anchor point and one class of scene, where Ncategory is the total number of scene classes, and it is assigned the value {0}^(Ncategory);
Step S5.2: superimposing the scene boundaries on the circle of uncertainty centered on the anchor point with the preset positioning error Rnoise as radius, calculating the intersection areas, traversing each scene, accumulating the area values by scene class, assigning them to the corresponding elements of the relationship feature vector, and then normalizing the feature vector to obtain the relationship feature vector, of the form:
{ S_i / Σ_{i=1}^{Ncategory} S_i }
wherein S_i is the intersection area of the scenes of class i with the circle of uncertainty, and Ncategory is the total number of scene classes;
Step S5.3: saving the relationship feature vector between the anchor point and each scene and associating it with the specific anchor point.
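Similarly, the intersection-area computation of step S5 can be sketched with shapely buffers; the default error radius and the helper names here are assumptions:

    import numpy as np
    from shapely.geometry import Point

    def relationship_feature(anchor_xy, scene_polygons, n_category, r_noise=5.0):
        """Circle of uncertainty of radius r_noise intersected with every scene polygon."""
        circle = Point(anchor_xy).buffer(r_noise)    # circle of uncertainty centred on the anchor point
        v_relation = np.zeros(n_category)            # step S5.1: initialize to {0}^(N_category)
        for k, polygon in scene_polygons:
            v_relation[k] += circle.intersection(polygon).area   # step S5.2: accumulate S_i by scene class
        total = v_relation.sum()
        return v_relation / total if total > 0 else v_relation   # {S_i / sum_i S_i}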
Further, step S6 specifically comprises:
Step S6.1: splicing the position feature vector V_location of each anchor point and the relationship feature vector V_relation between each anchor point and each scene into the positioning feature vector V_positioning = [V_location, V_relation];
Step S6.2: splicing the image feature vector with the positioning feature vector to construct a feature vector V_fuse = [V_image, V_positioning] of shape [1, 3*Ncategory+1];
Step S6.3: inputting the spliced feature vector V_fuse into the feature-fusion decision module, in which the feature-fusion fully connected layer outputs a fusion feature vector of shape [1, 3*Ncategory+1]; the fusion feature vector is then passed through the final prediction fully connected layer, and the parameters of the feature-fusion decision module are trained to obtain the trained feature-fusion decision module.
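The splicing and the feature-fusion decision module of step S6 can be read as the following sketch (assumed PyTorch; the ReLU between the two fully connected layers is an assumption not stated in the patent):

    import torch
    import torch.nn as nn

    class FeatureFusionDecision(nn.Module):
        """Feature-fusion fully connected layer followed by the final prediction layer."""
        def __init__(self, n_category):
            super().__init__()
            fused_dim = 3 * n_category + 1
            self.fusion_fc = nn.Linear(fused_dim, fused_dim)    # outputs shape [1, 3*N_category+1]
            self.predict_fc = nn.Linear(fused_dim, n_category)  # outputs shape [1, N_category]

        def forward(self, v_image, v_location, v_relation):
            v_positioning = torch.cat([v_location, v_relation], dim=1)  # [1, 2*N_category+1]
            v_fuse = torch.cat([v_image, v_positioning], dim=1)         # [1, 3*N_category+1]
            fused = torch.relu(self.fusion_fc(v_fuse))
            return self.predict_fc(fused)                               # per-scene prediction scores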
Further, step S8 specifically comprises:
inputting the image information corresponding to the anchor point into the image feature extraction module obtained after model fine-tuning, inputting the position feature vector of the anchor point and the relationship feature vector between the anchor point and each scene into the trained feature-fusion decision module, outputting the probability values that the anchor point belongs to each scene, and taking the scene type corresponding to the highest probability value as the scene recognition result.
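Step S8 then amounts to the following inference sketch, which assumes the fine-tuned backbone and a fusion head with the interface of the earlier sketch; the softmax/argmax step simply selects the highest-probability scene class:

    import torch

    @torch.no_grad()
    def recognize_scene(backbone, fusion_head, image, v_location, v_relation):
        """image: [3, 299, 299] tensor; v_location: [1, N+1]; v_relation: [1, N] (precomputed)."""
        backbone.eval()
        fusion_head.eval()
        v_image = backbone(image.unsqueeze(0))                 # [1, N_category] scene-level image feature
        scores = fusion_head(v_image, v_location, v_relation)  # [1, N_category] fused prediction
        probs = torch.softmax(scores, dim=1)                   # probability of each scene class
        return int(probs.argmax(dim=1))                        # index of the recognized scene class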
Based on the same inventive concept, a second aspect of the present invention provides a scene recognition device based on indoor opportunistic signal enhancement, comprising:
a scene image set acquisition module, configured to acquire a typical scene image set for a preset application scenario;
a positioning information and image information acquisition module, configured to acquire, in combination with a scene base map corresponding to the scene to be studied, the positioning information and image information of a mobile device on the main roads of the scene to be studied, wherein the positioning information and image information correspond to anchor points;
a transfer learning module, configured to input the scene image set into a preset feature-fusion neural network, perform model fine-tuning on the convolutional neural network module of the preset feature-fusion neural network, and obtain the convolutional neural network model migrated to the scene to be studied, wherein the preset feature-fusion neural network comprises an image feature extraction module and a feature-fusion decision module, and the image feature extraction module is realized by the convolutional neural network module and converts the image information corresponding to an anchor point into an image feature vector;
a position feature vector calculation module, configured to superimpose the anchor point on the scene base map and obtain the position feature vector describing which scene the anchor point lies in;
a relationship feature vector calculation module, configured to superimpose the circle of uncertainty, centered on the anchor point with the preset positioning error as radius, on the scene base map, obtain the intersection area of the circle of uncertainty with each scene, and, according to the intersection areas, obtain the relationship feature vector between the anchor point and each scene;
a feature-fusion decision module training module, configured to splice the position feature vector of the anchor point and the relationship feature vector between the anchor point and each scene into a positioning feature vector, then splice the positioning feature vector with the image feature vector corresponding to the anchor point, input the result into the feature-fusion decision module, and train the parameters of the feature-fusion decision module;
a merging module, configured to fix the parameters of the image feature extraction module obtained after model fine-tuning and of the trained feature-fusion decision module, and obtain the merged feature-fusion neural network;
a prediction module, configured to input the image information corresponding to an anchor point and the relationship feature vector between the anchor point and each scene into the merged feature-fusion neural network, obtain the scene prediction feature vector, and take the scene type corresponding to the highest-probability item of the prediction feature vector as the scene recognition result.
Further, the transfer learning module is specifically configured to:
input the scene image set into the preset feature-fusion neural network, train the convolutional neural network module, update the parameters of its fully connected layer while retaining the parameters of the convolutional layers, and obtain the fine-tuned convolutional neural network module as the fine-tuned image feature extraction module;
input the image information corresponding to an anchor point into the fine-tuned convolutional neural network module and obtain the output tensor as the image feature vector, the image feature vector corresponding to the anchor point.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, and the method described in the first aspect is realized when the program is executed.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor realizes the method described in the first aspect when executing the program.
The above one or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
In the method provided by the present invention, a typical scene image set is acquired and input into a preset feature-fusion neural network, and model fine-tuning is performed on the convolutional neural network module of the preset feature-fusion neural network, so that a convolutional neural network model migrated to the scene to be studied can be obtained, and the image information corresponding to an anchor point is converted into an image feature vector by the fine-tuned convolutional neural network. That is, scene-level image features at indoor anchor points can be obtained with the convolutional neural network; the positioning feature of an anchor point is described from its positioning information, the scene base map, and the positioning error; the image feature is augmented with this positioning feature; and the image feature and the positioning feature are fused by a deep-learning method to carry out scene recognition prediction. A higher scene recognition accuracy can therefore be achieved, which solves the technical problem that scene recognition methods in the prior art have low recognition accuracy.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
Fig. 1 is a flowchart of a scene recognition method based on indoor opportunistic signal enhancement in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the method shown in Fig. 1;
Fig. 3 is a structure diagram of the merged feature-fusion neural network in the method shown in Fig. 1;
Fig. 4 is a comparison of scene recognition results between a method in the prior art and this embodiment;
Fig. 5 is a structure diagram of a scene recognition device based on indoor opportunistic signal enhancement in an embodiment of the present invention;
Fig. 6 is a structure diagram of a computer-readable storage medium in an embodiment of the present invention;
Fig. 7 is a structure diagram of a computer device in an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide a scene recognition method and device based on indoor opportunistic signal enhancement, so as to improve upon the technical problem that scene recognition methods in the prior art have low recognition accuracy.
In order to achieve the above technical effect, the general idea of the present invention is as follows:
In an indoor scene, image information and positioning information are obtained through a mobile device to describe the semantic type of the scene in which the device is currently located. Transfer learning based on images of the research scene is applied to a conventional convolutional neural network model, so that scene-level image features of the images acquired by the mobile device are extracted by the convolutional neural network. At the same time, the positioning information of an anchor point contains rich position features and relationship features with the surrounding scenes; the positioning feature obtained by splicing the position feature and the relationship feature can serve as another class of feature with which the image feature is augmented. The anchor point is superimposed on the scene base map to obtain the position feature describing where the anchor point lies; meanwhile, a circle of uncertainty is constructed from the positioning error and superimposed on the scene base map, and the overlap area with each class of scene is counted to evaluate the degree of association between the anchor point and the surrounding scenes. The position feature and the relationship feature are expressed in vector form. After the image feature vector and the positioning feature vector are connected by the feature-fusion fully connected layer, the output feature vector is further input into the subsequent final prediction fully connected layer; the optimal fusion and decision fully connected layers are trained with the acquired indoor positioning and image data, and the likelihood that the anchor point lies in each class of scene is finally output.
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Embodiment one
This embodiment provides a scene recognition method based on indoor opportunistic signal enhancement. Referring to Fig. 1, the method comprises:
Step S1 is performed first: for a preset application scenario, a typical scene image set is acquired.
Specifically, the preset application scenario can be chosen according to the requirements of the actual research or application, and the typical scene image set can be acquired with existing methods.
Step S2 is then executed: in combination with the scene base map corresponding to the scene to be studied, the positioning information and image information of a mobile device are acquired on the main roads of the scene to be studied, wherein the positioning information and image information correspond to anchor points.
Specifically, the scene base map refers to the indoor base map of the application scenario; one scene base map contains multiple scenes. Indoor opportunistic signals are radio signals acquired from broadcast, television, mobile communication, and other kinds of equipment. The preset application scenario in step S1 and the scene to be studied in step S2 are the same scene, but what is acquired in step S1 is the scene image set, while what is acquired in step S2 is positioning information and image information; the data acquired in the two steps are different. Real-time images and positioning information (i.e., opportunistic signals) are obtained by the mobile device and then collected. Anchor points correspond to the image information and positioning information, which are used to train the parameters of the subsequent feature-fusion decision module. In a specific implementation, in order to keep the data closer to reality, a certain perturbation and randomness are required of the image data, and interference can be added in shooting angle, focus, exposure, and so on.
Step S3 is executed next: the scene image set is input into the preset feature-fusion neural network, model fine-tuning is performed on the convolutional neural network module of the preset feature-fusion neural network, and the convolutional neural network model migrated to the scene to be studied is obtained, wherein the preset feature-fusion neural network comprises an image feature extraction module and a feature-fusion decision module, and the image feature extraction module is realized by the convolutional neural network module and converts the image information corresponding to an anchor point into an image feature vector.
Specifically, the preset feature-fusion neural network is mainly divided into an image feature extraction module and a feature-fusion decision module. In this embodiment, a convolutional neural network is chosen as the image feature extraction module; it outputs an image feature vector of shape [1, Ncategory], whose elements are the probabilities that the image belongs to each scene class, where Ncategory is the total number of scene classes of the research scene. The output tensor of the convolutional neural network (the image feature vector) is spliced with the positioning feature input tensor of shape [1, 2*Ncategory+1] (the positioning feature vector), and the result is input into the feature-fusion fully connected layer, which outputs a fusion feature vector of shape [1, 3*Ncategory+1]. The fusion feature vector is input into the final prediction fully connected layer, which outputs a prediction result of shape [1, Ncategory], whose elements characterize the probabilities that the anchor point belongs to the corresponding scenes.
In a specific implementation, the image feature extraction module of the preset feature-fusion neural network can be a known convolutional neural network model, such as the deep convolutional neural network models AlexNet, Inception V2, Inception V3, or ResNet. Fig. 3 shows a structure of the preset feature-fusion neural network, in which the image feature extraction module is realized with Inception V3 and specifically comprises ordinary convolutions (convolutional and pooling layers), Inception modules, average pooling, and a fully connected layer. The feature-fusion decision module comprises the feature-fusion fully connected layer and the final prediction fully connected layer and is used for scene prediction.
In one embodiment, step S3 specifically comprises:
Step S3.1: the scene image set is input into the preset feature-fusion neural network, the convolutional neural network module is trained, the parameters of its fully connected layer are updated while the parameters of the convolutional layers are retained, and the fine-tuned convolutional neural network module is obtained as the fine-tuned image feature extraction module;
Step S3.2: the image information corresponding to an anchor point is input into the fine-tuned convolutional neural network module, and the output tensor is obtained as the image feature vector, the image feature vector corresponding to the anchor point.
Specifically, the convolutional neural network module can have multiple fully connected layers. The typical scene image set acquired in step S1 can be divided into a training set and a validation set for training the convolutional neural network module, so as to obtain the convolutional neural network model migrated to the scene to be studied.
In a specific implementation, the last fully connected layer refers to the fully connected layer connected after the activation function of the last convolutional layer in the convolutional neural network chosen for fine-tuning; its purpose is to transform the low-level features extracted by the convolutions into scene-level features. The output tensor size of the final fully connected layer can be replaced with [1, Ncategory], so that the number of classes output by the convolutional neural network is adapted to the research scene, i.e., transfer learning is completed.
Step S4 is executed next: the anchor point is superimposed on the scene base map, and the position feature vector describing which scene the anchor point lies in is obtained.
In one embodiment, step S4 specifically comprises:
Step S4.1: the position feature vector is initialized and assigned the value {0}^(Ncategory+1); the position feature vector has Ncategory+1 elements, where Ncategory is the number of scene classes, and the first element expresses the case in which the anchor point lies outside all scenes;
Step S4.2: the relationship between the anchor point and each scene is judged; if the anchor point falls into the k-th scene, the (k+1)-th element of the position feature vector is assigned the value 1; if the anchor point does not lie in any scene, the first element of the position feature vector is assigned the value 1;
Step S4.3: the position feature vector is saved and associated with the specific anchor point.
Step S5 is executed next: the circle of uncertainty, centered on the anchor point with the preset positioning error as radius, is superimposed on the scene base map, the intersection area of the circle of uncertainty with each scene is obtained, and, according to the intersection areas, the relationship feature vector between the anchor point and each scene is obtained.
Specifically, the positioning feature of the anchor point is described from the positioning information of the anchor point together with the scene base map and the positioning error, and is then converted into a positioning feature vector; the image feature can then be augmented with this positioning feature, so as to improve prediction accuracy.
In one embodiment, step S6 specifically comprises:
Step S6.1: the position feature vector V_location of each anchor point and the relationship feature vector V_relation between each anchor point and each scene are spliced into the positioning feature vector V_positioning = [V_location, V_relation];
Step S6.2: the image feature vector and the positioning feature vector are spliced to construct a feature vector V_fuse = [V_image, V_positioning] of shape [1, 3*Ncategory+1];
Step S6.3: the spliced feature vector V_fuse is input into the feature-fusion decision module; the feature-fusion fully connected layer of the feature-fusion decision module outputs a fusion feature vector of shape [1, 3*Ncategory+1], the fusion feature vector is then passed through the final prediction fully connected layer, and the parameters of the feature-fusion decision module are trained to obtain the trained feature-fusion decision module.
Specifically, the image feature vector and the positioning feature vector are connected by the feature-fusion fully connected layer to obtain the spliced feature vector V_fuse, which is then input into the subsequent final prediction fully connected layer, so that the feature-fusion decision module can be trained.
Step S7 is then executed: the parameters of the image feature extraction module obtained after model fine-tuning and of the trained feature-fusion decision module are fixed, and the merged feature-fusion neural network is obtained.
Step S8 is executed last: the image information corresponding to an anchor point and the relationship feature vector between the anchor point and each scene are input into the merged feature-fusion neural network, the scene prediction feature vector is obtained, and the scene type corresponding to the highest-probability item of the prediction feature vector is taken as the scene recognition result.
Specifically, the merged feature-fusion neural network obtained in step S7 can be used for scene recognition; after the image information and the relationship feature vector are input, the prediction result can be obtained. Because the image feature extraction module in the merged feature-fusion neural network is the convolutional neural network after model fine-tuning, and the feature-fusion decision module is trained with the feature vector spliced from the image feature vector and the positioning feature vector, the recognition strategy that fuses image features and positioning features makes full use of both classes of features for joint decision-making, greatly reduces the number of mis-predicted results, and thus improves prediction accuracy.
Step S8 specifically comprises:
inputting the image information corresponding to the anchor point into the image feature extraction module obtained after model fine-tuning, inputting the position feature vector of the anchor point and the relationship feature vector between the anchor point and each scene into the trained feature-fusion decision module, outputting the probability values that the anchor point belongs to each scene, and taking the scene type corresponding to the highest probability value as the scene recognition result.
In order to illustrate the realization process of the scene recognition method of the present invention more clearly, a specific example is introduced below. Referring to Fig. 2, which is a schematic diagram of the scene recognition method provided by the present invention, the acquired information comprises positioning information and image information, which correspond to anchor points; that is, each anchor point has corresponding positioning information and image information. The positioning feature vector can be obtained by superimposing the anchor point, the scene base map, and the positioning error; the image information of the anchor point is input into the convolutional neural network to obtain the image feature vector; the positioning feature vector and the image feature vector are then spliced to obtain the fusion feature vector, which is input into the merged feature-fusion neural network for feature fusion and prediction, and the prediction result is obtained.
In a specific implementation, the neural network parameter values are fixed and saved as a callable model, and the automatic running of the method flow can be realized through program calls. The scene recognition process provided by the present invention is introduced below, taking a railway station scene as an example:
Step 1: 8 scene classes are chosen for the railway station, and a typical scene image set is acquired;
Step 2: for the railway station scene, the positioning information and image information of the mobile device are acquired on the main roads of the scene in combination with the scene base map;
Step 3: the typical scene image set acquired in step 1 is input into the convolutional neural network of the preset feature-fusion neural network to fine-tune it, the convolutional neural network migrated to the research scene is obtained, and the image information corresponding to each anchor point is processed into an image feature vector by the fine-tuned convolutional neural network.
The preset feature-fusion neural network is mainly divided into an image feature extraction module and a feature-fusion decision module. The Inception V3 convolutional neural network is chosen as the image feature extraction module; it outputs an image feature vector of shape [1, 8], whose elements are the probabilities that the image belongs to each of the 8 scene classes of the research site. The output tensor of the convolutional neural network is spliced with the positioning feature input tensor of shape [1, 2*8+1], the result is input into the feature-fusion fully connected layer, which outputs a fusion feature vector of shape [1, 3*8+1], and the fusion feature vector is input into the final-decision fully connected layer, which outputs a prediction result of shape [1, 8], whose elements represent the probabilities that the anchor point belongs to each scene. For example, the convolutional neural network migrated to the research scene is obtained by choosing the trained Inception V3 model, replacing the size of its final fully connected layer output tensor with [1, 8], fine-tuning the Inception V3 model with the image set acquired in step 1, updating the fully connected layer parameters, and retaining the original parameters of the convolutional layers. The images of the anchor points acquired in step 2 are input into the fine-tuned convolutional neural network, and the output tensors of the network are saved and associated with the specific anchor points.
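For the railway-station example with 8 scene classes, the tensor shapes quoted above can be checked with a short sketch (assumed PyTorch; the random tensors are placeholders, not real features):

    import torch

    N = 8                                      # scene classes chosen for the railway station
    v_image = torch.randn(1, N)                # stand-in for the fine-tuned Inception V3 output, shape [1, 8]
    v_positioning = torch.randn(1, 2 * N + 1)  # position feature (9 elements) + relationship feature (8 elements)
    v_fuse = torch.cat([v_image, v_positioning], dim=1)
    assert v_fuse.shape == (1, 3 * N + 1)      # [1, 25] input to the feature-fusion fully connected layer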
Step 4: the anchor point is superimposed on the scene base map, and the position feature vector describing which scene the anchor point lies in is obtained;
In a specific implementation, the acquisition of the anchor point position feature vector in step 4 comprises the following sub-steps:
Step 4.1: the position feature vector is initialized; since the case in which the anchor point lies outside all scenes must be covered, the position feature vector has 9 elements, i.e., the number of scene classes 8 plus 1, with the first element expressing the case in which the anchor point lies outside all scenes; the feature vector is assigned the value {0}^9;
Step 4.2: the relationship between the anchor point and each scene is judged; if the anchor point falls into the k-th scene, the (k+1)-th element of the position feature vector is assigned the value 1; if the anchor point does not lie in any scene, the first element of the position feature vector is assigned the value 1;
Step 4.3: the position feature vector is saved and associated with the specific anchor point.
Step 5: the circle of uncertainty, centered on the anchor point with the positioning error as radius, is superimposed on the scene base map, the intersection areas with each class of scene are computed, and the relationship feature vector between the anchor point and each scene is obtained;
In a specific implementation, the acquisition of the relationship feature vector between the anchor point and each scene in step 5 comprises the following sub-steps:
Step 5.1: the relationship feature vector with each scene is initialized; the feature vector has 8 elements, each corresponding to the degree of association between the anchor point and one class of scene, and it is assigned the value {0}^8;
Step 5.2: the scene boundaries are superimposed on the circle of uncertainty centered on the anchor point with the preset positioning error of 5 m as radius, the intersection areas are calculated, each scene is traversed, the area values are accumulated by scene class and assigned to the corresponding elements of the relationship feature vector, and finally the feature vector is normalized, of the form { S_i / Σ_{i=1}^{8} S_i };
Step 5.3: the relationship feature vector with each scene is saved and associated with the specific anchor point.
Step 6: the feature vectors obtained in step 4 and step 5 are spliced into the positioning feature vector, which is then spliced with the corresponding image feature vector obtained in step 3 and input into the feature-fusion and final-decision fully connected layer module, and the parameters of this module are trained;
In a specific implementation, the training of the feature-fusion and final-decision module in step 6 comprises the following sub-steps:
Step 6.1: the position feature vector V_location of each anchor point obtained in step 4 and the relationship feature vector V_relation with each scene obtained in step 5 are spliced into the positioning feature vector V_positioning = [V_location, V_relation];
Step 6.2: the corresponding image feature vector obtained in step 3 and the positioning feature vector are spliced into a feature vector V_fuse of shape [1, 3*8+1];
Step 6.3: the feature vector V_fuse is input into the feature-fusion and final-decision module; the feature-fusion fully connected layer outputs a fusion feature vector of shape [1, 3*8+1], the fusion feature vector is input into the final-decision fully connected layer, the prediction probability vector over all scene classes is finally output, the loss value is calculated against the anchor point label, and the parameters of each layer of the module are updated.
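Step 6.3 corresponds to a standard supervised training loop over the collected anchor points; the sketch below assumes a cross-entropy loss against the anchor-point scene label, and the loader name and optimizer choice are assumptions rather than part of the patent:

    import torch
    import torch.nn as nn

    def train_fusion_head(fusion_head, anchor_loader, epochs=20, lr=1e-3):
        """anchor_loader yields (v_image, v_location, v_relation, scene_label) per batch of anchor points."""
        optimizer = torch.optim.Adam(fusion_head.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        fusion_head.train()
        for _ in range(epochs):
            for v_image, v_loc, v_rel, label in anchor_loader:
                optimizer.zero_grad()
                scores = fusion_head(v_image, v_loc, v_rel)   # [batch, 8] prediction scores
                loss = criterion(scores, label)               # loss value against the anchor point label
                loss.backward()
                optimizer.step()                              # update the parameters of each layer of the module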
Step 7: the parameters of the image feature extraction module fine-tuned in step 3 and of the feature-fusion module trained in step 6 are fixed, and the merged feature-fusion neural network is obtained;
Step 8: the image information of the anchor point in the railway station scene and the positioning feature vector generated through base-map superposition are input into the merged feature-fusion neural network, the scene prediction feature vector is obtained, and the scene type corresponding to the highest-probability item is taken as the scene recognition result.
In a specific implementation, recognizing the scene where the anchor point is located with the trained feature-fusion neural network in step 8 comprises the following sub-steps:
Step 8.1: the positioning feature vector is obtained from the positioning information of the anchor point in the manner of step 4 and step 5;
Step 8.2: the image of the anchor point and the positioning feature vector are input into the model from the input layer of the image feature extraction module and the positioning feature input layer respectively, and the probability values that the anchor point belongs to each scene are output.
See Fig. 4 for a comparison of the scene recognition results of a method in the prior art and of this embodiment. The left figure is the scene recognition result obtained with a method in the prior art, in which the black points are scene prediction errors; the right figure is the scene recognition result obtained with the method of the present invention. Specifically, when scene recognition of anchor-point images is carried out only by fine-tuning the Inception V3 convolutional neural network, the real-time images of the mobile device carry large random perturbations and noise, the convolutional neural network has difficulty extracting significant scene features, and many points are predicted incorrectly. The recognition strategy of the present invention, which fuses image features and positioning features, makes full use of both classes of features for joint decision-making, greatly reduces the number of mis-predicted results, and improves prediction accuracy.
Based on the same inventive concept, the present invention also provides a device corresponding to the scene recognition method based on indoor opportunistic signal enhancement in Embodiment one; see Embodiment two for details.
Embodiment two
This embodiment provides a scene recognition device based on indoor opportunistic signal enhancement. Referring to Fig. 5, the device comprises:
a scene image set acquisition module 501, configured to acquire a typical scene image set for a preset application scenario;
a positioning information and image information acquisition module 502, configured to acquire, in combination with a scene base map corresponding to the scene to be studied, the positioning information and image information of a mobile device on the main roads of the scene to be studied, wherein the positioning information and image information correspond to anchor points;
a transfer learning module 503, configured to input the scene image set into a preset feature-fusion neural network, perform model fine-tuning on the convolutional neural network module of the preset feature-fusion neural network, and obtain the convolutional neural network model migrated to the scene to be studied, wherein the preset feature-fusion neural network comprises an image feature extraction module and a feature-fusion decision module, and the image feature extraction module is realized by the convolutional neural network module and converts the image information corresponding to an anchor point into an image feature vector;
a position feature vector calculation module 504, configured to superimpose the anchor point on the scene base map and obtain the position feature vector describing which scene the anchor point lies in;
a relationship feature vector calculation module 505, configured to superimpose the circle of uncertainty, centered on the anchor point with the preset positioning error as radius, on the scene base map, obtain the intersection area of the circle of uncertainty with each scene, and, according to the intersection areas, obtain the relationship feature vector between the anchor point and each scene;
a feature-fusion decision module training module 506, configured to splice the position feature vector of the anchor point and the relationship feature vector between the anchor point and each scene into the positioning feature vector, then splice the positioning feature vector with the image feature vector corresponding to the anchor point, input the result into the feature-fusion decision module, and train the parameters of the feature-fusion decision module;
a merging module 507, configured to fix the parameters of the image feature extraction module obtained after model fine-tuning and of the trained feature-fusion decision module, and obtain the merged feature-fusion neural network;
a prediction module 508, configured to input the image information corresponding to an anchor point and the relationship feature vector between the anchor point and each scene into the merged feature-fusion neural network, obtain the scene prediction feature vector, and take the scene type corresponding to the highest-probability item of the prediction feature vector as the scene recognition result.
In one embodiment, the transfer learning module 503 is specifically configured to:
input the scene image set into the preset feature-fusion neural network, train the convolutional neural network module, update the parameters of its fully connected layer while retaining the parameters of the convolutional layers, and obtain the fine-tuned convolutional neural network module as the fine-tuned image feature extraction module;
input the image information corresponding to an anchor point into the fine-tuned convolutional neural network module and obtain the output tensor as the image feature vector, the image feature vector corresponding to the anchor point.
In one embodiment, the position feature vector calculation module 504 is specifically configured to:
initialize the position feature vector and assign it the value {0}^(Ncategory+1); the position feature vector has Ncategory+1 elements, where Ncategory is the number of scene classes, and the first element expresses the case in which the anchor point lies outside all scenes;
judge the relationship between the anchor point and each scene; if the anchor point falls into the k-th scene, the (k+1)-th element of the position feature vector is assigned the value 1; if the anchor point does not lie in any scene, the first element of the position feature vector is assigned the value 1;
save the position feature vector and associate it with the specific anchor point.
In one embodiment, the relationship feature vector calculation module 505 is specifically configured to:
initialize the relationship feature vector with each scene; the feature vector has Ncategory elements, each corresponding to the degree of association between the anchor point and one class of scene, where Ncategory is the total number of scene classes, and it is assigned the value {0}^(Ncategory);
superimpose the scene boundaries on the circle of uncertainty centered on the anchor point with the preset positioning error Rnoise as radius, calculate the intersection areas, traverse each scene, accumulate the area values by scene class, assign them to the corresponding elements of the relationship feature vector, and then normalize the feature vector to obtain the relationship feature vector, of the form:
{ S_i / Σ_{i=1}^{Ncategory} S_i }
wherein S_i is the intersection area of the scenes of class i with the circle of uncertainty, and Ncategory is the total number of scene classes;
save the relationship feature vector between the anchor point and each scene and associate it with the specific anchor point.
In one embodiment, the feature-fusion decision module training module 506 is specifically configured to:
splice the position feature vector V_location of each anchor point and the relationship feature vector V_relation between each anchor point and each scene into the positioning feature vector V_positioning = [V_location, V_relation];
splice the image feature vector with the positioning feature vector to construct a feature vector V_fuse = [V_image, V_positioning] of shape [1, 3*Ncategory+1];
input the spliced feature vector V_fuse into the feature-fusion decision module, output a fusion feature vector of shape [1, 3*Ncategory+1] from the feature-fusion fully connected layer of the feature-fusion decision module, pass the fusion feature vector through the final prediction fully connected layer, and train the parameters of the feature-fusion decision module to obtain the trained feature-fusion decision module.
In one embodiment, the prediction module 508 is specifically configured to:
input the image information corresponding to the anchor point into the image feature extraction module obtained after model fine-tuning, input the position feature vector of the anchor point and the relationship feature vector between the anchor point and each scene into the trained feature-fusion decision module, output the probability values that the anchor point belongs to each scene, and take the scene type corresponding to the highest probability value as the scene recognition result.
The device introduced in Embodiment two of the present invention is the device used to implement the scene recognition method based on indoor opportunistic signal enhancement in Embodiment one. Based on the method introduced in Embodiment one, those skilled in the art can understand the specific structure and variations of this device, so details are not described here. All devices used by the method of Embodiment one of the present invention fall within the scope of protection of the present invention.
Embodiment three
Based on the same inventive concept, the present invention also provides a computer-readable storage medium 600. Referring to Fig. 6, a computer program 611 is stored on it, and the method in Embodiment one is realized when the program is executed.
The computer-readable storage medium introduced in Embodiment three of the present invention is the computer-readable storage medium used to implement the scene recognition method based on indoor opportunistic signal enhancement in Embodiment one. Based on the method introduced in Embodiment one, those skilled in the art can understand the specific structure and variations of this computer-readable storage medium, so details are not described here. All computer-readable storage media used by the method of Embodiment one of the present invention fall within the scope of protection of the present invention.
Embodiment four
Based on the same inventive concept, the present invention also provides a computer device. Referring to Fig. 7, the device comprises a memory 701, a processor 702, and a computer program 703 stored in the memory and executable on the processor, and the processor realizes the method of Embodiment one when executing the program.
The computer device introduced in Embodiment four of the present invention is the device used to implement the scene recognition method based on indoor opportunistic signal enhancement in Embodiment one. Based on the method introduced in Embodiment one, those skilled in the art can understand the specific structure and variations of this computer device, so details are not described here. All computer devices used by the method of Embodiment one of the present invention fall within the scope of protection of the present invention.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so on) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they know the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. In this way, if these modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims (10)

1. A scene recognition method based on indoor opportunity signal enhancement, characterized by comprising:
Step S1: for a preset application scenario, collecting a typical scene image set;
Step S2: in combination with a scene base map corresponding to a scene to be studied, collecting location information and image information of a mobile device on the main roads of the scene to be studied, wherein the location information and the image information correspond to anchor points;
Step S3: inputting the scene image set into a preset feature fusion neural network, performing model fine-tuning on a convolutional neural network module of the preset feature fusion neural network, and obtaining a convolutional neural network model migrated to the scene to be studied, wherein the preset feature fusion neural network comprises an image feature extraction module and a feature fusion decision module, and the image feature extraction module is implemented by the convolutional neural network module and converts the image information corresponding to an anchor point into an image feature vector;
Step S4: superimposing the anchor point on the scene base map to obtain the position feature vector of the anchor point within the scenes;
Step S5: superimposing an uncertainty circle, centered on the anchor point and having the preset positioning error as its radius, on the scene base map to obtain the intersecting area of the uncertainty circle with each scene, and obtaining, according to the intersecting areas, the relationship feature vector between the anchor point and each scene;
Step S6: concatenating the position feature vector of the anchor point within the scenes and the relationship feature vector between the anchor point and each scene into a positioning feature vector, then concatenating the positioning feature vector with the image feature vector corresponding to the anchor point, inputting the result into the feature fusion decision module, and training the parameters of the feature fusion decision module;
Step S7: fixing the parameters of the image feature extraction module obtained after model fine-tuning and of the feature fusion decision module trained in the preceding step, thereby obtaining the merged feature fusion neural network;
Step S8: inputting the image information corresponding to the anchor point and the relationship feature vector between the anchor point and each scene into the merged feature fusion neural network to obtain a scene prediction feature vector, and taking the scene class corresponding to the highest-probability item of the prediction feature vector as the scene recognition result.
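Purely as an illustrative sketch of how the inference flow of step S8 could be realised (PyTorch assumed; backbone, fusion_head, build_position_vector, build_relation_vector, scenes_by_class and r_noise are hypothetical names introduced here, not terms defined by the patent):

```python
import torch

def recognize_scene(image_tensor, anchor_xy, scenes_by_class,
                    backbone, fusion_head,
                    build_position_vector, build_relation_vector, r_noise):
    """Illustrative inference path for step S8; every argument name is a placeholder."""
    with torch.no_grad():
        v_image = backbone(image_tensor)                                   # image feature vector, [1, N_category]
        v_loc = torch.as_tensor(build_position_vector(anchor_xy, scenes_by_class),
                                dtype=torch.float32).unsqueeze(0)          # [1, N_category + 1]
        v_rel = torch.as_tensor(build_relation_vector(anchor_xy, scenes_by_class, r_noise),
                                dtype=torch.float32).unsqueeze(0)          # [1, N_category]
        v_fuse = torch.cat([v_image, v_loc, v_rel], dim=1)                 # [1, 3 * N_category + 1]
        probs = fusion_head(v_fuse)                                        # probability of each scene class
    return int(probs.argmax(dim=1).item())                                 # highest-probability class as the result
```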
2. The method according to claim 1, characterized in that step S3 specifically comprises:
Step S3.1: inputting the scene image set into the preset feature fusion neural network, training the convolutional neural network module, updating the fully connected layer parameters of the convolutional neural network module while retaining the parameters of the convolutional layers, and obtaining the convolutional neural network module after model fine-tuning as the image feature extraction module after model fine-tuning;
Step S3.2: inputting the image information corresponding to the anchor point into the convolutional neural network module after model fine-tuning, obtaining the output tensor as the image feature vector, and associating the image feature vector with the anchor point.
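As an illustrative sketch of the fine-tuning in steps S3.1 and S3.2, the code below assumes a torchvision ResNet-18 backbone (the patent does not name a specific network): the convolutional-layer parameters are frozen and only a new fully connected layer is trained on the typical scene image set.

```python
import torch.nn as nn
from torchvision import models

def build_finetuned_extractor(n_category: int) -> nn.Module:
    """Image feature extraction module after model fine-tuning (ResNet-18 is an assumed backbone)."""
    model = models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False                          # retain the convolutional-layer parameters
    model.fc = nn.Linear(model.fc.in_features, n_category)   # new fully connected layer, trained on the scene image set
    return model

# During fine-tuning only the fully connected layer would be optimised, e.g.
#   optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)
```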
3. The method according to claim 1, characterized in that step S4 specifically comprises:
Step S4.1: initializing the position feature vector to [0, 0, ..., 0]; the position feature vector has N_category + 1 elements, where N_category is the number of scene classes, and its first element expresses the case in which the anchor point lies outside all scenes;
Step S4.2: judging the relationship between the anchor point and each scene; if the anchor point falls inside a scene of the k-th class, the (k+1)-th element of the position feature vector is assigned the value 1;
if the anchor point does not fall inside any scene, the first element of the position feature vector is assigned the value 1;
Step S4.3: saving the position feature vector and associating the position feature vector with the specific anchor point.
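A minimal sketch of the position feature vector construction in steps S4.1 to S4.3, assuming the scene base map is available as shapely polygons grouped by scene class (the geometry library and data layout are assumptions, not part of the claim):

```python
import numpy as np
from shapely.geometry import Point

def build_position_vector(anchor_xy, scenes_by_class):
    """Position feature vector of steps S4.1-S4.3.

    scenes_by_class: list of length N_category whose k-th entry is a list of
    shapely Polygons for class-k scenes (an assumed data layout).
    """
    n_category = len(scenes_by_class)
    v = np.zeros(n_category + 1)                  # step S4.1: initialise to [0, 0, ..., 0]
    point = Point(anchor_xy)
    for k, polygons in enumerate(scenes_by_class):
        if any(poly.contains(point) for poly in polygons):
            v[k + 1] = 1.0                        # step S4.2: anchor point falls inside a class-k scene
            return v
    v[0] = 1.0                                    # anchor point lies outside every scene
    return v
```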
4. The method according to claim 1, characterized in that step S5 specifically comprises:
Step S5.1: initializing the relationship feature vector between the anchor point and each scene; the feature vector has N_category elements, each corresponding to the significance of the relationship between the anchor point and one class of scene, where N_category is the total number of scene classes, and it is initialized to [0, 0, ..., 0];
Step S5.2: superimposing the scene boundaries with an uncertainty circle centered on the anchor point and having the preset positioning error R_noise as its radius, computing the intersecting areas, traversing each scene, accumulating the area values by scene class and assigning them to the corresponding elements of the relationship feature vector, and then normalizing the feature vector to obtain the relationship feature vector, of the form:
{ S_i / Σ_{i=1}^{N_category} S_i }
where S_i is the intersecting area obtained by superimposing the uncertainty circle with the scenes of class i, and N_category is the total number of scene classes;
Step S5.3: saving the relationship feature vector between the anchor point and each scene and associating the relationship feature vector with the specific anchor point.
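A corresponding sketch of steps S5.1 to S5.3, again assuming shapely geometry: the uncertainty circle is built as a buffer of radius R_noise around the anchor point, the intersecting areas are accumulated per scene class, and the vector is normalized as S_i / Σ S_i.

```python
import numpy as np
from shapely.geometry import Point

def build_relation_vector(anchor_xy, scenes_by_class, r_noise):
    """Relationship feature vector of steps S5.1-S5.3 (shapely geometry assumed)."""
    n_category = len(scenes_by_class)
    areas = np.zeros(n_category)                        # step S5.1: initialise to [0, 0, ..., 0]
    circle = Point(anchor_xy).buffer(r_noise)           # uncertainty circle, radius = positioning error R_noise
    for k, polygons in enumerate(scenes_by_class):
        for poly in polygons:
            areas[k] += circle.intersection(poly).area  # accumulate intersecting area per scene class
    total = areas.sum()
    return areas / total if total > 0 else areas        # step S5.2: normalise to S_i / sum_i S_i
```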
5. The method according to claim 1, characterized in that step S6 specifically comprises:
Step S6.1: concatenating the position feature vector V_location of each anchor point and the relationship feature vector V_relation between each anchor point and each scene into a positioning feature vector V_positioning, of the form:
V_positioning = [V_location, V_relation]
Step S6.2: concatenating the image feature vector V_image and the positioning feature vector into a feature vector V_fuse of shape [1, 3*N_category + 1], of the form:
V_fuse = [V_image, V_positioning]
Step S6.3: inputting the concatenated feature vector V_fuse into the feature fusion decision module, outputting, through the feature fusion fully connected layer of the feature fusion decision module, a fusion feature vector of shape [1, 3*N_category + 1], then inputting the fusion feature vector into the fully connected layer for final prediction, and training the parameters of the feature fusion decision module to obtain the trained feature fusion decision module.
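A minimal PyTorch sketch of a feature fusion decision module matching the shapes described in steps S6.1 to S6.3; the ReLU activation and the exact layer layout are assumptions, since the claim only fixes the [1, 3*N_category + 1] fusion feature vector and the final prediction layer.

```python
import torch
import torch.nn as nn

class FusionDecisionModule(nn.Module):
    """Illustrative feature fusion decision module; activation and layer sizes are assumptions."""

    def __init__(self, n_category: int):
        super().__init__()
        d = 3 * n_category + 1
        self.fuse = nn.Linear(d, d)                 # fusion FC layer keeps the [1, 3*N_category + 1] shape
        self.predict = nn.Linear(d, n_category)     # final prediction over the scene classes

    def forward(self, v_fuse: torch.Tensor) -> torch.Tensor:
        hidden = torch.relu(self.fuse(v_fuse))      # fusion feature vector of step S6.3
        return torch.softmax(self.predict(hidden), dim=1)  # probability of each scene class
```

During training only this module's parameters would be updated, with the fine-tuned image feature extraction module kept fixed, in line with step S7.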
6. The method according to claim 1, characterized in that step S8 specifically comprises:
inputting the image information corresponding to the anchor point into the image feature extraction module obtained after model fine-tuning, inputting the position feature vector of the anchor point and the relationship feature vector between the anchor point and each scene into the trained feature fusion decision module, outputting the probability that the anchor point belongs to each scene, and taking the scene class corresponding to the highest probability value as the scene recognition result.
7. A scene recognition device based on indoor opportunity signal enhancement, characterized by comprising:
a scene image set acquisition module, configured to collect a typical scene image set for a preset application scenario;
a location information and image information acquisition module, configured to, in combination with a scene base map corresponding to a scene to be studied, collect location information and image information of a mobile device on the main roads of the scene to be studied, wherein the location information and the image information correspond to anchor points;
a transfer learning module, configured to input the scene image set into a preset feature fusion neural network, perform model fine-tuning on the convolutional neural network module of the preset feature fusion neural network, and obtain a convolutional neural network model migrated to the scene to be studied, wherein the preset feature fusion neural network comprises an image feature extraction module and a feature fusion decision module, and the image feature extraction module is implemented by the convolutional neural network module and converts the image information corresponding to an anchor point into an image feature vector;
a position feature vector calculation module, configured to superimpose the anchor point on the scene base map and obtain the position feature vector of the anchor point within the scenes;
a relationship feature vector calculation module, configured to superimpose an uncertainty circle, centered on the anchor point and having the preset positioning error as its radius, on the scene base map, obtain the intersecting area of the uncertainty circle with each scene, and obtain, according to the intersecting areas, the relationship feature vector between the anchor point and each scene;
a feature fusion decision module training module, configured to concatenate the position feature vector of the anchor point within the scenes and the relationship feature vector between the anchor point and each scene into a positioning feature vector, then concatenate the positioning feature vector with the image feature vector corresponding to the anchor point, input the result into the feature fusion decision module, and train the parameters of the feature fusion decision module;
a merging module, configured to fix the parameters of the image feature extraction module obtained after model fine-tuning and of the trained feature fusion decision module, thereby obtaining the merged feature fusion neural network;
a prediction module, configured to input the image information corresponding to the anchor point and the relationship feature vector between the anchor point and each scene into the merged feature fusion neural network, obtain a scene prediction feature vector, and take the scene class corresponding to the highest-probability item of the prediction feature vector as the scene recognition result.
8. The device according to claim 7, characterized in that the transfer learning module is specifically configured to:
input the scene image set into the preset feature fusion neural network, train the convolutional neural network module, update the fully connected layer parameters of the convolutional neural network module while retaining the parameters of the convolutional layers, and obtain the convolutional neural network module after model fine-tuning as the image feature extraction module after model fine-tuning;
input the image information corresponding to the anchor point into the convolutional neural network module after model fine-tuning, obtain the output tensor as the image feature vector, and associate the image feature vector with the anchor point.
9. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed, the program implements the method according to any one of claims 1 to 6.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, the method according to any one of claims 1 to 6 is implemented.
CN201810972177.6A 2018-08-24 2018-08-24 Scene recognition method and device based on indoor opportunity signal enhancement Active CN109284687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810972177.6A CN109284687B (en) 2018-08-24 2018-08-24 Scene recognition method and device based on indoor opportunity signal enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810972177.6A CN109284687B (en) 2018-08-24 2018-08-24 Scene recognition method and device based on indoor opportunity signal enhancement

Publications (2)

Publication Number Publication Date
CN109284687A true CN109284687A (en) 2019-01-29
CN109284687B CN109284687B (en) 2020-08-07

Family

ID=65183082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810972177.6A Active CN109284687B (en) 2018-08-24 2018-08-24 Scene recognition method and device based on indoor opportunity signal enhancement

Country Status (1)

Country Link
CN (1) CN109284687B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110026840A1 (en) * 2009-07-28 2011-02-03 Samsung Electronics Co., Ltd. System and method for indoor-outdoor scene classification
CN104700078A (en) * 2015-02-13 2015-06-10 武汉工程大学 Scale-invariant feature extreme learning machine-based robot scene recognition method
CN105138963A (en) * 2015-07-31 2015-12-09 小米科技有限责任公司 Picture scene judging method, picture scene judging device and server
CN106792562A (en) * 2017-02-16 2017-05-31 南京大学 Indoor wireless networks localization method based on back propagation artificial neural network model
CN108363967A (en) * 2018-01-30 2018-08-03 何德珍 A kind of categorizing system of remote sensing images scene
CN108399366A (en) * 2018-01-30 2018-08-14 何德珍 It is a kind of based on the remote sensing images scene classification extracting method classified pixel-by-pixel

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
PENGJIE TANG et al.: "G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition", Neurocomputing *
YUANSHENG HUA et al.: "LAHNet: A Convolutional Neural Network Fusing Low- and High-Level Features for Aerial Scene Classification", IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium *
何小飞 et al.: "Scene classification of high-resolution imagery combining saliency and multi-layer convolutional neural networks", Acta Geodaetica et Cartographica Sinica (《测绘学报》) *
胡凡: "Research on scene classification of high-resolution remote sensing images based on feature learning", China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库 信息科技辑》) *
齐镗泉: "Research on key technologies of behavior recognition and tourism scene classification based on deep learning", China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库 信息科技辑》) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307978A (en) * 2020-10-30 2021-02-02 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN109284687B (en) 2020-08-07

Similar Documents

Publication Publication Date Title
CN109117794A (en) A kind of moving target behavior tracking method, apparatus, equipment and readable storage medium storing program for executing
CN109176512A (en) A kind of method, robot and the control device of motion sensing control robot
Wu et al. Automatic road extraction from high-resolution remote sensing images using a method based on densely connected spatial feature-enhanced pyramid
CN108664893A (en) A kind of method for detecting human face and storage medium
CN109743683A (en) A method of mobile phone user position is determined using deep learning converged network model
CN106875406A (en) The video semanteme object segmentation methods and device of image guiding
CN108537117A (en) A kind of occupant detection method and system based on deep learning
CN110119148A (en) A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium
CN109074490A (en) Path detection method, related device and computer readable storage medium
CN113822314A (en) Image data processing method, apparatus, device and medium
CN109325516A (en) A kind of integrated learning approach and device towards image classification
CN109948637A (en) Object test equipment, method for checking object and computer-readable medium
CN112084923A (en) Semantic segmentation method for remote sensing image, storage medium and computing device
CN109858476A (en) The extending method and electronic equipment of label
CN112712138A (en) Image processing method, device, equipment and storage medium
CN109740585A (en) A kind of text positioning method and device
CN110689000A (en) Vehicle license plate identification method based on vehicle license plate sample in complex environment
CN112163446A (en) Obstacle detection method and device, electronic equipment and storage medium
CN112580453A (en) Land use classification method and system based on remote sensing image and deep learning
CN105893577A (en) Road network data processing method and apparatus
CN111199175A (en) Training method and device for target detection network model
CN115984537A (en) Image processing method and device and related equipment
CN109284687A (en) A kind of scene recognition method and device based on indoor unit meeting signal enhancing
CN112738725B (en) Real-time identification method, device, equipment and medium for target crowd in semi-closed area
CN112435333B (en) Road scene generation method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant