CN107832727A - Indoor mall shop feature extraction method - Google Patents
Indoor mall shop feature extraction method Download PDF Info
- Publication number
- CN107832727A CN107832727A CN201711167137.6A CN201711167137A CN107832727A CN 107832727 A CN107832727 A CN 107832727A CN 201711167137 A CN201711167137 A CN 201711167137A CN 107832727 A CN107832727 A CN 107832727A
- Authority
- CN
- China
- Prior art keywords
- shop
- neural network
- probability
- decoration style
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/36—Indoor scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
- G06F30/18—Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mathematical Optimization (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Mathematical Analysis (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an indoor mall shop feature extraction method for a mall containing M shops. The method comprises the following steps: S1, build a shop decoration-style recognition neural network prototype; S2, build a shop text detection neural network prototype; S3, establish a shop image dataset for the mall, in which each image contains a shop's text information and decoration-style information, and split the dataset into a training set and a test set; initialize the weights of the decoration-style recognition network and the text detection network from a Gaussian random distribution, train the two prototypes with the objective of minimizing a cost function, and thereby determine the model structures of the decoration-style recognition network and the text detection network; S4, recognize the shop decoration style and detect the shop text; S5, recognize the shop text; S6, fuse the recognition results. The extraction method of the invention significantly improves the accuracy of shop recognition.
Description
【Technical field】
The present invention relates to the fields of computer vision and digital image processing, and more particularly to an indoor mall shop feature extraction method based on high-level visual features.
【Background technology】
In indoor environments that lack a portable, inexpensive positioning system (such as the GPS used for outdoor positioning), indoor positioning systems (IPS) have long been an attractive research topic. Infrastructure-based indoor positioning systems include those using RFID, fluorescent lamps, or Wi-Fi access points, and their performance in practical applications is impressive. On the other hand, the inherent advantages and challenges of infrastructure-free IPS have attracted even broader attention. Existing work has proposed computer-vision-based IPS using image retrieval techniques; such systems can tell the user's position from a photo taken with a smartphone. However, all of these methods require an offline database to be built in advance, a process that is both time-consuming and expensive.
In recent years, advances in robotics and computer vision have found new applications in IPS. Simultaneous localization and mapping (SLAM) and visual odometry (VO) have become important IPS techniques because they can accurately estimate motion. Monocular SLAM systems based on bag-of-words models, monocular VO algorithms, and LIDAR-based systems have all been applied in practice. However, running SLAM or VO means that the user must record video with a camera or carry a laser transceiver.
To address this problem, some scholars have proposed locating shops based on text recognition in images. Specifically, such a system classifies the shops in an image through text recognition and uses them as landmarks for coarse localization (positioning by shop class). This approach scales flexibly because, apart from a pre-annotated mall floor plan taken as input, it does not require collecting large amounts of indoor-scene data in advance. However, the precision of recognizing shops by text alone is not high, because in real environments many words are submerged in noise and hard to detect.
The above disclosure of background technical content is only intended to assist in understanding the inventive concept and technical solution of the present invention; it does not necessarily belong to the prior art of the present patent application. In the absence of clear evidence that the above content was disclosed before the filing date of this application, the above background should not be used to evaluate the novelty and inventiveness of this application.
【Content of the invention】
The technical problem to be solved by the invention is to remedy the above deficiencies of the prior art by proposing an indoor mall shop feature extraction method that significantly improves the accuracy of shop recognition.
The technical problem of the invention is solved by the following technical solution:
An indoor mall shop feature extraction method, for a mall containing M shops, comprises the following steps. S1: build a shop decoration-style recognition neural network prototype comprising convolutional layers, pooling layers, activation layers, and fully connected layers. S2: build a shop text detection neural network prototype comprising convolutional layers, pooling layers, activation layers, and deconvolution layers. S3: establish a shop image dataset for the mall, in which each image contains a shop's text information and decoration-style information, and split the dataset into a training set and a test set; initialize the weights of the decoration-style recognition network and the text detection network from a Gaussian random distribution, train the prototypes of steps S1 and S2 with the objective of minimizing a cost function, and thereby determine the model structures of the decoration-style recognition network and the text detection network. S4: for a picture to be recognized that contains a shop name and shop decoration style, input it into the trained model structures of the shop decoration-style network and the text detection network respectively, obtaining the probability that the picture belongs to each shop and the regions of the picture corresponding to text. S5: input the text regions of the picture obtained in step S4 into a text recognition module, obtaining the N-gram (NGRAM) code of the text in the picture. S6: compute the shop decoration-style recognition probability from the probabilities obtained in step S4 and the text recognition probability from the NGRAM code obtained in step S5, obtain the final recognition probabilities for the M shops by weighted combination, and take the shop with the largest probability value as the shop recognition result for the picture.
Compared with the prior art, the beneficial effects of the invention are as follows:
The indoor mall shop feature extraction method of the invention performs storefront recognition based on high-level visual features. By building neural networks and training them on a large shop image dataset, a decoration-style recognition network and a text detection network are obtained; these networks extract the decoration-style information of the shop in a test picture and the text regions of the picture, a text recognition module then extracts the shop's text information (the NGRAM code) from those text regions, and finally the decoration-style information and the text information are fused to identify the current shop. Fusing shop decoration style with text information significantly improves the accuracy of shop recognition, so that indoor mall positioning based on the fused result is also more robust.
【Brief description of the drawings】
Fig. 1 is a schematic framework diagram of the indoor mall shop feature extraction method of the specific embodiment of the invention;
Fig. 2 is a flowchart of the indoor mall shop feature extraction method of the specific embodiment;
Fig. 3 is a schematic diagram of the concrete model structure of the decoration-style recognition neural network in the specific embodiment;
Fig. 4 is a schematic diagram of the concrete model structure of the shop text detection neural network in the specific embodiment;
Fig. 5 shows the experimental test results of using the indoor mall shop feature extraction method of the specific embodiment for positioning.
【Embodiment】
The invention is described in further detail below with reference to the specific embodiments and the accompanying drawings.
As shown in Fig. 1, the framework of the indoor mall shop feature extraction method of this embodiment includes four modules: shop decoration-style recognition, shop text detection and recognition, recognition-result fusion, and positioning according to the recognition result. Fig. 2 shows the flowchart of the indoor mall shop feature extraction method of this embodiment, which comprises the following steps:
S1: build a shop decoration-style recognition neural network prototype comprising convolutional layers, pooling layers, activation layers, and fully connected layers.
S2: build a shop text detection neural network prototype comprising convolutional layers, pooling layers, activation layers, and deconvolution layers.
S3: establish a shop image dataset for the mall, in which each image contains a shop's text information and the identity of the shop it belongs to, and split the dataset into a training set and a test set; initialize the weights of the decoration-style recognition network and the text detection network from a Gaussian random distribution, train the prototypes of steps S1 and S2 with the objective of minimizing a cost function, and determine the model structures of the shop decoration-style recognition network and the text detection network.
In this embodiment, the shop image dataset contains 2876 RGB shop pictures covering all 56 shops in one mall. Each image carries annotations of its text regions, its brand, and its decoration style. In use, each RGB image is cropped into 224x224 image patches, and each RGB image's shop name and text regions are annotated. The dataset is split into a training set of 2300 images and a test set of 576 images. During training, the cost function is: L = -Σ_i t_i·log(y_i) + λ·||W||².
Here, when training the decoration-style recognition prototype of step S1, y_i is the shop probability produced for a training image by the decoration-style recognition network, t_i is the actual probability that the training image belongs to each of the M shops, λ is the regularization coefficient of the cost function, and W denotes the weights of the convolutional and fully connected layers of the decoration-style recognition network model.
When training the shop text detection prototype of step S2, y_i is the probability that each pixel of the image output by the text detection network belongs to a text region, t_i is the actual probability that each pixel of the training image belongs to a text region, λ is again the regularization coefficient of the cost function, and W denotes the weights of the convolutional and deconvolution layers of the text detection network model.
In general, λ can be chosen by cross-validation as the value that maximizes the system's recognition accuracy on the training set; typical values include 0.01, 0.03, 0.1, and 0.3. In this embodiment the regularization coefficient of the cost function is λ = 0.01.
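The cost function above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation; the function name and the example values are invented, and only the formula L = -Σ_i t_i·log(y_i) + λ·||W||² is taken from the text.

```python
import numpy as np

def cost(y, t, weights, lam=0.01):
    """Cross-entropy plus L2 weight regularization,
    L = -sum_i t_i * log(y_i) + lam * ||W||^2,
    as used for both networks in step S3. `weights` is a list of
    weight arrays (conv plus fully connected / deconv layers)."""
    eps = 1e-12                                   # avoid log(0)
    ce = -np.sum(t * np.log(y + eps))
    l2 = lam * sum(np.sum(w ** 2) for w in weights)
    return ce + l2

# One-hot target for shop #3 of 56 and a confident prediction:
t = np.zeros(56); t[3] = 1.0
y = np.full(56, 0.001); y[3] = 0.945
W = [np.full((3, 3), 0.1)]                        # toy weight matrix
loss = cost(y, t, W)
```

A perfect prediction (y = t) with zero weights drives the loss to zero, which is a quick sanity check on the implementation.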
When training the network of step S1, eight 224x224 patches are cropped at random from each training picture and fed into the network. During cropping, a random offset is generated and used as the top-left coordinate of the patch to be cut out. The network applies convolution, pooling, activation, and fully connected operations to the input picture in turn and outputs a one-hot-coded probability s_s ∈ R^56 that the picture belongs to each brand. This probability is also used to compute the cost function.
When training the network of step S2, pictures are not cropped but input whole; after convolution, pooling, activation, and deconvolution operations, the network outputs a binary picture of the same size. Pixels of value 1 in the output represent text regions and pixels of value 0 represent non-text regions. This binary picture is compared with the text-region annotation to compute the cost function. In this embodiment, 20 pictures are input to each of the networks of S1 and S2 per iteration, and the learning rate is 0.001.
The weights of the convolutional neural networks are initialized from a one-dimensional Gaussian distribution, and the cost function is then minimized iteratively, which determines the specific values of the weights W of each layer and hence the final concrete model structure of each network. Preferably, in this embodiment the cost function is minimized with the Adam optimizer; compared with traditional SGD (stochastic gradient descent), the Adam algorithm is better suited to networks with large data volumes and many layers. Adam maintains first-moment and second-moment estimates of the gradient of each parameter of the cost function and uses them to dynamically adjust each parameter's learning rate. Its advantage is that the parameters evolve steadily over the iterations and are less likely to become trapped in local optima.
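The Adam update described above can be written out explicitly. This is a generic sketch of the standard Adam step (first- and second-moment estimates with bias correction), shown here minimizing a toy quadratic with the embodiment's learning rate of 0.001; it is not code from the patent.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: first-moment (m) and second-moment (v)
    estimates of the gradient give each parameter its own effective
    learning rate, as described in the embodiment."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy objective (w - 2)^2, gradient 2*(w - 2), starting from w = 0:
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * (w - 2), m, v, t)
```

Because Adam normalizes the gradient by its second moment, each step moves w by roughly the learning rate, giving the steady per-parameter progress the text attributes to the algorithm.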
Through training, the concrete model structures of the networks determined in this embodiment are as follows:
The model structure of the decoration-style recognition network comprises a first convolution unit and N2 fully connected layers. The first convolution unit comprises N1 convolutional layers connected in series, each followed by a pooling layer and a ReLU activation layer; the output of the first convolution unit is connected to the N2 fully connected layers in series, and the output dimension of the last fully connected layer is M, producing an M-dimensional binary vector. The values of N1 and N2 are chosen so that the model reaches the highest accuracy on the training set; training in this embodiment gives N1 = 5 and N2 = 3. Fig. 3 shows the concrete model structure of the decoration-style recognition network of this embodiment: five convolutional layers in series with kernel sizes 11x11, 5x5, 3x3, 3x3, and 3x3, each followed by a pooling layer and a ReLU activation layer. The pooling layers reduce the dimensionality of the convolutional outputs, lowering the complexity of the network and improving its generalization ability; in this embodiment the pooling layers use an L2-normalization operation. The ReLU activation layers counter the vanishing-gradient problem in deep network training. The output of the last convolutional layer is connected to three fully connected layers, which process and classify the convolutional output; the output dimension of the last fully connected layer is set to 56, so it outputs a 56-dimensional binary vector whose elements are the binarized probabilities that the input picture belongs to each of the 56 shops. In the subsequent step S6 this 56-dimensional output vector is passed through a sigmoid function to obtain the specific probability values s_s that the picture belongs to each shop, which constitute the decoration-style recognition result.
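The layer sequence of Fig. 3 can be written out as a simple list, which makes the N1 = 5 / N2 = 3 structure explicit. Only the layer counts and kernel sizes stated in the text are encoded; strides, padding, and channel counts are not given in the patent and are therefore omitted.

```python
# Style-recognition network of Fig. 3 as a layer list.
conv_kernels = [11, 5, 3, 3, 3]                      # N1 = 5 conv layers
style_net = []
for k in conv_kernels:
    # each conv layer is followed by an L2-norm pooling layer and ReLU
    style_net += [("conv", f"{k}x{k}"), ("pool", "L2-norm"), ("relu", None)]
# N2 = 3 fully connected layers; the last outputs M = 56 values
style_net += [("fc", None), ("fc", None), ("fc", 56)]

n_conv = sum(1 for kind, _ in style_net if kind == "conv")
n_fc = sum(1 for kind, _ in style_net if kind == "fc")
```

Listing the structure this way also makes it easy to check that the final layer's output dimension matches the number of shops M.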
The model structure of the text detection network comprises a second convolution unit and N4 deconvolution layers. The second convolution unit comprises N3 convolutional layers connected in series, some of which are each followed by a pooling layer and an activation layer; the output of the second convolution unit is connected to the N4 deconvolution layers in series, and the last deconvolution layer outputs a binary picture in which pixels of value 1 represent text regions and pixels of value 0 represent non-text regions. N3 and N4 are chosen so that the model reaches the highest accuracy on the training set, and training determines their specific values. Fig. 4 shows the concrete model structure of the shop text detection network of this embodiment: 13 convolutional layers, all of size 3x3, with a 2x2 pooling layer added after the 2nd, 4th, 7th, 10th, and 13th layers and a ReLU activation layer after each pooling layer. Two deconvolution layers are connected in series after the 13 convolutional layers. The output of the deconvolution layers has the same size as the input picture, and the value of each of its pixels represents the probability that the pixel belongs to a text region.
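The per-pixel probability map emitted by the last deconvolution layer is turned into the binary text mask described above by thresholding. A sketch, where the 0.5 threshold is an assumption — the patent describes only the binary output, not the cutoff:

```python
import numpy as np

# Tiny stand-in for the network's per-pixel text-probability map,
# which has the same height and width as the input picture.
prob_map = np.array([[0.9, 0.2],
                     [0.4, 0.8]])

# Threshold at 0.5 (assumed): 1 = text region, 0 = non-text region.
binary = (prob_map >= 0.5).astype(int)
```

The resulting mask is what is compared against the text-region annotation during training and handed to the text recognition module in step S5.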
S4: for a picture to be recognized that contains a shop name and shop decoration style, input it into the model structures of the shop decoration-style network and the text detection network respectively, obtaining the probability that the picture belongs to each shop and the regions of the picture corresponding to text.
The picture to be recognized is input into the decoration-style recognition neural network model, which applies convolution, pooling, activation, and fully connected processing to it in turn and outputs the one-hot-coded probability that the picture belongs to each shop.
The picture is also input into the text detection neural network model, which applies convolution, pooling, activation, and deconvolution processing to it in turn and outputs a binary picture of the same size as the input, in which pixels of value 1 represent text regions and pixels of value 0 represent non-text regions.
S5: input the text regions of the picture obtained in step S4 into a text recognition module, obtaining the NGRAM code of the text in the picture.
Specifically, the regions corresponding to text information detected by the text detection network model in step S4 — the binarized picture — are input into a text recognition module, which performs text recognition on the binary output of the text detection model and obtains the NGRAM code of the words in the picture to be recognized. The text recognition module may be the model proposed by Max Jaderberg et al. in the paper "Deep Structured Output Learning for Unconstrained Text Recognition", whose input is a text-region picture and whose output is the NGRAM code G_N ∈ R^10000 of the words in the picture.
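To make the NGRAM code concrete: each of its 10000 dimensions corresponds to a character n-gram, and a word activates the bits of the n-grams it contains. The helper below only illustrates what a code bit stands for; the actual 10000-entry vocabulary comes from the cited Jaderberg et al. model, and max_n = 4 is an illustrative assumption.

```python
def char_ngrams(word, max_n=4):
    """All character n-grams (n = 1..max_n) of a word — the kind of
    string N(k) that the k-th bit of the code G_N stands for."""
    w = word.lower()
    return {w[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(w) - n + 1)}

grams = char_ngrams("Nike")
```

For "Nike" this yields the unigrams through 4-grams {"n", "i", "k", "e", "ni", "ik", "ke", "nik", "ike", "nike"}, each of which would set one bit of G_N if present in the vocabulary.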
S6: compute the shop decoration-style recognition probability from the probabilities obtained in step S4 and the text recognition probability from the NGRAM code obtained in step S5, obtain the final recognition probabilities for the M shops by weighted combination, and take the shop with the largest probability value as the shop recognition result for the picture.
In this step, the shop decoration-style recognition probability is computed by the function s_s = f_s(a), where a is the output of the decoration-style recognition network (the 56-dimensional binary vector), s_s is the probability that the picture belongs to each shop (a 56-dimensional vector of specific probability values), and f_s is the sigmoid function.
The text recognition probability is computed by the function s_t(j) = Σ_k G_N(k)·I(N(k) ∈ S(j)), where S(j) is the character string of the trade name of the j-th of the M shops, k indexes the code bits of the NGRAM coding scheme (an integer in 0, ..., 9999), N(k) is the character string corresponding to the k-th NGRAM code bit, N(k) ∈ S(j) means that the string N(k) belongs to S(j), I is the indicator function, and G_N(k) is the NGRAM code of the words recognized in the input picture.
After the two probability values are computed, the final recognition probability is computed by the formula y = (1 - α)·s_t + α·s_s, where s_t is the text recognition probability, s_s is the decoration-style recognition probability, and α is the weight of the decoration-style recognition probability. The value of α is chosen by cross-validation as the value that maximizes the recognition accuracy of the decoration-style recognition network model on the training set; validation in this embodiment gives α = 0.4.
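The whole of step S6 — sigmoid on the style output, the indicator-weighted text score, and the weighted fusion — fits in a short function. The sketch below uses a toy 3-shop example (M = 3 instead of 56; the shop names, n-gram strings, and raw scores are all invented for illustration) and implements only the formulas stated above.

```python
import numpy as np

def fuse(a, g_n, ngram_strings, shop_names, alpha=0.4):
    """Step S6 fusion: y = (1 - alpha) * s_t + alpha * s_s.
    a             raw output of the style network (one value per shop)
    g_n           NGRAM code of the recognized text, one value per bit k
    ngram_strings ngram_strings[k] is the string N(k) of code bit k
    shop_names    shop_names[j] is the trade-name string S(j) of shop j"""
    s_s = 1.0 / (1.0 + np.exp(-np.asarray(a, dtype=float)))     # sigmoid f_s
    # s_t(j) = sum_k G_N(k) * I(N(k) occurs in S(j))
    s_t = np.array([sum(g for g, ng in zip(g_n, ngram_strings) if ng in name)
                    for name in shop_names], dtype=float)
    return (1 - alpha) * s_t + alpha * s_s

shops = ["nike", "kfc", "zara"]     # invented trade names S(j)
grams = ["ni", "ke", "za"]          # invented strings N(k) of three code bits
g_n = [1.0, 1.0, 0.0]               # recognizer fired on "ni" and "ke"
a = [0.2, 0.1, 0.3]                 # invented raw style-network output
y = fuse(a, g_n, grams, shops)
best = shops[int(np.argmax(y))]
```

Here both active n-grams occur in "nike", so the text score dominates the fused vector and the argmax picks that shop, mirroring how the final shop recognition result is selected.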
The final recognition probability y is a 56-dimensional vector, and the shop corresponding to its largest value is taken as the shop recognition result for the picture. Further, once the shop in the picture has been recognized, the mall map can be searched for the predicted location point corresponding to the recognized shop, and these places can be output as the positioning result. Fig. 5 shows the experimental test results: column (a) on the left contains the input pictures, each labeled at the upper right with its true shop name, and column (b) on the right shows the output positioning results, where the five-pointed stars mark the outputs and the dots mark the actual locations. Experiments verify that shop recognition under the framework of this embodiment reaches an accuracy of 86.39%.
In summary, in this embodiment a large set of indoor scene images (the shop image dataset) is collected, network prototypes are built, and neural network models are trained by minimizing the cost function, yielding the concrete models of the decoration-style network and the text detection network; the NGRAM code of the text is then obtained with the text recognition module. Finally, text and decoration style are fused to recognize which specific shop in the mall is shown. The method of this embodiment markedly improves the accuracy of both shop recognition and indoor mall positioning.
The above content is a further detailed description of the invention in combination with specific preferred embodiments, and the specific implementation of the invention should not be regarded as limited to these descriptions. For ordinary technical personnel in the technical field of the invention, replacements or obvious variations made without departing from the inventive concept, with identical performance or use, should all be regarded as falling within the scope of protection of the invention.
Claims (10)
- 1. An indoor mall shop feature extraction method, characterized in that the mall contains M shops and the method comprises the following steps: S1, building a shop decoration-style recognition neural network prototype, the decoration-style recognition neural network prototype comprising convolutional layers, pooling layers, activation layers, and fully connected layers; S2, building a shop text detection neural network prototype, the text detection neural network prototype comprising convolutional layers, pooling layers, activation layers, and deconvolution layers; S3, establishing a shop image dataset for the mall, each image containing a shop's text information and decoration-style information, and dividing the dataset into a training set and a test set; initializing the weights of the decoration-style recognition network and the text detection network from a Gaussian random distribution, training the decoration-style recognition prototype of step S1 and the text detection prototype of step S2 with the objective of minimizing a cost function, and determining the model structures of the decoration-style recognition network and the text detection network; S4, for a picture to be recognized that contains a shop name and shop decoration style, inputting it into the model structures of the shop decoration-style network and the text detection network respectively, and obtaining the probability that the picture belongs to each shop and the regions of the picture corresponding to text; S5, inputting the text regions of the picture obtained in step S4 into a text recognition module, and obtaining the NGRAM code of the text in the picture; S6, computing the shop decoration-style recognition probability from the probabilities obtained in step S4, computing the text recognition probability from the NGRAM code obtained in step S5, obtaining the final recognition probabilities for the M shops by weighted combination, and taking the shop with the largest probability value as the shop recognition result for the picture.
- 2. indoor mall shop according to claim 1 feature extracting method, it is characterised in that:In step S3, cost letter Number is:L=- ∑sitilog(yi)+λ||W||2, wherein, when decorated style identifies neutral net prototype in training step S1, yiTable Shop probability obtained by showing the image in the training set after decorated style identifies neural metwork training, tiFor the instruction Practice the actual probabilities that the image concentrated belongs to each shop in M shop, λ represents the regularization coefficient of cost function, and W represents institute State the weights of convolutional layer and full articulamentum in decorated style identification neural network model;Text detection god in shop in training step S2 During through web original, yiRepresent the image that the image in the training set exports after the text detection neural metwork training of shop In each pixel belong to the probability of character area, tiBelong to character area for each pixel in the image in the training set Actual probabilities, λ represents the regularization coefficient of cost function, W represent in the text detection neural network model convolutional layer and The weights of warp lamination.
- 3. indoor mall shop according to claim 1 feature extracting method, it is characterised in that:In step S3, pass through Adam algorithmic minimizing cost functions determine the model of shop decorated style identification neutral net and text detection neutral net Structure.
- 4. indoor mall shop according to claim 1 feature extracting method, it is characterised in that:In step S3, determine The model structure of decorated style identification neutral net include the first convolution unit, the individual full articulamentums of N2, wherein first Convolution unit includes the N1 convolutional layers being sequentially connected in series, connected after each convolutional layer a pond layer and a ReLu excitation Layer, the output end series connection N2 full articulamentum of first convolution unit, the output dimension of last full articulamentum are M, for exporting the binary set of M dimensions;Wherein, N1 and N2 value causes the model structure to reach highest standard on training set True rate.
- 5. indoor mall shop according to claim 1 feature extracting method, it is characterised in that:In step S3, determine The model structure of the text detection neutral net include the second convolution unit and N4 warp lamination;Second convolution Unit includes the N3 convolutional layers being sequentially connected in series, respectively connected after the convolutional layer of part a pond layer and excitation layer;The volume Two The output end of product unit is connected the N4 warp lamination, and last warp lamination is for exporting binaryzation picture, in picture It is worth and represents character area for 1 pixel, the pixel being worth for 0 represents non-legible region, wherein, N3 and N4 value cause institute State model structure and reach highest accuracy rate on training set.
- 6. indoor mall shop according to claim 1 feature extracting method, it is characterised in that:In step S6, according to such as Shop decorated style identification probability is calculated in minor function:ss=fs(a), wherein, a represents the decorated style identification nerve net The output of network, ssBelong to the probability in each shop, f for the picturesFor sigmoid functions.
- 7. indoor mall shop according to claim 1 feature extracting method, it is characterised in that:In step S6, according to such as Text region probability is calculated in minor function:st(j)=∑kGN(k) I (N (k) ∈ S (j)), wherein, S (j) is represented in M shop Character string corresponding to the trade name in j-th of shop, k represent the bits of coded in NGRAM coding schemes, are 0 ..., 9999, N (k) is represented Character string corresponding to k-th of bits of coded in NGRAM codings, N (k) ∈ S (j) represent that character string N (k) belongs to S (j), and I is the property shown letter Number, GN(k) the NGRAM codings of word recognized in input picture are represented.
- 8. The indoor mall shop feature extraction method according to claim 1, wherein in step S6 the final recognition probability is calculated as y = (1 − α)·s_t + α·s_s, where s_t is the text recognition probability, s_s is the decoration-style recognition probability, and α is the weight of the decoration-style recognition probability.
- 9. The indoor mall shop feature extraction method according to claim 8, wherein the value of α is determined by cross-validation, taking the value that gives the decoration-style recognition neural network the highest recognition accuracy on the training set.
- 10. The indoor mall shop feature extraction method according to claim 1, further comprising: S7, according to the shop recognition result obtained in step S6, searching the mall map to locate the predicted position and outputting a positioning result.
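The scoring pipeline of claims 6–8 can be illustrated with a toy end-to-end sketch. Every concrete value here is invented for illustration: character bigrams stand in for the patent's 10000-entry N-gram code book, the network output `a` is hard-coded instead of coming from the decoration-style CNN, the shop names are fictional, and α = 0.25 is an arbitrary stand-in for the cross-validated weight of claim 9.

```python
import math

def sigmoid(x):
    # f_s in claim 6: squashes a raw network output into a probability
    return 1.0 / (1.0 + math.exp(-x))

def bigrams(text):
    # toy stand-in for the N-gram code book N(k)
    return {text[i:i + 2] for i in range(len(text) - 1)}

def text_score(recognized, shop_name):
    # claim 7: s_t(j) = sum_k G_N(k) * I(N(k) in S(j)); here G_N(k) = 1 for
    # every bigram recognized in the picture and 0 otherwise
    return sum(1 for g in bigrams(recognized) if g in bigrams(shop_name))

shops = ["STARCAFE", "BOOKMART", "SHOEBOX"]           # M = 3 shop names S(j)
a = [2.0, -1.0, 0.0]                                  # hypothetical CNN outputs
s_s = [sigmoid(v) for v in a]                         # claim 6

raw_t = [text_score("STARCAF", name) for name in shops]  # partial OCR result
top = max(max(raw_t), 1)
s_t = [t / top for t in raw_t]                        # normalise to [0, 1]

alpha = 0.25                                          # claim 9: found by cross-validation
y = [(1 - alpha) * t + alpha * s for t, s in zip(s_t, s_s)]  # claim 8
best = shops[max(range(len(y)), key=y.__getitem__)]
print(best)  # -> STARCAFE
```

The late-fusion weighting means a strong text match can dominate even when the style network is unsure, and vice versa, which is the point of combining the two scores rather than relying on either alone.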
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711167137.6A CN107832727A (en) | 2017-11-21 | 2017-11-21 | A kind of indoor mall shop feature extracting method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711167137.6A CN107832727A (en) | 2017-11-21 | 2017-11-21 | A kind of indoor mall shop feature extracting method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107832727A true CN107832727A (en) | 2018-03-23 |
Family
ID=61652062
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711167137.6A Pending CN107832727A (en) | 2017-11-21 | 2017-11-21 | A kind of indoor mall shop feature extracting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832727A (en) |
- 2017-11-21: Application CN201711167137.6A filed in China; published as CN107832727A (en); status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874924A (en) * | 2015-12-14 | 2017-06-20 | 阿里巴巴集团控股有限公司 | Picture style recognition method and device |
CN106874296A (en) * | 2015-12-14 | 2017-06-20 | 阿里巴巴集团控股有限公司 | Commodity style recognition method and device |
CN106557768A (en) * | 2016-11-25 | 2017-04-05 | 北京小米移动软件有限公司 | Method and device for recognizing characters in pictures |
CN107103285A (en) * | 2017-03-24 | 2017-08-29 | 深圳市未来媒体技术研究院 | Face depth prediction approach based on convolutional neural networks |
Non-Patent Citations (6)
Title |
---|
CEES G.M. SNOEK 等: "Early versus Late Fusion in Semantic Video Analysis", 《MM"05》 * |
MAX JADERBERG 等: "Deep Structured Output Learning for Unconstrained Text Recognition", 《ICLR 2015》 * |
MAX JADERBERG 等: "Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition", 《ARXIV》 * |
SHENLONG WANG 等: "Lost Shopping! Monocular Localization in Large Indoor Spaces", 《2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》 * |
ZIWEI XU 等: "UTILIZING HIGH-LEVEL VISUAL FEATURE FOR INDOOR SHOPPING MALL NAVIGATION", 《ARXIV》 * |
DONG Haiying: "Intelligent Control Theory and Applications" (《智能控制理论及应用》), 30 September 2016, Beijing: China Railway Publishing House *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109003133A (en) * | 2018-07-20 | 2018-12-14 | 阿里巴巴集团控股有限公司 | Offline shop recognition method and device |
CN109003133B (en) * | 2018-07-20 | 2022-10-14 | 创新先进技术有限公司 | Off-line store identification method and device |
CN109089314B (en) * | 2018-09-30 | 2020-10-02 | 哈尔滨工业大学(深圳) | Indoor positioning method of wifi sequence assisted GPS based on recommendation algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110298361B | Semantic segmentation method and system for RGB-D images | |
CN109584248B | Infrared target instance segmentation method based on feature fusion and a densely connected network | |
CN109948425B | Pedestrian search method and device using structure-aware self-attention and online instance aggregation matching | |
Li et al. | Building-a-nets: Robust building extraction from high-resolution remote sensing images with adversarial networks | |
CN105205453B | Human eye detection and localization method based on a deep autoencoder | |
CN104462494B | Remote sensing image retrieval method and system based on unsupervised feature learning | |
CN107103277B | Gait recognition method based on a depth camera and a 3D convolutional neural network | |
CN107871101A | Face detection method and device | |
CN106022363B | Chinese text recognition method for natural scenes | |
CN107808129A | Facial multi-feature-point localization method based on a single convolutional neural network | |
CN105956560A | Vehicle model identification method based on pooled multi-scale deep convolutional features | |
CN110705566B | Multi-modal fusion saliency detection method based on spatial pyramid pooling | |
CN111563418A | Asymmetric multi-modal fusion saliency detection method based on an attention mechanism | |
CN109522883A | Face detection method, system, device and storage medium | |
CN109784288B | Pedestrian re-identification method based on discrimination-aware fusion | |
CN105574848A | Method and apparatus for automatic segmentation of an object | |
CN114155527A | Scene text recognition method and device | |
CN104298974A | Human behavior recognition method based on depth video sequences | |
CN110163208A | Scene text detection method and system based on deep learning | |
CN109919992A | Method for estimating depth in an image | |
CN106407978B | Method for detecting salient objects in unconstrained video using combined similarity measures | |
CN110399882A | Text detection method based on deformable convolutional neural networks | |
CN110223310A | Line-structured-light centerline and cabinet edge detection method based on deep learning | |
Zhang et al. | Semantic segmentation of very high-resolution remote sensing image based on multiple band combinations and patchwise scene analysis | |
CN108363962A | Face detection method and system based on multi-level feature deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180323 |