CN107437100A

CN107437100A - Image location prediction method based on cross-modal association learning

Info

Publication number: CN107437100A
Application number: CN201710670153.0A
Authority: CN
Inventors: 丰江帆; 孙文正; 夏英; 张智慧
Original assignee: Chongqing University of Posts and Telecommunications
Current assignee: Chongqing University of Posts and Telecommunications
Priority date: 2017-08-08
Filing date: 2017-08-08
Publication date: 2017-12-05

Abstract

The invention discloses an image location prediction method based on cross-modal association learning. A dataset with detailed geographic labels is obtained from the Internet and preprocessed with image preprocessing methods, in order to highlight the salient targets in each image and reduce the influence of irrelevant noise. A convolutional neural network model is built so that it can extract the relevant spatial features of an image more sensitively; the network is trained and validated repeatedly and adjusted according to the validation results to reach a higher prediction accuracy. Location keywords are extracted from the text associated with each image. A text prior probability model is added at the output layer of the convolutional neural network so that the prediction is made jointly with the text keywords, which improves prediction accuracy. The invention can be applied to association-learning-based location prediction for large-scale social media image data that carries no GPS tags.

Description

Image location prediction method based on cross-modal association learning

Technical Field

The invention belongs to the fields of deep learning and image recognition, and in particular relates to an image location prediction method based on cross-modal learning.

Background Art

Location prediction means predicting the location of an image that carries no GPS information from the various spatial features in the image and their correlations, together with the text associated with the image. It is an automatic method of predicting image location, and it has considerable practical significance in fields such as map navigation and image retrieval.

In recent years deep learning has attracted wide attention from academia and has become one of the most important research methods in artificial intelligence. It has helped researchers achieve major breakthroughs in many areas of visual recognition, especially image recognition. In the ImageNet competition of 2012, the deep-learning-based AlexNet model took first place, with an accuracy far above that of methods using other algorithms. By the ILSVRC competition of 2016, the top-5 error rate of algorithms based on convolutional neural networks was only 9%, which amply demonstrates the advantage of deep learning in image recognition and classification. In addition, convolutional neural networks show good invariance to geometric transformation, deformation, and illumination when recognizing targets. An image classification algorithm based on convolutional neural networks can therefore be used to predict the location of an image.

Methods that match image locations with traditional machine learning use local feature detectors such as SIFT and HOG, with improvements built on top of them. Although these methods perform well in terms of feature invariance, they are limited by the representational power of the chosen features, they are comparatively complex to design, and their feature sets generalize poorly; they also need a well-suited classifier to achieve good results. Applications across many fields in recent years have shown that deep learning algorithms outperform traditional machine learning methods in image classification and recognition. Even so, some ideas from the classical methods can be borrowed to give deep learning algorithms better accuracy, such as using text label information to assist image location matching and combining global features with local features in a joint judgment.

In addition, much of the image data on the Internet does not exist as isolated pictures or isolated text. On platforms such as Weibo and mhkc, image data is always accompanied by text. To combine the media modalities and judge the information more precisely, the information needs to be processed across modalities. The present invention extracts keywords from the associated text and adds them to the neural network in the form of a prior probability, thereby improving the accuracy of location prediction.

Summary of the Invention

The purpose of the present invention is to address the problems in the above technology by providing an image location prediction algorithm based on cross-modal association learning, which uses deep learning and fuses keyword information from the associated text to improve prediction accuracy.

To achieve this purpose, the technical solution adopted by the present invention is an image location prediction method based on cross-modal association learning, comprising the following steps:

(1) Dataset acquisition: using web crawler technology, obtain images and text annotated with location information from online social media (such as Weibo);

(2) Image preprocessing: preprocess the location-annotated images and divide the processed images into a training set, a validation set, and a test set in a ratio of 8:1:1;

(3) Build a convolutional neural network, train the network with the training set, validate it with the validation set, and adjust it according to the validation results to obtain a network model that predicts image location information accurately;

(4) Extract location keywords from the text information, add a prior probability model at the output layer of the network model, and use the prior probability model to fuse the keyword prior knowledge into the trained network model;

(5) Use the test set to test the accuracy of the network model obtained in step (4) and output the prediction result. In this process the model computes the 5 most probable predictions for the image; the one with the highest probability is the prediction result, that is, the location where the picture was taken.
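The 8:1:1 division in step (2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_dataset`, the fixed random seed, and the choice to give any rounding remainder to the test set are all assumptions.

```python
import random


def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle the samples and split them into train/validation/test by ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed (assumption) for repeatability
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any rounding remainder lands in the test set
    return train, val, test
```

Shuffling before the split keeps all three subsets representative of the same locations, which matters when the crawled data arrives grouped by place.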

The image preprocessing in step (2) above includes: cropping and/or scaling the image so that it meets the input requirements of the network model; augmenting the image data by flipping, rotation, and/or translation; and then applying an image saliency processing method to highlight the salient targets in the image.

Specifically, the convolutional neural network comprises one input layer and one output layer, with several convolutional layers and pooling layers between them. The convolutional layers extract features from the spatial information of the image; each convolutional layer contains several convolution kernels of different sizes in order to extract different features. The result of each convolution is passed through a nonlinear activation function to remove linearity. The pooling layers use max pooling, which downsamples the features while improving training speed.
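The two per-layer operations named above, a nonlinear activation and max pooling, can be illustrated with a small NumPy sketch. It assumes ReLU as the activation (the function used in the embodiment) and a non-overlapping 2×2 pooling window; the window size and the function names are illustrative assumptions, not values taken from the patent.

```python
import numpy as np


def relu(x):
    """ReLU activation: f(x) = max(x, 0), applied element-wise."""
    return np.maximum(x, 0)


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling over a 2-D feature map (window size assumed)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/cols that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))     # keep the largest value in each window
```

Applied to a 4×4 feature map, `max_pool2d(relu(fm))` returns a 2×2 map, so the spatial size is halved in each dimension while the strongest response in every window is kept.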

Training the network with the training set and validating it with the validation set includes: every 1000 iterations during training, the accuracy is checked once on the validation set and the current network parameters are saved.
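The validation schedule can be sketched as a plain loop. The three callables `train_step`, `validate`, and `save_checkpoint` are hypothetical placeholders for whatever the deep learning framework provides; the default of 30,000 iterations is the figure given in the embodiment.

```python
def train_with_validation(train_step, validate, save_checkpoint,
                          max_iters=30000, val_every=1000):
    """Train, then validate and save parameters every `val_every` iterations."""
    history = []
    for it in range(1, max_iters + 1):
        train_step(it)                # one optimization step (hypothetical callable)
        if it % val_every == 0:
            accuracy = validate()     # accuracy on the validation set
            save_checkpoint(it)       # keep the current network parameters
            history.append((it, accuracy))
    return history
```

Returning the history of (iteration, accuracy) pairs makes it straightforward to see when improvement levels off and training can be stopped.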

Advantages and beneficial effects of the present invention:

The present invention is based on the idea of deep learning and imitates the way humans judge a location match. Deep learning is used to imitate human recognition of the key objects in an image, and the location keywords in the associated text are fused into deep learning as prior knowledge, imitating the way humans combine text with a picture to match an image to a geographic location. The invention uses a convolutional neural network to extract the spatial features of a picture automatically, which removes the limit that manual feature selection placed on prediction accuracy in earlier feature extraction methods. It uses image preprocessing algorithms to reduce the influence of irrelevant factors in the image, and it uses text keywords to assist prediction. Each of these measures improves prediction accuracy to some extent.

Brief Description of the Drawings

Fig. 1 is a flow chart of the image location prediction method based on cross-modal association learning of the present invention;

Fig. 2 is a flow chart of the image preprocessing of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.

The present invention targets images without GPS information and builds a convolutional neural network using the idea of deep learning. A dataset is built from online social media data and used to train the network. The trained convolutional neural network extracts the spatial features of an image, and a preliminary prediction is made from the extracted features. A location keyword dictionary is established, location keywords are extracted from the text associated with the image, and a Bayesian probability model fuses the text keywords into the trained network model to obtain the final prediction result.
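One way the keyword dictionary and the frequency-based prior might be realized is sketched below. The patent does not give the exact formula, so this is an assumption: the prior for a keyword is taken to be the relative frequency of each location class among the training texts that contain that keyword. The function names and the plain substring match are likewise illustrative.

```python
from collections import Counter


def extract_location_keywords(text, keyword_dict):
    """Return the dictionary keywords that occur in the text (substring match)."""
    return [kw for kw in keyword_dict if kw in text]


def build_prior(training_texts, training_labels, keyword_dict, num_classes):
    """Estimate a per-keyword distribution over locations from keyword frequencies."""
    counts = {kw: Counter() for kw in keyword_dict}
    for text, label in zip(training_texts, training_labels):
        for kw in extract_location_keywords(text, keyword_dict):
            counts[kw][label] += 1
    prior = {}
    for kw, c in counts.items():
        total = sum(c.values())
        if total:  # skip keywords that never appear in the training text
            prior[kw] = [c[k] / total for k in range(num_classes)]
    return prior
```

Each row of the resulting prior plays the role of one row of the prior knowledge matrix: a vector with one entry per candidate location.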

Fig. 1 shows the flow chart of the deep-learning-based image geotagging method of the present invention. The specific steps are as follows:

(1) Using web crawler technology, crawl location-annotated images and text from social media such as Weibo, then classify and label them. Augment the dataset by applying rotation, translation, deformation, cropping, and similar operations to the images. Apply an image saliency processing method to the images to reduce the influence of noise. Divide the data into a training set, a validation set, and a test set in a ratio of 8:1:1; the validation set is used to monitor the training state in real time during training, and the test set is used for the final accuracy test.

(2) Build the convolutional neural network. To improve prediction accuracy, the present invention uses an AlexNet model pre-trained on the Places365 dataset and then adjusted by fine-tuning. This network was pre-trained on 6 million images; using the pre-trained model speeds up retraining and improves prediction accuracy. The network comprises one input layer and one softmax output layer, with several convolutional layers and pooling layers between them. The convolutional layers extract features from the spatial information of the image; each convolutional layer contains several convolution kernels of different sizes in order to extract different features. The result of each convolution is passed through a nonlinear activation function to remove linearity. The pooling layers use max pooling, which downsamples the features while improving training speed. A regularization method is used in the loss function to reduce overfitting.

In this embodiment the model contains one input layer and 5 convolutional layers, three of which are each followed by a max pooling layer, and finally three fully connected layers. The output of every convolutional layer is passed through the ReLU activation function to remove linearity. ReLU is an activation function of the form $f(x) = \max(x, 0)$, where $x$ is the input to the activation function, that is, the value of each point of each layer's feature map after computation. Compared with other activation functions, ReLU has advantages such as a low computational cost and fast convergence. Max pooling is used instead of average pooling, which avoids the blurring effect of average pooling and enriches the features. Dropout is applied in the fully connected layers to reduce overfitting: it ignores a portion of the neurons during training, which markedly reduces overfitting.

The output layer uses the softmax function, a normalization function that turns the network output into a probability distribution. Its functional form is

$$\mathrm{softmax}(y)_i = \frac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}$$

where $y_1, y_2, \ldots, y_n$ are the results computed before the output layer and $n$ is the total number of outputs. The formula yields a new output for each $y_i$, and the new outputs satisfy all the requirements of a probability distribution.

The cross-entropy function is used as the loss function on the output of the softmax layer. Cross-entropy is a way of measuring the distance between two probability distributions. For two probability distributions $p$ and $q$, where $p(x)$ is the correct probability for the picture and $q(x)$ is the probability distribution computed by the network, the cross-entropy is

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

where $p(X = x)$ satisfies $\forall x:\ p(X = x) \in [0, 1]$ and $\sum_{x} p(X = x) = 1$.

Regularization is used in the loss function to avoid overfitting. Regularization adds to the loss function a term that describes the complexity of the model. If the loss function is $J(\theta)$, training does not optimize $J(\theta)$ directly but optimizes $J(\theta) + \lambda R(w)$, where $\lambda$ is a parameter that sets the share of the model complexity loss in the total loss, and $\theta$ denotes all the parameters of the model, including weights and biases. $R(w)$ describes the complexity of the model:

$$R(w) = \sum_{i} \left( \alpha\,|w_i| + (1 - \alpha)\,w_i^{2} \right)$$

where $w$ denotes the weights of the network and $\alpha$ is a parameter. Once the loss value has been computed, the parameters of each layer are optimized backward through the network with the stochastic gradient descent algorithm.

(3) Retrain the above model with the training set and set the training parameters. Because a pre-trained model is used, the number of iterations is set to 30,000, with one validation on the validation set every 1000 iterations. When the loss decreases only slowly, stop training and save the trained network and its parameters.

(4) From the training samples, select location-related keywords to build a location keyword dictionary, which is used to extract keywords from the text of the collected data. The extracted keywords serve as prior knowledge, and a prior knowledge matrix is built from the keyword frequencies. The prior probability model fuses the prior knowledge matrix into the trained model obtained in step (3), so that text and image are judged jointly and prediction accuracy improves. The fusion proceeds as follows. First the prior knowledge matrix is processed with a specified parameter ε: entries greater than ε keep their original value and entries less than ε are set to zero. The resulting prior knowledge matrix is then multiplied element by element with the output matrix of the deep learning model to give the final output matrix. The maximum value in the output matrix is the prediction with the highest probability and is taken as the prediction result.

(5) Feed the test set into the network obtained in step (4). Because the test set has little correlation with the training set, it correctly reflects the prediction accuracy of the network. Each test case yields 5 prediction probabilities; the one with the highest probability is the prediction result, that is, the location where the picture was taken.
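The fusion described in step (4) and the top-5 output in step (5) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the prior is a single vector with one entry per candidate location, the value `eps=0.1` is arbitrary, and the fallback to the unmodified network output when no prior entry survives the threshold is an addition that the patent does not describe.

```python
import numpy as np


def softmax(logits):
    """Normalize raw network outputs into a probability distribution."""
    z = logits - np.max(logits)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def fuse_with_prior(cnn_probs, prior, eps=0.1, top_k=5):
    """Threshold the prior at eps, multiply element-wise, return the top-k classes."""
    prior = np.where(prior > eps, prior, 0.0)  # keep entries above eps, zero the rest
    if prior.any():
        fused = cnn_probs * prior              # element-wise product
    else:
        fused = cnn_probs                      # assumption: no usable prior, keep CNN output
    order = np.argsort(fused)[::-1][:top_k]    # indices sorted by descending score
    return [(int(i), float(fused[i])) for i in order]
```

The first entry of the returned list is the predicted location. Note that the fused scores no longer sum to 1; only their ranking is used.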

Fig. 2 shows the image preprocessing steps, which are as follows:

(1) First scale and crop the input image. Because images collected from online social media such as Weibo vary in resolution, they need to be scaled and cropped to a resolution of 256×256 to meet the input size requirement of the network.

(2) Apply operations such as vertical flipping or rotation to the image. On the one hand this increases the size of the training set and addresses the shortage of training data; on the other hand it improves the generalization ability of the network.

(3) Apply a saliency image processing method to the image. The purpose is to make the salient targets to be recognized, such as landmarks and tall buildings, more prominent in the image, and to suppress the influence of irrelevant noise such as pedestrians, trees, and sky, which improves prediction accuracy.
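Steps (1) and (2) of the preprocessing can be sketched with NumPy alone. The sketch makes several assumptions that the patent does not specify: nearest-neighbor interpolation, scaling the shorter side to 256 before a center crop, and a fixed set of four augmented variants. The saliency step (3) is not shown, because the patent does not name a particular saliency algorithm.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Resize an H x W (x C) array with nearest-neighbor sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]


def scale_and_center_crop(img, size=256):
    """Scale the shorter side to `size`, then crop the central size x size patch."""
    h, w = img.shape[:2]
    scale = size / min(h, w)
    new_h = max(size, round(h * scale))
    new_w = max(size, round(w * scale))
    resized = resize_nearest(img, new_h, new_w)
    top, left = (new_h - size) // 2, (new_w - size) // 2
    return resized[top:top + size, left:left + size]


def augment(img):
    """Return the image plus vertically flipped, mirrored, and rotated copies."""
    return [img, np.flipud(img), np.fliplr(img), np.rot90(img)]
```

For a 300×480 input, `scale_and_center_crop` returns a 256×256 array, and `augment` turns each such image into four training samples of the same size.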

Claims

1. An image location prediction method based on cross-modal association learning, comprising the following steps:

(1) dataset acquisition: using web crawler technology, obtaining images and text annotated with location information from online social media;

(2) image preprocessing: preprocessing the location-annotated images and dividing the processed images into a training set, a validation set, and a test set;

(3) building a convolutional neural network, training the network with the training set, validating the network with the validation set, and adjusting the network according to the validation results to obtain a network model;

(4) extracting location keywords from the text information, adding a prior probability model at the output layer of the network model, and using the prior probability model to fuse the keyword prior knowledge into the trained network model;

(5) using the test set to test the accuracy of the network model obtained in step (4), and outputting the prediction result.

2. The image location prediction method based on cross-modal association learning according to claim 1, characterized in that: the image preprocessing in step (2) includes cropping and/or scaling the image so that it meets the input requirements of the network model, augmenting the image data by flipping, rotation, and/or translation, and then applying an image saliency processing method to the image to highlight the salient targets in the image.

3. The image location prediction method based on cross-modal association learning according to claim 1 or claim 2, characterized in that: the training set, the validation set, and the test set are divided in a ratio of 8:1:1.

4. The image location prediction method based on cross-modal association learning according to claim 1, characterized in that: the convolutional neural network comprises one input layer and one output layer, with several convolutional layers and pooling layers between them; the convolutional layers extract features from the spatial information of the image, and each convolutional layer contains several convolution kernels of different sizes in order to extract different features; the result of each convolution is passed through a nonlinear activation function to remove linearity; the pooling layers use max pooling, which downsamples the features while improving training speed.

5. The image location prediction method based on cross-modal association learning according to claim 4, characterized in that: training the network with the training set and validating the network with the validation set includes checking the accuracy once on the validation set every 1000 iterations during training and saving the current network parameters.

6. The image location prediction method based on cross-modal association learning according to claim 1, characterized in that: in the accuracy test of step (5), the test set data is fed into the network model obtained in step (4), the model computes the 5 most probable predictions for the image, and the one with the highest probability is the prediction result, that is, the location where the picture was taken.