CN113159122B

CN113159122B - City geomorphology analysis method based on social media image data

Info

Publication number: CN113159122B
Application number: CN202110280016.2A
Authority: CN
Inventors: 李朋龙; 丁忆; 罗莉; 马泽忠; 肖禾; 赵玲; 李海峰; 韦宏林; 罗鼎; 段松江; 魏文杰; 钱进; 程丽丹; 胡艳; 殷明
Original assignee: Chongqing Geographic Information And Remote Sensing Application Center; Central South University
Current assignee: Chongqing Geographic Information And Remote Sensing Application Center; Central South University
Priority date: 2021-03-16
Filing date: 2021-03-16
Publication date: 2022-03-15
Anticipated expiration: 2041-03-16
Also published as: CN113159122A

Abstract

The invention discloses a city geomorphology analysis method based on social media image data, comprising the following steps: acquiring social media image data; classifying and de-noising the samples of the data set; performing landscape learning with a convolutional neural network; handling sample imbalance; performing transfer learning; clustering the landscape features; calculating the landscape distance between cities; and analyzing the landscape similarity among cities and the fine-grained landscapes within each city. The invention encodes the city landscape into a set of vectors and resolves the unbalanced distribution of social media images. A city landscape distance is defined from the city landscape vectors; this distance can be used to analyze how, and in which respects, different cities resemble one another, as well as the correlation between landscape distance and geographic distance. A clustering method that uses the city landscape vector as the embedding vector reveals the fine-grained landscapes of a city in greater detail.

Description

City geomorphology analysis method based on social media image data

Technical Field

The invention belongs to the field of artificial intelligence, and particularly relates to a city geomorphology analysis method based on social media image data.

Background

The city landscape, or city style, refers to the character of a city that can be perceived visually, and is an important marker of city identity, regional culture, and urban charm and vitality. It is shaped by both the physical and the non-physical environment, and includes landscape elements such as the natural setting, open space and architectural form. Under the influence of globalization, many cities are gradually losing their distinctive character, giving rise to the problem of landscape convergence known as "a thousand cities with one face". Over the last decade, the construction of city character and the planning of city landscapes have received much attention. However, because cultural characteristics are difficult to measure, the uniqueness or similarity of landscapes among cities is hard to judge, and scientific quantitative methods and objective analysis techniques suited to city landscape construction still need to be improved. Just as no two leaves in the world are identical, no two cities are exactly alike. Extracting the visual and landscape characteristics of a city provides a basis for urban design and management and a reference for tourists when travelling.

Many researchers have explored the uniqueness of cities in terms of visual elements, element types and the like, and some have explored the similarity and uniqueness of cities through landscape maps. Describing and representing the landscape of cities helps to measure how unique or similar cities are. Earlier studies such as non-patent documents 1 to 3 relied mainly on questionnaires and interviews to explore city landscapes; they provide good ideas for later research, but such methods cannot capture key visual information quickly and are very time-consuming and labor-intensive. Image data are an important medium that objectively reflects the urban landscape and offer a way to observe a city's local landscape from the perspective of daily life. With the rapid development of social software (such as Flickr, Weibo and Instagram) and web map services (such as Google and Tencent), image data have become easier to acquire, and images covering every corner of a city are increasingly available. As computer technology has advanced, many researchers have turned to mining and exploiting urban image data. For example, non-patent document 4 explores the visual elements that best embody the character of Paris, identifying which balconies or windows look most Parisian; non-patent document 5 investigates what makes London look beautiful, quiet and happy. However, the prior art cannot meet the requirement of processing massive image data, and a more efficient and intelligent processing method is urgently needed. In recent years, with the rapid development of deep learning in computer vision, convolutional neural networks, which have strong learning and representation capabilities, have achieved breakthroughs in tasks such as image classification and scene recognition (non-patent documents 6 to 9).
In urban image research, non-patent documents 10 to 12 mainly address urban perception, urban and architectural style, place recognition and urban similarity analysis. In non-patent document 13, Zhang et al. built a social media data set covering 18 cities, measured the similarity of urban scenes and objects with a deep convolutional neural network image classification method, and identified the uniqueness of each city as a whole; however, they did not study the fine-grained landscapes that make up a city's overall character.

Many studies have used both types of image data (web images and street view images) to build a comprehensive understanding of cities. Non-patent documents 14 and 15 use web image data for tourism analysis. Non-patent document 10 mainly uses street view images for predictive analysis of cities, city safety analysis and the like. In non-patent document 16, Zhou et al. analyzed the city element types of 7 cities using web data and studied the similarities and differences between cities. Non-patent document 17 likewise uses web image data to evaluate the image characteristics of different cities in terms of the distribution structure and uniqueness of the city as a whole. In non-patent document 18, Kita et al. predict car accident risk from Google Street View images of houses and propose a risk prediction model. In non-patent document 19, Salesses et al. analyzed street safety in 4 cities using Google Street View images. However, no study has yet analyzed urban landscapes using web image data.

In non-patent document 20, Matzen et al. created and labelled a large-scale clothing data set with 12 clothing attributes, discovered combinations of clothing appearance types by clustering, and compared clothing in the northern and southern hemispheres. In non-patent document 14, Miriam et al. analyzed the cultural landscape of photographs using an object detection method and an aesthetic computation tool, and quantified the similarity of photographs using a supervised classification method. Non-patent document 21 discloses a visual consistency method that mines consistent regions in artworks by cosine similarity.

The prior art contains no method that learns city landscapes from massive social media images and analyzes both the landscape similarity among cities and the fine-grained landscapes within cities.

Prior art documents

Non-patent documents

1. Milgram, S., A psychological map of New York City. American Scientist, 1972. 60(2): p. 194-200.

2. Twigger-Ross, C.L. and D.L. Uzzell, Place and identity processes. Journal of Environmental Psychology, 1996. 16(3): p. 205-220.

3. Paasi, A., Region and place: regional identity in question. Progress in Human Geography, 2003. 27(4): p. 475-485.

4. Doersch, C., et al., What makes Paris look like Paris? ACM Transactions on Graphics, 2012. 31(4).

5. Quercia, D., N.K. O'Hare, and H. Cramer, Aesthetic capital: what makes London look beautiful, quiet, and happy? in Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. 2014.

6. Sun, Y., et al., Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Transactions on Cybernetics, 2020. 50(9): p. 3840-3854.

7. Ma, B., et al., Autonomous deep learning: A genetic DCNN designer for image classification. Neurocomputing, 2020. 379: p. 152-161.

8. Zhou, B., A. Lapedriza, A. Khosla, A. Oliva, and A. Torralba, Places: A 10 Million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018. 40(6): p. 1452-1464.

9. Tang, P., H. Wang, and S. Kwong, G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition. Neurocomputing, 2017. 225: p. 188-197.

10. Zhang, F., et al., Social sensing from street-level imagery: A case study in learning spatio-temporal urban mobility patterns. ISPRS Journal of Photogrammetry and Remote Sensing, 2019. 153: p. 48-58.

11. Obeso, A.M., et al., Connoisseur: classification of styles of Mexican architectural heritage with deep learning and visual attention prediction. in Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. 2017.

12. Lee, S., et al., Linking past to present: Discovering style in two centuries of architecture. in IEEE International Conference on Computational Photography. 2015.

13. Zhang, F., B. Zhou, C. Ratti, and Y. Liu, Discovering place-informative scenes and objects using social media photos. Royal Society Open Science, 2019. 6(3): 181375.

14. Che, Z., Application of city image elements in urban tourism planning: a case study of Jinghong City, Xishuangbanna. Social Scientist, 2010(6): p. 99-101.

15. Ning, D., PMMS: A Photo based Metadata Mining System for Tourism Research. Tourism and Hospitality Prospects, 2017. 1(6): p. 34-47.

16. Zhou, B., et al., Recognizing city identity via attribute analysis of geo-tagged images. in European Conference on Computer Vision. 2014. Springer.

17. Yuehao, C., L. Ying, and Y. Peifeng, City Image Study Based On Online Pictures: 24 Cities Case. Planners, 2017(2): p. 10.

18. Kita, K. and Ł. Kidziński, Google street view image of a house predicts car accident risk of its resident. arXiv preprint arXiv:1904.05270, 2019.

19. Salesses, P., K. Schechtner, and C.A. Hidalgo, The collaborative image of the city: mapping the inequality of urban perception. PLoS ONE, 2013. 8(7): p. e68400.

20. Matzen, K., K. Bala, and N. Snavely, StreetStyle: Exploring world-wide clothing styles from millions of photos. arXiv preprint arXiv:1706.01869, 2017.

21. Shen, X., A.A. Efros, and M. Aubry, Discovering Visual Patterns in Art Collections with Spatially-consistent Feature Learning. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. p. 9278-9287.

Disclosure of Invention

The invention aims to learn the landscape of cities from massive image data in order to measure the visual differences between cities. These visual differences are conveyed mainly through image data. Images carry a "photographic culture" property, which is also a reason why urban landscapes appear similar; the invention therefore takes this property into account, which previous research has not done. To this end the invention introduces the notion of "landscape" (style). In a convolutional network, shallower feature maps record low-level information such as color and texture, while deeper feature maps carry higher-level information. To obtain more useful information, the invention uses statistics of deep feature maps. Specifically, the invention uses the convolutional neural network ResNet-50 and calculates the mean and variance of the feature maps of the network's layer4 to form the landscape vector.

To describe the landscape differences among cities quantitatively, the similarity and uniqueness of the cities' overall landscapes are measured by the city landscape distance calculated from the landscape vectors. In addition, because the landscape of a city varies across periods and areas, the invention also aims to analyze the landscape of a city in finer detail, and therefore provides a cluster analysis of the fine-grained landscapes that make up the city's overall character.

Based on the above purpose, the invention provides a city landscape analysis method based on social media image data, comprising the following steps:

acquiring social media image data and establishing a data set;

classifying the samples of the data set into indoor and outdoor images, discarding the indoor image samples, and de-noising the outdoor image samples to remove samples that do not represent the city landscape;

performing landscape learning on the processed data set using a convolutional neural network;

handling the sample imbalance problem with a focal loss function;

performing transfer learning with a pre-trained model to mitigate overfitting and long training time;

clustering the landscape features, using the minimum information criterion;

calculating the landscape distance between cities;

and analyzing the landscape similarity among cities according to the landscape distance, and analyzing the fine-grained landscapes of the cities according to the clustering result.

Furthermore, the indoor/outdoor classification of the samples of the data set uses an indoor/outdoor classification model ResNet18, and the noise removal applied to the outdoor image samples uses an unsupervised clustering method.

Further, the landscape learning proceeds as follows:

training a convolutional neural network that takes city image data as input and outputs the probability value p that the image belongs to each city; randomly drawing B images from the data set and inputting them into the convolutional neural network, where B is the batch size; obtaining $N_l$ feature maps from the $l$-th layer of the convolutional neural network, each of size $W \times H$; and flattening each feature map into a one-dimensional vector $S = (a_1, a_2, \ldots, a_{W \times H})^T$, which yields $N_l$ such vectors $S_1, S_2, \ldots, S_{N_l}$;

calculating the mean $\mathrm{Mean}_i$ and standard deviation $\mathrm{Std}_i$ of each $S_i$, and concatenating all $\mathrm{Mean}_i$ and $\mathrm{Std}_i$ into a landscape vector with $2N_l$ components;

and inputting the landscape vector as a feature into a fully connected layer for learning.
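As a worked form of the two statistics named above, and assuming that the standard deviation is the population form taken over all $W \times H$ positions of one feature map (the text does not say which form is used), they may be written for the $i$-th flattened feature map as follows. The superscript $(i)$ on the entries is notation introduced here for clarity.

```latex
\mathrm{Mean}_i = \frac{1}{WH}\sum_{k=1}^{WH} a^{(i)}_k,
\qquad
\mathrm{Std}_i = \sqrt{\frac{1}{WH}\sum_{k=1}^{WH}\left(a^{(i)}_k - \mathrm{Mean}_i\right)^2},
\qquad i = 1, \dots, N_l
```

With $N_l$ feature maps this gives $2N_l$ statistics per image.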

Further, the focal loss function is defined over the following quantities:

$\alpha_i$ is the weight of the $i$-th city, which is smaller the larger that city's sample size, $\alpha_i = \mathrm{Num}_{\min} / \mathrm{Num}_i$; $\mathrm{Num}_{\min}$ is the smallest per-city sample count, $\mathrm{Num}_i$ is the number of samples of the $i$-th city, $N$ is the total number of cities, and $\gamma$ is the focusing factor.
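For reference, one common class-weighted multi-class form of the focal loss that is consistent with these symbol definitions is sketched below. The one-hot label $y_i$, the predicted probability $p_i$ for city $i$, and the placement of $\alpha_i$ inside the sum are assumptions rather than details confirmed by this text.

```latex
L_{\mathrm{focal}} = -\sum_{i=1}^{N} \alpha_i \, y_i \, (1 - p_i)^{\gamma} \log p_i ,
\qquad
\alpha_i = \frac{\mathrm{Num}_{\min}}{\mathrm{Num}_i}
```

Under this form, setting $\gamma = 0$ and every $\alpha_i = 1$ recovers the ordinary cross-entropy loss.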

Further, transfer learning is performed with a pre-trained model, and the pre-trained model is an ImageNet pre-trained model.

Further, the fine-grained landscapes of the city are revealed by clustering the landscape features of the images.

Further, the clustering method comprises the following steps:

applying PCA dimensionality reduction to the city landscape features to eliminate redundancy in the data;

and taking the reduced vectors as input to a Gaussian mixture model with a specified number of components, so that each component yields one cluster.

Further, the landscape distance is calculated as follows:

measuring with the landscape vector after the fully connected operation, that is, using the landscape vector to generate a similarity matrix;

if city_i and city_j are similar in landscape, images of city_i are easily judged to belong to city_j, and the number of city_i samples judged to belong to city_j is recorded as S(i_j);

likewise, image samples of city_j are easily predicted to belong to city_i, and the number of city_j samples judged to belong to city_i is recorded as S(j_i), which gives a confusion matrix of the classification predictions;

after the confusion matrix is normalized, the similarity of city_i and city_j is $T(j,i) = C_{j,i} + C_{i,j}$, where $C_{j,i}$ and $C_{i,j}$ are the normalized values of S(j_i) and S(i_j), respectively.
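Written out, and assuming row-wise normalization of the confusion matrix (each count divided by the number of test samples of the true city, which is an assumption here, with S(i_i) denoting the correctly classified count), the similarity takes the form:

```latex
C_{i,j} = \frac{S(i\_j)}{\sum_{k=1}^{N} S(i\_k)},
\qquad
T(j,i) = C_{j,i} + C_{i,j} = T(i,j)
```

The sum is commutative, so the resulting similarity matrix is symmetric by construction.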

Compared with the prior art, the invention has the following advantages and beneficial effects:

1) a method for identifying city landscapes from massive social media images is proposed, and the city landscape is encoded into a set of vectors. The invention introduces the focal loss function, which resolves the unbalanced distribution of social media images and enables the network to learn the city landscape better;

2) a city landscape distance is defined from the city landscape vectors; this distance can be used to analyze how, and in which respects, different cities resemble one another, as well as the relationship and heterogeneity between landscape distance and geographic distance;

3) to understand the landscape of a single city in depth, a clustering method that uses the city landscape vector as the embedding vector is applied, which reveals the fine-grained landscapes of the city in greater detail.

Drawings

FIG. 1 is an overall framework diagram of the city landscape analysis method of the present invention;

FIG. 2 is a flow chart of the city landscape analysis method of the present invention;

FIG. 3 shows the total number of samples for each city;

FIG. 4 is the normalized confusion matrix of the classification results;

FIG. 5 shows samples of the unique visual landscape of each city;

FIG. 6 is the normalized landscape similarity matrix;

FIG. 7 shows samples from the 5 city pairs with the greatest landscape distance;

FIG. 8 shows the relationship between city visual similarity and the geographic location of the cities;

FIG. 9 shows samples of the fine-grained classification results for Beijing.

Detailed Description

The invention is further described with reference to the accompanying drawings, but the invention is not limited in any way, and any alterations or substitutions based on the teaching of the invention are within the scope of the invention.

City images are obtained mainly from two sources: (1) social media platforms (such as Facebook, Weibo, Flickr and Twitter); (2) map service platforms (such as Google and Baidu). Image data acquired from social media platforms are called web image data. They are shot and uploaded by users, with non-uniform shooting angles and varied content, and reflect people's overall perception of a city. Data acquired from map service platforms are generally called "street view image data"; their shooting angles are uniform, the images are sampled evenly, and the content captured is generally determined by the research objective.

The invention uses social media image data for city landscape analysis. Street view images and social media images differ in some respects and resemble each other in others, and Table 1 compares the two. Both web images and street view images can record every corner of a city, but social network image data are biased towards the distinctive areas of a city, which is an advantage when studying city landscapes and makes it easier to discover scenes of historical and cultural interest. Web photos are users' perception and record of a city: users like to record distinctive places and places that interest them, so the photo data acquired by the invention are biased in content, but precisely because of this bias the distinctive places of a city are easier to find and the city landscape can be analyzed. Given its research purpose and the nature of the data, the invention chooses to use social network image data.

TABLE 1 comparison of network image data with street view image data

The invention uses 534,767 social media images of 10 cities, learns the city landscapes with a convolutional neural network, and uses the learned landscape to measure the similarity between cities and to define the landscape distance. The invention does not set specific criteria for recognizing city landscapes, but discovers the landscape types of a city in an unsupervised manner.

The overall framework of the city landscape analysis method of the invention is shown in FIG. 1. To learn the city landscape and use it in the subsequent analyses, a convolutional neural network is used to learn automatically the rich hierarchy of internal features in the given training data. The invention computes the mean and variance of the feature maps of the network's layer4, concatenates them into a vector, and inputs this vector as the feature into a fully connected layer.

Example 1

The invention provides a city landscape analysis method based on social media image data and uses it to discover the fine-grained landscapes of a single city. As shown in FIG. 2, the method comprises the following steps:

S10: Acquiring social media image data and establishing a data set

The research data of the invention come from the YFCC-100M (Yahoo Flickr Creative Commons 100 Million) data set, which contains the video and photo information uploaded by users from 2004 to 2014. Each record has 23 fields, such as download link, upload time, shooting location, user tags, machine tags, and longitude and latitude. The data set contains 100 million records in total, of which 0.8 million are videos. Using this information, the invention acquires the images and user tag information of 10 cities on 4 continents (Asia, Europe, North America and Oceania). The cities are Beijing, Shanghai, Hong Kong, Tokyo, Toronto, New York, Montreal, Paris, London and Sydney, and 4,387,980 image records are obtained in total.

S20: Classifying the data set samples into indoor and outdoor images and removing noise samples

The acquired data contain a large number of noise samples, as has been verified. To prevent these noise samples from affecting the experimental results, they must be removed. The noise removal process of the invention is divided into two steps.

S201: Automatic indoor/outdoor classification and elimination of indoor images

Because of the nature of web image data, the acquired images are highly varied and include people, indoor tables and chairs, food, animals, buildings and so on. The research object of the invention requires city images showing characteristic scenes, and such scenes are first of all outdoor scenes, so the invention first eliminates indoor images. The invention refers to images that do not represent the city landscape, such as interiors, people, flowers and plants, and food, as "noise samples".

To remove the irrelevant indoor samples, the invention processes each of the 10 cities with an indoor/outdoor classification model ResNet18 trained on the Places365 data set. Each city's images are classified as indoor or outdoor, only the images classified as outdoor are retained, and 750,850 images remain in total.

S202: noise processing

In the step S201, there is a certain error in the two classification models trained, and there may also exist a certain proportion of noise samples in the outdoor images obtained by classification, and these noise samples do not have urban landscape representativeness, such as flowers, plants, animals, airplanes, sky, etc. in the experimental process of the present invention, it is found that these remaining noise samples have a certain influence on the experimental result of the present invention, so that these noise samples need to be further removed. In the prior art, the HOG characteristics of a noise sample are extracted, then an SVM classifier trained in advance is used for classifying the noise sample, in order to improve the classification precision, one-to-many SVM classifiers are trained for various types of noise respectively, if the noise is judged, the noise is removed, and if the noise is not judged, the noise is retained. This method requires candidate noise samples, but in the present invention, it is not known how much kind of noise is. The characteristic difference of the noise sample and the non-noise sample is large, so the noise is eliminated in an unsupervised clustering mode, and experiments prove that the noise can be well classified.

The method comprises the following steps:

s2021: training a network (ResNet50) with the city name as the category as a feature extractor, wherein 20% of samples are randomly selected from each city as training samples, and the training method is the prior art and is not repeated.

S2022: the invention uses the characteristics of the pooling layer as the input characteristics of the clustering, the clustering number of each city is set to be 30 (according to the experimental condition), and finally, the clustering result is checked to find that most noise and non-noise samples are well distinguished and clustered into respective categories, thereby 534,767 images are obtained, and the specific number distribution is shown in table 2.
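The final filtering step of S2022 can be sketched as follows. This is only an illustration under stated assumptions: the clustering itself is taken as already done, the image names are hypothetical, and the set of noise clusters is assumed to come from manual inspection of the 30 clusters, since the text does not say how clusters are flagged.

```python
from collections import defaultdict


def drop_noise_clusters(image_ids, cluster_labels, noise_clusters):
    """Keep images whose cluster was not flagged as noise.

    image_ids      : list of image identifiers (hypothetical names)
    cluster_labels : cluster index assigned to each image, same order
    noise_clusters : set of cluster indices judged to be noise on inspection
    """
    if len(image_ids) != len(cluster_labels):
        raise ValueError("image_ids and cluster_labels must have equal length")
    kept = []
    by_cluster = defaultdict(list)   # retained images grouped by cluster
    for img, c in zip(image_ids, cluster_labels):
        if c in noise_clusters:
            continue
        kept.append(img)
        by_cluster[c].append(img)
    return kept, dict(by_cluster)


ids = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
labels = [0, 7, 7, 29]               # cluster indices in the range 0..29
kept, groups = drop_noise_clusters(ids, labels, noise_clusters={7})
print(kept)
```

Grouping the retained images by cluster keeps the result easy to inspect city by city.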

TABLE 2 List of the number of each city image

S30: geomorphologic learning using convolutional neural networks on processed data sets

The key point of city feature learning is the extraction of city image features. In the invention, a convolutional neural network ResNet-50 which takes city image data as input and the probability value p of dividing the image into various cities as output is trained, and the trained network is taken as a feature extractor of the city image.

The process of the geomorphic learning is as follows:

b images are randomly extracted from the data set and input into the network (where B is the batch size), and N is obtained through the l-th layer of the network (the present invention sets l to 4)_lZhang feature map (in the invention N)_l2048), each feature map is of size W × H, each feature map is flattened into a one-dimensional vector S (a)₁，a₂，......，a_W*H)^TThereby obtaining N_lVector of dimensions

And separately calculate S_iMean of_iAnd standard deviation Std_iThe vector composed of all the mean and variance contacts is called a feature vector:
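The per-map statistics described above can be sketched in plain Python as follows. This is a minimal illustration, not the network code: the feature maps are given as nested lists, the population standard deviation is assumed, and placing all means before all standard deviations is also an assumption about the concatenation order.

```python
from statistics import fmean, pstdev


def landscape_vector(feature_maps):
    """Concatenate per-map means and standard deviations.

    feature_maps: list of N_l maps, each a list of H rows of W floats.
    Returns a list of length 2 * N_l: all means first, then all stds
    (this ordering is an assumption).
    """
    means, stds = [], []
    for fmap in feature_maps:
        flat = [a for row in fmap for a in row]   # flatten the W x H map to S_i
        means.append(fmean(flat))
        stds.append(pstdev(flat))                 # population std (assumption)
    return means + stds


maps = [
    [[1.0, 1.0], [1.0, 1.0]],    # constant map: mean 1, std 0
    [[0.0, 2.0], [0.0, 2.0]],    # mean 1, std 1
]
vec = landscape_vector(maps)
print(vec)
```

For the 2048 feature maps of layer4, the same computation would give a 4096-component vector per image.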

S40: Handling the sample imbalance problem with the focal loss function

As shown in FIG. 3, the experimental data set of the invention is imbalanced. The invention handles this with the focal loss function, which was designed for extreme imbalance between foreground and background samples and for hard-to-classify samples.

The focal loss function is defined over the following quantities:

$\alpha_i$ is the weight of the $i$-th city, which is smaller the larger that city's sample size, $\alpha_i = \mathrm{Num}_{\min} / \mathrm{Num}_i$; $\mathrm{Num}_{\min}$ is the smallest per-city sample count, $\mathrm{Num}_i$ is the number of samples of the $i$-th city, $N$ is the total number of cities, and $\gamma$ is the focusing factor. A number of experiments show that a suitably chosen $\alpha$ resolves the sample imbalance.
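A minimal Python sketch of the city weights and of a focal loss for a single image is given below. It assumes the common form in which the loss is evaluated at the true city t as -alpha_t (1 - p_t)^gamma log(p_t); the value of gamma and the sample counts are illustrative assumptions, not values from the text.

```python
import math


def class_weights(sample_counts):
    """alpha_i = Num_min / Num_i, as defined in the text."""
    num_min = min(sample_counts)
    return [num_min / n for n in sample_counts]


def focal_loss(probs, target, alphas, gamma=2.0):
    """Focal loss for one image (assumed form, see the note above).

    probs  : predicted city probabilities (softmax output), summing to 1
    target : index of the true city
    alphas : per-city weights from class_weights()
    gamma  : focusing factor (the value 2.0 is an assumption)
    """
    p_t = max(probs[target], 1e-12)            # guard against log(0)
    return -alphas[target] * (1.0 - p_t) ** gamma * math.log(p_t)


counts = [1000, 4000, 2000]                    # hypothetical per-city counts
alphas = class_weights(counts)
easy = focal_loss([0.9, 0.05, 0.05], 0, alphas)   # confident, correct
hard = focal_loss([0.2, 0.4, 0.4], 0, alphas)     # uncertain prediction
print(alphas, easy < hard)
```

The city with the fewest samples receives weight 1 and larger cities receive proportionally smaller weights, while the (1 - p_t)^gamma factor shrinks the contribution of images that are already classified confidently.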

S50: transfer learning using pre-trained models

The large amount of image data enables the model to have enough training samples in training, the convolutional neural network can learn enough features, and the problems of overfitting and overlong training time exist at the same time. To address these issues, the present invention introduces transfer learning. The transfer learning helps to solve the existing problems by utilizing the existing knowledge, and can improve the robustness of the model besides solving the two problems.

The ImageNet pre-training model contains the characteristics of 1000 types of image data and is a better choice for the transfer learning of the invention. The method fixes the training weight of the ImageNet pre-training model base layer and finely adjusts the weight of the full-connection layer.

S60: clustering features of a landscape

The fine-grained landscape of the city is further analyzed using the landscape vector. Many researches on the types of features generally specify the types of features, use the features as classification labels, and then find images belonging to the types by a machine learning method. However, the method does not set the feature type attribute for the image data, so the method considers the use of a clustering mode to find the feature type of the city.

By clustering the Style features of the images, the fine-grained Style of the city can be discovered and revealed. In order to eliminate the redundancy of data, the invention carries out PCA dimension reduction processing on the feature of the city, and reserves 90% of variance. Then, the present invention takes the vector after dimensionality reduction as an input of a GMM (gaussian mixture model), and gives component numbers to the components, so that components clustering results can be obtained.
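The 90% variance rule can be illustrated with a short sketch that picks the number of principal components from a list of eigenvalues. The eigenvalue spectrum below is made up for illustration, and the function name is hypothetical; in scikit-learn, passing a float between 0 and 1 as `n_components` to `PCA` applies the same kind of rule.

```python
def components_for_variance(eigenvalues, keep=0.90):
    """Smallest number of principal components whose cumulative
    explained variance reaches `keep` (0.90 in the text).

    eigenvalues: variances along the principal axes (eigenvalues of the
    covariance matrix), in any order.
    """
    if not 0.0 < keep <= 1.0:
        raise ValueError("keep must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= keep:
            return k
    return len(ordered)


# Hypothetical spectrum: 50%, 30%, 15% and 5% of the variance.
n_keep = components_for_variance([5.0, 3.0, 1.5, 0.5])
print(n_keep)
```

Here the first two components explain only 80% of the variance, so a third is needed to reach the 90% threshold.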

To avoid overfitting caused by an arbitrarily chosen number of clusters, the invention determines the number of components with a criterion provided by the scikit-learn library: minimization of the Akaike information criterion (AIC).
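How an AIC-based choice of the component count works can be sketched without any library, using the definition AIC = 2k - 2 ln(L), where k is the number of free parameters and L the maximized likelihood. The log-likelihood values below are made up, and full covariance matrices are an assumption; in practice the fitted mixture model would supply these quantities.

```python
def gmm_num_params(k, d):
    """Free parameters of a k-component, d-dimensional GMM with full
    covariance matrices (the covariance type is an assumption)."""
    means = k * d
    covariances = k * d * (d + 1) // 2
    weights = k - 1                      # mixing weights sum to 1
    return means + covariances + weights


def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2.0 * log_likelihood


def select_components(loglik_by_k, d):
    """Return the component count with the lowest AIC, plus all scores.

    loglik_by_k: dict mapping each candidate k to the total log-likelihood
    of the fitted model on the data (the values used below are made up).
    """
    scores = {k: aic(ll, gmm_num_params(k, d)) for k, ll in loglik_by_k.items()}
    return min(scores, key=scores.get), scores


best_k, scores = select_components({2: -1300.0, 3: -1200.0, 4: -1195.0}, d=2)
print(best_k)
```

In this example the fourth component improves the fit only slightly, so its extra parameters outweigh the gain and the three-component model has the lowest AIC.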

S70: calculating the geomorphic distance between cities

In order to quantitatively describe the similarity degree of the appearances among cities, the invention indirectly calculates the appearance distance among the cities by using the appearance vector. In general, the objects of the distance measure should be equal or similar and have the same dimensions. The object of the city similarity measurement is image data of a city, and the content contained in the image should be the same thing, such as a house or a bridge. However, the subject of the present invention is the urban landscape, which is not just a single thing. In the prior art, the method can accommodate various measurement objects by taking a model evaluation result (confusion matrix) as a measurement standard. Different from the prior art, the invention measures the Style vector after carrying out full-connection operation, namely, the Style vector is used for generating the similarity matrix.

The method has the following idea: if the city _ i is similar in appearance to the city _ j, the image of the city _ i is easily discriminated as belonging to the city _ j, and the number of samples of the city _ i discriminated as the city _ j is recorded as S (i _ j). Similarly, the image sample of the city _ j can be easily predicted to belong to the city _ i, the number of the city _ j samples which are judged to belong to the city _ i is recorded as S (j _ i), finally, a confusion matrix for classification prediction can be obtained, and after normalization processing is carried out on the confusion matrix, the similarity of the city _ i and the city _ j can be obtained as T (j, i) or C_j，i+C_i，j. Wherein C is_j，i，C_i，jThe normalized values of S (j _ i) and S (i _ j) are respectively.
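The computation of the similarity from a confusion matrix can be sketched as follows. The counts are hypothetical, and row-wise normalization (each row divided by the number of test samples of the true city) is an assumption, since the text does not state how the matrix is normalized.

```python
def similarity_matrix(confusion):
    """T[i][j] = C[i][j] + C[j][i], with C the row-normalized confusion
    matrix (row = true city, column = predicted city; row normalization
    is an assumption)."""
    n = len(confusion)
    norm = []
    for row in confusion:
        total = sum(row)
        if total == 0:
            raise ValueError("every city needs at least one test sample")
        norm.append([v / total for v in row])
    return [[norm[i][j] + norm[j][i] for j in range(n)] for i in range(n)]


# Hypothetical counts for three cities.
S = [
    [80, 15, 5],
    [10, 85, 5],
    [20, 20, 160],
]
T = similarity_matrix(S)
print(round(T[0][1], 3), round(T[0][2], 3))
```

Only the off-diagonal entries of T are meaningful as pairwise similarities; the diagonal simply doubles each city's own accuracy.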

S80: urban geomorphology analysis

The image classification process of the present invention is as follows. The data are divided into a training set, a validation set and a test set in a ratio of 6:2:2, wherein the validation set is mainly used to tune parameters during model training and to determine when training should stop. With the batch size set to 1024, forward and backward propagation of the CNN is applied to compute the parameter gradients of the focal loss function. Because the original images differ in size, the images are scaled to a size of 224 before being input into the network for training. The invention updates the network parameters by stochastic gradient descent with momentum equal to 0.9, learning rate lr equal to 0.001 and weight decay weight_decay equal to 10e-4. The invention performs 800 iterations and adjusts the learning rate with a cosine annealing learning rate decay strategy. Finally, an average accuracy of 49.9% is achieved. The invention makes predictions on the test set and presents the prediction results as the confusion matrix shown in fig. 4.
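The cosine annealing decay mentioned above can be sketched as follows. The sketch assumes a single cosine cycle over the 800 iterations and a minimum learning rate of 0, neither of which is stated in the text; the function name and its parameters are hypothetical.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr=0.001, min_lr=0.0):
    """Learning rate at `step` for one cosine decay from base_lr down to min_lr."""
    # Clamp the step so values outside [0, total_steps] stay on the curve's ends.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate for every iteration from 0 to 800 inclusive.
schedule = [cosine_annealing_lr(t, 800) for t in range(801)]
```

The schedule starts at the base learning rate, passes through half of it at the midpoint, and approaches the minimum at the final iteration.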

S801: city geomorphic similarity analysis

The values on the diagonal of fig. 4 are the proportions of each city's image samples that are correctly predicted; the larger the value, the more easily the city is distinguished, indicating a unique visual appearance. The off-diagonal values are the proportions of samples misclassified as other cities, indicating the visual similarity between two cities. For example, 64% of the Beijing test samples are predicted correctly (first row, first column), and the proportions of Beijing samples predicted as other cities (off-diagonal entries of the first row) are: 0.07 (Hong Kong), 0.06 (London), 0.01 (Montreal), 0.05 (New York), 0.04 (Paris), 0.03 (Shanghai), 0.02 (Sydney), 0.06 (Tokyo), 0.02 (Toronto).

From the confusion matrix, Beijing (0.64), Sydney (0.52), Hong Kong (0.48), Paris (0.48) and Shanghai (0.48) are the 5 cities with the highest proportion of correct predictions, which shows that these cities are more visually distinctive and less similar to other cities. London (0.40), Montreal (0.38), New York (0.42), Tokyo (0.40) and Toronto (0.42) are the 5 cities most prone to misprediction; they are more easily confused with other cities and are visually less distinctive.

To understand the visual appearance of each city more clearly, fig. 5 shows, for each of the 10 cities, samples that were predicted correctly with high prediction probability. As can be seen from fig. 5, historical relics, landmarks and unique urban landscapes are the scenes that characterize these cities. Historic architectural landmarks such as the Imperial Palace and the Temple of Beijing are the visually unique scenes of Beijing; the scenes rich in city information in Hong Kong are unique scenes such as the night view of Victoria Harbour; landmarks such as Tower Bridge and Big Ben distinguish London from other cities; landmarks such as the holy cathedral of Montreal make Montreal visually more unique. In addition, landmarks such as the Brooklyn Bridge and the Empire State Building in New York, the Eiffel Tower and the Arc de Triomphe in Paris, the Oriental Pearl Tower in Shanghai together with famous buildings such as the Yangtze coast, the Sydney Opera House and the Sydney Harbour Bridge in Sydney, Tokyo Tower and Tokyo Skytree in Tokyo together with historical buildings such as Sensō-ji, and the Toronto tower in Toronto are the elements that give each city its visual uniqueness; they are unique scenes that are not visually similar to other cities.

To describe the visual similarity between cities quantitatively, the visual feature distance between every pair of cities is calculated, yielding the visual feature similarity matrix shown in FIG. 6.

From the normalized appearance similarity matrix, it can be seen that London and Paris are the most similar pair (0.31), followed by New York-London (0.23), Toronto-New York (0.22), Tokyo-Hong Kong (0.20) and Tokyo-New York (0.20). Fig. 7 shows samples from these 5 city pairs with the largest similarity values.

Fig. 7 shows the scenes that are easily misclassified between cities with high visual similarity. The visually similar scenes are as follows:

(1) London-Paris (0.31): the buildings of the two cities are similar in style; they are regular in form, often domed, and similar in color.

(2) New York-London (0.23): Gothic buildings and window styles are the elements of New York and London that are prone to misclassification, making the two cities visually similar.

(3) Toronto-New York (0.22): the figure shows that the modern high-rise buildings of Toronto and New York are visually similar, and that the overall colors are similar.

(4) Tokyo-Hong Kong (0.20): images of dense population and dense housing appear in both Tokyo and Hong Kong, and high-rise buildings are easily misclassified. Both cities are characterized by dense population and dense housing, which makes them visually similar, and high-rise buildings are one of the visually similar elements.

(5) Tokyo-New York (0.20): the modern buildings of both cities have glass curtain wall exteriors; some designs of Tokyo Tower and of New York buildings are similar in color and look alike because of the shooting angle, and the commercial streets of New York are similar in color tone to the ordinary buildings of Tokyo.

Thus, it can be seen that city similarity can be reflected in terms of color, shooting perspective, and overall appearance.

S802: correlation analysis of geomorphic distance and geographic distance

To further explore the relationship between city feature similarity and geographic position, the invention displays the feature distance at the cities' spatial positions. FIG. 8 shows the feature similarity of the cities together with their geographic locations. In the figure, gray lines connect pairs of cities with higher similarity scores, and the number above each line is the sum of the misclassification rates between the two cities, namely the value in the visual similarity matrix. The circles mark the geographic locations of the cities; the size and shade of each circle are positively correlated with the proportion of the city's images that are correctly classified, namely the diagonal value in the confusion matrix. A larger, darker circle indicates a city that is less visually similar to other cities and more visually unique.

As can be seen from fig. 8, geographically close cities usually show high visual similarity. Such cities have usually had deep historical exchanges and share similar culture, climate and geographic environment, and a city's visual characteristics are influenced to a large extent by culture, history, climate and geographic environment; it is therefore reasonable that geographically closer cities are more visually similar.
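One possible way to quantify the relationship that fig. 8 presents visually is sketched below. The text does not state how the correlation was measured, so this is an assumption: the great-circle (haversine) distance between city coordinates is paired with the similarity values and a Pearson correlation coefficient is computed. The function names and the mean Earth radius of 6371 km are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an illustrative constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

With one geographic distance and one similarity value per city pair, a negative coefficient from `pearson(distances, similarities)` would be consistent with the observation that closer cities tend to look more alike.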

S803: fine granularity geomorphology analysis of city

The types of appearance formed by each city differ because of differences in region, cultural history and the like. The invention analyzes the appearance of a single city at a finer level of detail and obtains a fine-grained classification result for each city by unsupervised clustering.

Because there are many cities and describing all of them would be overly lengthy, and because the similarity analysis in step S801 shows that Beijing is the most distinctive of the cities, the invention analyzes only Beijing. The clustering result is obtained using the method of step S60 and, as shown in fig. 9, representative samples near the cluster centers are selected. The characteristics of Beijing can be clearly seen from fig. 9; for further explanation, the invention roughly groups the results into 5 categories: ancient architecture of Beijing (A), target objects (B), modern landmark architecture (C), some unique landscapes (D), and Beijing at night (E).

The ancient architecture of Beijing has a very distinctive design: it is generally bilaterally symmetric with a slightly higher middle part, mainly to embody the supreme authority of the ancient Chinese emperors, and the walls are generally brick red. The roofs and fronts of ancient buildings are also often accompanied by objects such as auspicious dragon and lion figures, and incense is sometimes placed there as well (the first three rows in fig. 9(B)). Among modern buildings, the CCTV headquarters (nicknamed the "big underpants") and some high-rise buildings attract attention because of their design and function, and thus form one of the characteristics of Beijing (the first 5 rows of fig. 9(C)). Scenes such as the Great Wall, the communication bridge and the Seventeen-Arch Bridge are striking sights of Beijing. Historical culture and beautiful scenery do not necessarily leave the deepest impression; the closer a scene is to everyday life, the deeper the impression. The last two rows of fig. 9(C) show the hutongs of old Beijing, which convey the attachment of older residents to old Beijing and also show the life of old Beijing. The formation of Beijing's appearance is closely related to the historical culture and development of China.

The invention provides a method for learning city features from massive social media images and, based on this method, analyzes 10 cities around the world from two aspects:

(1) the city appearance distance is defined using the city appearance vectors, and the distance is used to analyze how similar different cities are in appearance; and

(2) to understand the features of a single city in depth, the invention performs cluster analysis with the city feature vector as the embedding vector, thereby discovering the fine-grained features of the city.

The invention has the following beneficial effects:

1) a method for identifying city features from massive social media images is provided; the city features are encoded into a set of vectors, and the focal loss is introduced to address the unbalanced distribution of social media images, so that the network can better learn the city features;

2) city feature distances are defined using city feature vectors, and the distances are used to analyze how similar different cities are in appearance; it is also found that the feature distance and the geographic distance show a certain correlation and heterogeneity.
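The focal loss named in effect 1) can be sketched as follows, using the class weights α_i = Num_min/Num_i given in claim 3. This is a hedged illustration: the single-sample form -α_t (1 - p_t)^γ log(p_t) is the commonly used focal loss and is assumed here, the function names are hypothetical, and the default γ = 2 is an illustrative value not taken from the source.

```python
import math


def class_weights(sample_counts):
    """alpha_i = Num_min / Num_i: cities with more samples receive smaller weights."""
    smallest = min(sample_counts)
    return [smallest / count for count in sample_counts]


def focal_loss(probs, target, alphas, gamma=2.0):
    """Focal loss of one sample, assuming the common form -alpha_t (1 - p_t)^gamma log(p_t).

    probs  : predicted probability for each city (sums to 1)
    target : index of the true city
    alphas : per-city weights, for example from class_weights
    gamma  : focusing factor; gamma = 0 gives weighted cross-entropy
    """
    p_t = probs[target]
    return -alphas[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Three cities with 100, 50 and 200 images: the rarest city gets weight 1.0.
weights = class_weights([100, 50, 200])
loss = focal_loss([0.7, 0.2, 0.1], target=0, alphas=weights)
```

The factor (1 - p_t)^γ shrinks the loss of samples the network already classifies confidently, while α_i keeps cities with many images from dominating training.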

The above embodiment is an implementation manner of the method for city feature analysis based on social media image data, but the implementation manner of the invention is not limited by the above embodiment, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the invention should be regarded as equivalent replacement manners, and are included in the scope of the invention.

Claims

1. The city feature analysis method based on the social media image data is characterized by comprising the following steps of:

acquiring social media image data and establishing a data set;

processing the sample imbalance problem by adopting a focal loss function;

clustering the feature by using a minimum information quantity criterion;

calculating the geomorphic distance between cities;

analyzing the similarity of the features among the cities according to the feature distance, and analyzing the fine-grained features of the cities according to the clustering result;

the method for learning the physiognomy comprises the following steps: training a convolutional neural network which takes city image data as input and takes the probability value p of classifying the image into each city as output; randomly extracting B images from the data set and inputting the B images into the convolutional neural network, wherein B is the batch size; obtaining $N_l$ feature maps through the l-th layer of the convolutional neural network, wherein the size of each feature map is W × H; and flattening each feature map into a one-dimensional vector $S = (a_1, a_2, \dots, a_{W \times H})^T$, wherein $a_i$ is the i-th feature value on the feature map, thereby obtaining an $N_l$-dimensional vector

using the said feature vector as … and inputting the features into the fully connected layer for learning.

2. The method for urban landscaping analysis based on social media image data as claimed in claim 1, wherein the method for indoor and outdoor image classification of the samples of the data set employs the indoor and outdoor classification model ResNet18, and the method for denoising outdoor image samples is an unsupervised clustering method.

3. The method for urban landscaping analysis based on social media image data of claim 1, wherein the focal loss function is as follows:

wherein $\alpha_i$ is the weight of the i-th city, a city with a larger sample size having a smaller weight, $\alpha_i = Num_{min}/Num_i$, $Num_{min}$ is the minimum number of samples, $Num_i$ is the number of samples in the i-th city, N is the total number of cities, and γ is the focal factor.

4. The social media image data-based urban geomorphology analysis method of claim 1, wherein a pre-training model is established using transfer learning, the pre-training model employing an ImageNet pre-training model.

5. The method of claim 1, wherein fine-grained features of a city are revealed by clustering feature vectors of images.

6. The method for analyzing urban landscapes based on social media image data according to any one of claims 1 or 5, wherein the clustering method comprises the following steps:

and taking the vector subjected to dimensionality reduction as the input of a Gaussian mixture model, and giving component numbers to obtain clustering results, wherein the quantity of the clustering results is the component number.

7. The method for urban landscape analysis based on social media image data according to claim 1, wherein the method for calculating the landscape distance is as follows:

after normalization processing is carried out on the confusion matrix, the similarity of city_i and city_j is $T(j, i) = C_{j,i} + C_{i,j}$, wherein $C_{j,i}$ and $C_{i,j}$ are the normalized values of S(j_i) and S(i_j), respectively.