WO2021248335A1

WO2021248335A1 - Method and system for measuring urban poverty spaces based on street view images and machine learning

Info

Publication number: WO2021248335A1
Application number: PCT/CN2020/095204
Authority: WO
Inventors: 袁媛; 刘颖; 牛通
Original assignee: 中山大学
Priority date: 2020-06-09
Filing date: 2020-06-09
Publication date: 2021-12-16
Also published as: CN111937016B; CN111937016A

Abstract

A method and a system for measuring urban poverty spaces based on street view images and machine learning, comprising the following steps: on the basis of census data, constructing an index of multiple deprivation IMD; acquiring street view image data of a target area in a map information database; by means of image segmentation technology, segmenting the street view image data of the target area into several blocks of street view image data; on the basis of the several blocks of street view image data, incorporating a principal component analysis method to obtain a principal factor, and defining the principal factor as a street view factor; using the index of multiple deprivation IMD and the street view factor as input variables of a machine learning algorithm to obtain an urban poverty score; and, on the basis of the urban poverty score, evaluating the degree of urban poverty. Also disclosed is a system based on the present method for measuring urban internal poverty spaces based on street view images and machine learning. The present method and system promote the refinement of urban poverty research and enrich the dimensions of urban poverty measurement indices.

Description

Method and system for measuring urban internal poverty space based on street view pictures and machine learning

Technical field

The present invention relates to the field of artificial intelligence machine learning, and more specifically, to a method and system for measuring urban internal poverty space based on street view pictures and machine learning.

Background technique

Since the 1960s and 1970s, although the traditional urban poverty measurement represented by Multiple Deprivation has matured, the indicator system based on socioeconomic statistics often has a long update cycle, low availability, and data. Single source and other shortcomings. With the advent of the information age, Western academic circles have begun to use remote sensing images, night lights, bus card swiping, online rent, map points of interest and other big data to identify poverty spaces. However, the existing domestic research only uses satellite image data more frequently in rural poverty, using remote sensing images, night lights and other single types to analyze large areas or urban and rural areas. However, new data and technologies are used less in urban poverty measurement, so it is necessary to find It is applicable to the data of urban areas to expand the range of indicators of urban poverty, refine the measurement scale, and dig deeper into the spatial phenomenon of poverty.

The patent (2019102766003) discloses a method for obtaining remote sensing data of a target city through remote sensing satellites and combining POI data for poverty assessment. The above method does not use the existing urban street view images for combined evaluation, and the evaluation index dimension is less.

Summary of the invention

In order to overcome the above-mentioned shortcomings of the prior art and methods, the present invention proposes a method and system for measuring urban poverty space based on street view pictures and machine learning. The invention effectively compensates for the shortcomings of existing research, not only promotes the refinement of urban poverty research, but also enriches the dimensions of urban poverty measurement indicators. It has practical significance for improving poor communities and promoting renewal planning. It is accurate and reliable in measuring urban poverty. , Practical methods.

In order to solve the above technical problems, the technical scheme of the present invention is as follows:

A method for measuring urban poverty space based on street view pictures and machine learning, including the following steps:

Construct multiple deprivation index IMD based on census data;

Obtain the street view image data of the target area in the map information database (such as Baidu map, Gaode map, Google map, etc.);

Through image segmentation technology, the street view image data of the target area is divided into several pieces of street view image data;

Based on several pieces of street view image data, combined with the principal component analysis method, the principal factor is obtained, and the principal factor is defined as the street view factor;

The multiple deprivation index IMD and street view factor are used as input variables of the machine learning algorithm to obtain the urban poverty score;

Evaluate the poverty level of the city based on the urban poverty score.

The invention collects street view image data from a map information database, uses picture segmentation technology to fully excavate element information in the street view image data, and combines mathematical models and computer algorithms to construct a machine learning model for measuring urban poverty. The present invention effectively compensates for the shortcomings of the existing measurement, not only promotes the refinement of urban poverty research, but also enriches the dimensions of urban poverty measurement indicators. It has practical significance for improving poor communities and promoting renewal planning. It is accurate and reliable in measuring urban poverty. , Practical methods.

In a preferred solution, the "construction of multiple deprivation index IMD based on census data" includes the following sub-contents:

According to the census data, P dimensions of data are obtained, and each dimension of data corresponds to a proportional weight λ;

The multiple deprivation index IMD is expressed by the following formula:

In the formula, the E _j represents the value of the j-th latitude data.

In a preferred solution, the P=4, the four dimensions of data are income field data, education field data, employment field data, and housing field data, and the value of the income field data is E _1. The proportion of the income field data is 0.303; the value of the education field data is E ₂ , and the proportion of the education field data is 0.212; the value of the employment field data is E ₃ , and the proportion of the employment field data is 0.182; The value of the housing field data is E ₄ , and the proportion of the housing field data is 0.303; the multiple deprivation index IMD is expressed by the following formula:

IMD=E ₁ *0.303+E ₂ *0.212+E ₃ *0.182+E ₄ *0.303.

In a preferred solution, the E ₁ is expressed by the following formula:

E ₁ = the proportion of industrial workers j ₁₁ + the proportion of low-end service industries j ₁₂ + the proportion of divorces and widows j ₁₃

The said industrial worker ratio j ₁₁ is expressed by the following formula:

The proportion of industrial workers j ₁₁ = (the number of people in the mining industry + the number of people in the manufacturing industry)/total number of employees

Proportion of low-end service industry j ₁₂₁ = (population of electricity, gas and water production and supply industry + population of wholesale and retail industry + population of accommodation and catering industry + population of real estate industry) / total number of employees

The ratio of divorce and widowhood j ₁₃ is expressed by the following formula:

Divorce and widowhood ratio j ₁₃ = the number of divorced and widowed population/the sum of the unmarried population aged 15 and above and the population with a spouse.

In a preferred solution, the E ₂ is expressed by the following formula:

E ₂ = low education level j ₂₁ + the proportion of school leavers without a diploma j ₂₂

The said low education level j ₂₁ is expressed by the following formula:

Low level of education j ₂₁ = population without going to school, elementary school, and junior high school/total population

The proportion of school leavers without a diploma j ₂₂ is expressed by the following formula:

The proportion of leaving school without a diploma j ₂₂ = the population without a diploma/total population.

In a preferred solution, the E ₃ is expressed by the following formula:

E ₃ = Unemployment ratio j ₃₁ = Number of unemployed population/total population.

In a preferred solution, the E ₄ is expressed by the following formula:

E ₄ = Proportion of population living per square meter j ₄₁ + Proportion without clean energy j ₄₂ + Proportion without running water j ₄₃ + Proportion without kitchen j ₄₄ + Proportion without toilet j ₄₅ + Proportion without hot water j ₄₆

The said population ratio j ₄₁ per square meter is expressed by the following formula:

The proportion of the population living per square meter j ₄₁ = 1/per capita housing area (square meters/person)

The said non-clean energy ratio j ₄₂ is expressed by the following formula:

Proportion without clean energy j ₄₂ = number of households using coal, firewood, and other energy sources/total number of households

The ratio of no tap water j ₄₃ is expressed by the following formula:

Proportion of no running water j ₄₃ = number of households without running water/total number of households

The said no kitchen ratio j ₄₄ is expressed by the following formula:

Proportion without kitchen j ₄₄ = number of households without kitchen/total number of households

The ratio j ₄₅ without toilets is expressed by the following formula:

Proportion without toilet j ₄₅ = number of households without toilet/total number of households

The ratio of no hot water j ₄₆ is expressed by the following formula:

The proportion of no hot water j ₄₆ = the number of households without hot water/the total number of households.

In a preferred solution, the "acquiring street view image data of the target area in the map information database" includes the following sub-steps:

Obtain the road network information of the target area in the map information database;

According to the road network information of the target area, perform interval sampling according to the distance D to obtain the sampling points of the target area;

M*L pieces of image data are obtained for the sampling points of each target area, and the combined set of image data defining the sampling points of all target areas is the street view image data set of the target area. The M*L pieces of image data represent each Under the vertical viewing angle, M images of different directions are taken, and there are L vertical viewing angles.

In a preferred solution, the distance D=100 meters.

In a preferred solution, M=4, L=2; the sampling points are based on the four directions of the first vertical viewing angle and the four directions of the second vertical viewing angle. Obtain 8 pieces of image data in each direction.

In a preferred solution, the "segmentation of the street view image data of the target area into several pieces of street view image data through the image segmentation technology" includes the following sub-steps:

Sampling the street view image data set of the target area to obtain the sampling result;

In the sampling result, merge the M image data of different directions from each vertical viewing angle of each sampling point to obtain a full-area image of the corresponding sampling point at the set vertical viewing angle;

Define the set of global images of each vertical viewing angle of all sampling points as a sampling set of sampling points of the target area;

Through the existing image segmentation technology, determine the most suitable image segmentation technology for the sampling set of the sampling points of the target area, and the result obtained is defined as the best image segmentation technology for the sampling points of the target area;

Image segmentation is performed on the street view image data set of the target area corresponding to the sample points of the target area through the optimal image segmentation technology of the sampling points of the target area, and the result obtained is defined as several pieces of street view image data.

In a preferred solution, the "based on several pieces of street view image data, combined with principal component analysis to obtain the main factor, and define the main factor as the street view factor" includes the following sub-steps:

Based on several pieces of street view image data, street view indicators are obtained. The street view indicators include sky opening index P _sky , green viewing rate P _green , road ratio P _road , building ratio P _building , interface enclosure degree P _enclosure , and color elements , Salient area feature SRS, visual entropy VE, where the color elements include the name and saturation of street view image data;

The sky opening index P _sky is calculated by the following formula:

In the formula, the NS _i is the number of pixels in the sky in the i-th block of street view image data; the N _i is the total number of pixels in the i-th block of street view image data;

The green viewing rate P _green is calculated by the following formula:

In the formula, the NG _i is the number of pixels of vegetation in the i-th block of street view image data;

The ratio of the road surface P _road is calculated by the following formula:

In the formula, the NR _i is the number of pixels of the road in the i-th block of street view image data;

The stated proportion of building P _building is calculated by the following formula:

In the formula, the NB _i is the number of pixels of the building in the i-th block of street view image data;

The interface enclosure degree P _enclosure is calculated by the following formula:

P _enclosure ＝P _green +P _building

The salient area feature SRS is calculated by the following formula:

The max (R, G, B) represents the maximum value among the color components in the i-th block of street view image data; the min (R, G, B) represents the minimum value among the color components in the i-th block of street view image data value;

The visual entropy VE is calculated by the following formula:

The P _i represents the probability of the i-th block of street view image data, which is used to characterize the entropy branch value;

The street view index is used as the input variable of the principal component analysis method, and the main factor of the output variable is obtained.

In a preferred solution, the "machine learning algorithm" is a random forest algorithm.

In this preferred scheme, the random forest algorithm uses random repeated sampling and node random splitting techniques, and is based on a large number of tree-like structure ensemble learning for classification and prediction. It is a simple, stable, and highly accurate algorithm. The street view index is greatly affected by the position, location, angle of view, etc. The present invention uses a random forest algorithm that belongs to a non-linear model to realize the simulation prediction of the urban poverty score with complex and multi-dimensional street view data. Since the random forest algorithm can evaluate all variables, there is no need to worry about the problem of multiple collinearity between variables.

The invention also discloses a urban poverty space measurement system based on street view pictures and machine learning based on the above method, which includes an image acquisition module, an image segmentation module, a picture combination module, a street view index module and an urban poverty score calculation module, wherein,

The image acquisition module is used to acquire street view image data of the target area;

The picture combination module is used to combine M image data in different directions with the same vertical viewing angle of the sampling point to obtain street view image data of the target area;

The image segmentation module is used to segment the street view image data of the target area into several pieces of street view image data;

The street view index module is used to calculate the street view index of the target area;

The urban poverty score calculation module uses the multiple deprivation index IMD and the street view factor as input variables of the machine learning algorithm to obtain the urban poverty score.

In a preferred solution, the street view indicator module includes an image element pixel ratio calculation module and a color complexity calculation module, wherein:

The image element pixel ratio calculation module is used to calculate the sky open index P _sky , the green viewing rate P _green , the road surface ratio P _road , the building ratio P _building , and the interface enclosure degree P _enclosure ;

The color complexity calculation module is used to calculate the visual entropy VE.

Compared with the prior art, the beneficial effects of the technical solution of the present invention are:

Description of the drawings

Figure 1 is a flow chart of Example 1; Figure 2 is a distribution map of urban poverty levels with multiple deprivation index IMD; Figure 3 is a distribution map of sampling points for street view images; Figure 4 is a schematic diagram of the process of segmentation and interpretation of street view images; Figure 6 is the spatial distribution pattern of the sense of enclosure of streetscape buildings; Figure 7 is the spatial distribution pattern of the sense of enclosure of streetscape vegetation; Figure 8 is the spatial distribution pattern of the sense of openness of the streetscape sky; Figure 9 is the street view The spatial distribution pattern of the sense of openness of the road; Figure 10 is the spatial distribution pattern of the complex sense of streetscape color; Figure 11 is the distribution map of the urban poverty level predicted by the street view.

detailed description

The accompanying drawings are only for illustrative purposes and cannot be understood as a limitation of the patent; in order to better illustrate this embodiment, some parts of the accompanying drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product;

For those skilled in the art, it is understandable that some well-known structures in the drawings and their descriptions may be omitted. The technical solution of the present invention will be further described below in conjunction with the drawings and embodiments.

Example 1

As shown in Figure 1, a method for measuring urban poverty space based on street view pictures and machine learning includes the following steps:

Construct multiple deprivation index IMD based on census data;

The multiple deprivation index IMD and the street view factor are used as input variables of the machine learning algorithm to obtain the urban poverty score.

Evaluate the poverty level of the city based on the urban poverty score.

Embodiment 1 collects street view image data from a map information database, uses image segmentation technology to fully excavate the element information in the street view image data, and combines mathematical models and computer algorithms to construct a machine learning model for measuring urban poverty. The present invention effectively compensates for the shortcomings of the existing measurement, not only promotes the refinement of urban poverty research, but also enriches the dimensions of urban poverty measurement indicators. It has practical significance for improving poor communities and promoting renewal planning. It is accurate and reliable in measuring urban poverty. , Practical methods.

In Example 1, the following extensions can also be made: "Constructing a multiple deprivation index IMD based on census data" includes the following sub-contents:

The multiple deprivation index IMD is expressed by the following formula:

In the formula, E _j represents the value of the j-th latitude data.

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: P=4, the four dimensions of data are income field data, education field data, employment field data, and housing field data. The value of the income field data is E ₁ , the proportion of income field data is 0.303; the value of education field data is E ₂ , the proportion of education field data is 0.212; the value of employment field data is E ₃ , the proportion of employment field data is 0.182; the value of housing field data It is E ₄ , the proportion of data in the housing sector is 0.303; the multiple deprivation index IMD is expressed by the following formula:

IMD=E ₁ *0.303+E ₂ *0.212+E ₃ *0.182+E ₄ *0.303.

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: E ₁ is expressed by the following formula:

The proportion of industrial workers j ₁₁ is expressed by the following formula:

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: E ₂ is expressed by the following formula:

The low education level j ₂₁ is expressed by the following formula:

The percentage of leaving school without a diploma j ₂₂ is expressed by the following formula:

In Embodiment 1 and the above improved embodiment 1, the following extensions can also be made: E ₃ is expressed by the following formula:

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: E ₄ is expressed by the following formula:

The proportion of the population living per square meter j ₄₁ is expressed by the following formula:

The non-clean energy ratio j ₄₂ is expressed by the following formula:

The ratio of no tap water j ₄₃ is expressed by the following formula:

The proportion of no kitchen j ₄₄ is expressed by the following formula:

The ratio of no toilets j ₄₅ is expressed by the following formula:

The ratio of no hot water j ₄₆ is expressed by the following formula:

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: "Acquiring street view image data of the target area in the map information database" includes the following sub-steps:

Obtain M*L image data for the sampling points of each target area. The combined set of image data defining the sampling points of all target areas is the street view image data set of the target area. The M*L image data represents each vertical direction. Under the viewing angle, M images of different directions are taken, and there are L vertical viewing angles.

In Embodiment 1 and the above-mentioned modified embodiment 1, the following extension can also be carried out: distance D=100 meters.

In Embodiment 1 and the above-mentioned modified embodiment 1, the following extensions can also be made: M=4, L=2; the sampling points are based on the four directions of front, back, left, and right of the first vertical viewing angle and the second vertical viewing angle Obtain 8 pieces of image data in four directions: front, back, left, and right.

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: "Splitting the street view image data of the target area into several pieces of street view image data through image segmentation technology" includes the following sub-steps:

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: "Based on several pieces of street view image data, combined with the principal component analysis method to obtain the main factor, and define the main factor as the street view factor" includes the following sub-steps:

Based on several pieces of street view image data, street view indicators are obtained. Street view indicators include sky openness index P _sky , green viewing rate P _green , road ratio P _road , building ratio P _building , interface enclosure P _enclosure , color elements, and salient areas Feature SRS, visual entropy VE, where color elements include the name and saturation of street view image data;

The sky opening index P _sky is calculated by the following formula:

In the formula, NS _i is the number of pixels in the sky in the i-th block of street view image data; N _i is the total number of pixels in the i-th block of street view image data;

The green viewing rate P _green is calculated by the following formula:

In the formula, NG _i is the number of pixels of vegetation in the i-th block of street view image data;

_{P road, the} proportion of road surface, is calculated by the following formula:

In the formula, NR _i is the number of pixels of the road in the i-th block of street view image data;

The building proportion P _building is calculated by the following formula:

In the formula, NB _i is the number of pixels of the building in the i-th block of street view image data;

The interface enclosure degree P _enclosure is calculated by the following formula: P _enclosure =P _green +P _building

The salient area feature SRS is calculated by the following formula:

max(R,G,B) represents the maximum value of the color components in the i-th block of street view image data; min(R,G,B) represents the minimum value of the color components in the i-th block of street view image data;

The visual entropy VE is calculated by the following formula:

P _i represents the probability of the i-th block of street view image data, which is used to represent the entropy branch value;

In Embodiment 1 and the above-mentioned improved embodiment 1, the following extensions can also be made: "Machine learning algorithm" is a random forest algorithm.

In this improved embodiment 1, the random forest algorithm uses random repeated sampling and node random splitting techniques, and performs classification and prediction based on a large number of tree-like structure integrated learning, which is a simple, stable, and highly accurate algorithm. The street view index is greatly affected by the position, location, angle of view, etc. The present invention uses a random forest algorithm that belongs to a non-linear model to realize the simulation prediction of the urban poverty score with complex and multi-dimensional street view data. Since the random forest algorithm can evaluate all variables, there is no need to worry about the problem of multiple collinearity between variables.

Demonstration of Example 1

Demonstration environment:

The sample selected communities located in Guangzhou City and the four central districts (Yuexiu, Liwan, Haizhu, Tianhe) as the research objects, covering various impoverished and non-poverty communities with different built environments. On the one hand, as the political, economic, and cultural center of South China, Guangzhou has always been a typical case area for urban poverty research. On the other hand, with reference to differences in administrative boundaries, divisional functions, development stages, etc., Yuexiu, Haizhu, Liwan, and Tianhe districts are suitable as research objects. According to the statistics of the sixth population census in 2010, there are 914 neighborhood committees/village committees (communities) in the four central districts, with a total population of 4.833 million, accounting for 40% of the total population in Guangzhou. The research objects are typical Representative.

Demonstration process:

Step 1: Calculate 11 indicators from the sixth national census data, construct a traditional indicator system for measuring urban poverty, and calculate the multiple deprivation index (IMD), as shown in Figure 2;

Step 2: Along the main road, sub-main road, and branch road, the street view sampling interval is determined to be a uniform distance of 100 meters, and each sampling point is from four directions of 0°, 90°, 180°, and 270°. °Horizontal viewing angle and 20° elevation angle, the acquisition time is close to the time of the sixth national census, and Baidu map street view covering 8536 sampling points and 286 communities has been obtained, a total of 61864 pictures, and their spatial distribution is shown in the figure 3 shown;

Step 3: Randomly sample half of the street view images of the case community, and use the TensorFlow deep learning network framework that is often used in the field of vision to support, and use the artificial intelligence model based on FCN, SegNet, and PSPNet for interpretation (as shown in Figure 4) ). Calculate the three efficiency evaluation indicators of pixel accuracy PA (Pixel Accuracy), average pixel accuracy MPA (Mean Pixel Accuracy), and average intersection ratio MIOU (Mean Intersection Over Union), and select the model with the highest image segmentation technology accuracy to segment all street scene pictures (As shown in Figure 5).

Step 4: Summarize the characteristics of streetscape indicators of typical poor communities, and use the method of correlation analysis to determine the streetscape elements related to the degree of urban poverty. On the basis of calculating the corresponding indicators of important street view elements, the principal component analysis method is used to reduce the dimensionality to process the multi-view and multi-element street view indicators, and rotate the factor loading matrix to extract the high-contribution and important street view factors and name them, namely Architectural enclosure, vegetation enclosure, sky openness, road openness, and color complexity are shown in Figures 6-10.

Step 5: Take the important street scene factors obtained in the previous step as independent variables, and use multiple deprivation index (IMD) as a reference variable to construct a random forest (Random Forest) prediction model. After the remaining 50% of the sample data is tested, this step is repeated to generate a large number of decision trees. When the model error tends to the smallest and stable state, the growth of the random forest is terminated, the urban poverty level is judged, and the highest frequency is finally output The classification result, as the final output result of the street view measuring the urban poverty degree, the average correct rate of the statistical model reached 82.48%. The specific result is shown in Figure 11.

In this demonstration, the urban poverty level is assigned a value from 0 to 5. The larger the number, the poorer it is. Then, it is stratified to each level of the community in proportion, and 50% of the data is drawn as the training sample. At the same time, random repeated sampling and random sampling with replacement are used to randomly select N data subsets with the same size as the existing training data to grow N independent decision tree models. After calculating the correct rate of the model prediction results and the total model error, it is found that when the number of tree nodes is 6, the average error rate of the model prediction reaches the minimum; at the same time, the number of trees is selected from 0-100, and it can be seen that 55 trees are generated After the decision tree, the total error of the model tends to be stable. Therefore, the parameters of the random forest model are determined during this demonstration. Among them, the generation of tree nodes is determined by increasing the variables one by one to compare the level of misjudgment rate, that is, selecting from M existing possible attributes, and segmenting the most representative random feature variable. This demonstration process compares 8 indicators at 0° and 20° in pairs, puts more important street view indicators into the model, allows all decision trees to grow as much as possible, and does not modify any parameters during the model building process. This helps to reduce the correlation between the decision tree used for classification and regression, enrich the comprehensiveness of the model and improve the classification ability.

After the remaining 50% of the sample data is tested, this step is repeated to generate a large number of decision trees. When the model error tends to the smallest and stable state, the growth of the random forest is terminated. The urban poverty level is judged, and the type with the highest frequency is finally output as the final output value of the random forest model, as shown in Table 1. In the process of optimizing the model, it was found that for street view indicators such as sky opening index, green viewing rate, etc., calculated based on elements, the 0° viewing angle is better, while color elements and salient area features calculated based on color are better. In other words, the 20° angle of view index contributes more to the correct prediction of the model. At the same time, due to the increase of attribute types, the model's predictive ability is improved. After the eighth street view index is added, the average correct rate of the model reaches 82.48%, which exceeds the predictive effect of the first two models. Moreover, with the addition of different attribute types, the model prediction accuracy rate rises differently. Comprehensive analysis shows that 0° sky openness index, 0° green viewing rate, 20° color element, 0° building proportion, 0° pavement proportion, and 20° visual entropy have a relatively high impact on the prediction of urban poverty.

Table 1 Evaluation parameter results of random forest model

Example 2

Embodiment 2 is an application based on embodiment 1, a system for measuring urban poverty space based on street view pictures and machine learning, including image acquisition module, image segmentation module, image combination module, street view index module, and urban poverty score calculation Modules, where

The picture combination module is used to combine M image data of different directions with the same vertical viewing angle of the sampling points to obtain street view image data of the target area;

In Embodiment 2, the following extensions can also be made: The street view index module includes a calculation module for the proportion of image elements and a color complexity calculation module, where,

In the specific content of the above-mentioned specific embodiments, the technical features can be combined in any non-contradictory manner. In order to make the description concise, all possible combinations of the above-mentioned technical features are not described. However, as long as the combination of these technical features does not exist Any contradiction should be regarded as the scope of this specification.

The same or similar reference numbers correspond to the same or similar parts;

The terms describing the positional relationship in the drawings are only used for exemplary description and cannot be understood as a limitation of the patent; for example, the calculation formula of the water flow sensor in the embodiment is not limited to the formula exemplified in the embodiment, and different types of water flow sensors The calculation formulas are different. The foregoing is the limitation of the embodiments and cannot be understood as a limitation of the patent.

Obviously, the above-mentioned embodiments of the present invention are merely examples to clearly illustrate the present invention, and are not intended to limit the implementation of the present invention. For those of ordinary skill in the art, other changes or changes in different forms can be made on the basis of the above description. It is unnecessary and impossible to list all the implementation methods here. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims

A method for measuring urban poverty space based on street view pictures and machine learning, which is characterized in that it includes the following steps:

Construct multiple deprivation index IMD based on census data;

Obtain street view image data of the target area in the map information database;

Through image segmentation technology, the street view image data of the target area is divided into several pieces of street view image data;

Based on several pieces of street view image data, combined with the principal component analysis method, the principal factor is obtained, and the principal factor is defined as the street view factor;

The multiple deprivation index IMD and the street view factor are used as input variables of the machine learning algorithm to obtain the urban poverty score.

Evaluate the poverty level of the city based on the urban poverty score.
The method for measuring urban poverty space according to claim 1, wherein the "construction of multiple deprivation index IMD based on census data" includes the following sub-contents:

According to the census data, P dimensions of data are obtained, and each dimension of data corresponds to a proportional weight λ;

The multiple deprivation index IMD is expressed by the following formula:

In the formula, the E j represents the value of the j-th latitude data.
The method for measuring urban poverty space according to claim 2, wherein said P=4, and said four dimensional data are income field data, education field data, employment field data, and housing field data, respectively. The value of the income field data is E 1 , and the proportion of the income field data is 0.303; the value of the education field data is E 2 , and the proportion of the education field data is 0.212; the value of the employment field data is E 3. The proportion of the employment field data is 0.182; the value of the housing field data is E 4 , and the proportion of the housing field data is 0.303; the multiple deprivation index IMD is expressed by the following formula:

IMD=E 1 *0.303+E 2 *0.212+E 3 *0.182+E 4 *0.303.
The method for measuring urban poverty space according to claim 3, wherein said E 1 is expressed by the following formula:

E 1 = the proportion of industrial workers j 11 + the proportion of low-end service industries j 12 + the proportion of divorces and widows j 13

The said industrial worker ratio j 11 is expressed by the following formula:

The proportion of industrial workers j 11 = (the number of people in the mining industry + the number of people in the manufacturing industry)/total number of employees

The said industrial worker ratio j 11 is expressed by the following formula:

Proportion of low-end service industry j 121 = (population of electricity, gas and water production and supply industry + population of wholesale and retail industry + population of accommodation and catering industry + population of real estate industry) / total number of employees

The ratio of divorce and widowhood j 13 is expressed by the following formula:

Divorce and widowhood ratio j 13 = the number of divorced and widowed population/the sum of the unmarried population aged 15 and above and the population with a spouse.
The method for measuring urban poverty space according to claim 3, wherein the E 2 is expressed by the following formula:

E 2 = low education level j 21 + the proportion of school leavers without a diploma j 22

The said low education level j 21 is expressed by the following formula:

Low level of education j 21 = population without going to school, elementary school, and junior high school/total population

The proportion of school leavers without a diploma j 22 is expressed by the following formula:

The proportion of leaving school without a diploma j 22 = the population without a diploma/total population.
The method for measuring urban poverty space according to claim 3, wherein the E 3 is expressed by the following formula:

E 3 = Unemployment ratio j 31 = Number of unemployed population/total population.
The method for measuring urban poverty space according to claim 3, wherein the E 4 is expressed by the following formula:

E 4 = Proportion of population living per square meter j 41 + Proportion without clean energy j 42 + Proportion without running water j 43 + Proportion without kitchen j 44 + Proportion without toilet j 45 + Proportion without hot water j 46

The said population ratio j 41 per square meter is expressed by the following formula:

The proportion of the population living per square meter j 41 = 1/per capita housing area (square meters/person)

The said non-clean energy ratio j 42 is expressed by the following formula:

Proportion without clean energy j 42 = number of households using coal, firewood, and other energy sources/total number of households

The ratio of no tap water j 43 is expressed by the following formula:

Proportion of no running water j 43 = number of households without running water/total number of households

The said no-kitchen ratio j 44 is expressed by the following formula:

Proportion without kitchen j 44 = number of households without kitchen/total number of households

The ratio j 45 without toilets is expressed by the following formula:

Proportion without toilet j 45 = number of households without toilet/total number of households

The ratio of no hot water j 46 is expressed by the following formula:

The proportion of no hot water j 46 = the number of households without hot water/total number of households.
The method for measuring urban internal poverty space according to any one of claims 1 to 7, wherein the "acquiring street view image data of the target area in a map information database" includes the following sub-steps:

Obtain the road network information of the target area in the map information database;

According to the road network information of the target area, perform interval sampling according to the distance D to obtain the sampling points of the target area;

M*L pieces of image data are obtained for the sampling points of each target area, and the combined set of image data defining the sampling points of all target areas is the street view image data set of the target area. The M*L pieces of image data represent each Under the vertical viewing angle, M images of different directions are taken, and there are L vertical viewing angles.
The method for measuring urban poverty space according to claim 8, wherein the distance D=100 meters.
The method for measuring urban poverty space according to claim 8, wherein said M=4, L=2; said sampling points are based on the four directions of front, back, left, and right of the first vertical viewing angle and the first Eight images of image data are obtained in the four directions of front, back, left, and right of the two vertical viewing angles.
The method for measuring urban poverty space according to claim 8, wherein the "segmenting the street view image data of the target area into several pieces of street view image data through image segmentation technology" comprises the following sub-steps:

Sampling the street view image data set of the target area to obtain the sampling result;

In the sampling result, merge the M image data of different directions from each vertical viewing angle of each sampling point to obtain a full-area image of the corresponding sampling point at the set vertical viewing angle;

Define the set of global images of each vertical viewing angle of all sampling points as a sampling set of sampling points of the target area;

Through the existing image segmentation technology, determine the most suitable image segmentation technology for the sampling set of the sampling points of the target area, and the result obtained is defined as the best image segmentation technology for the sampling points of the target area;

Image segmentation is performed on the street view image data set of the target area corresponding to the sample points of the target area through the optimal image segmentation technology of the sampling points of the target area, and the result obtained is defined as several pieces of street view image data.
The method for measuring urban poverty space according to claim 8, characterized in that said "based on several pieces of street view image data, combined with principal component analysis to obtain a principal factor, defining the principal factor as a street view factor" includes the following sub- step:

Based on several pieces of street view image data, street view indicators are obtained. The street view indicators include sky opening index P sky , green viewing rate P green , road surface ratio P road , building ratio P builaing , interface enclosure degree P enclosure , and color elements , Salient area feature SRS, visual entropy VE, where the color elements include the name and saturation of street view image data;

The sky opening index P sky is calculated by the following formula:

In the formula, the NS i is the number of pixels in the sky in the i-th block of street view image data; the N i is the total number of pixels in the i-th block of street view image data;

The green viewing rate P green is calculated by the following formula:

In the formula, the NG i is the number of pixels of vegetation in the i-th block of street view image data;

The ratio of the road surface P road is calculated by the following formula:

In the formula, the NR i is the number of pixels of the road in the i-th block of street view image data;

The said building proportion P building is calculated by the following formula:

In the formula, the NB i is the number of pixels of the building in the i-th block of street view image data;

The interface enclosure degree P enclosure is calculated by the following formula:

P enclosure ＝P green +P building

The salient area feature SRS is calculated by the following formula:

The max (R, G, B) represents the maximum value among the color components in the i-th block of street view image data; the min (R, G, B) represents the minimum value among the color components in the i-th block of street view image data value;

The visual entropy VE is calculated by the following formula:

The P i represents the probability of the i-th block of street view image data, which is used to characterize the entropy branch value;

The street view index is used as the input variable of the principal component analysis method, and the main factor of the output variable is obtained.
The method for measuring poverty space within a city according to claim 1, 2, 3, 4, 5, 6, 7, 9, 10, 11 or 12, wherein the "machine learning algorithm" is a random forest algorithm .
A measurement system of urban poverty space based on street view pictures and machine learning based on the method for measuring urban internal poverty space according to any one of claims 1 to 13, characterized in that it includes an image acquisition module, an image segmentation module, Picture combination module, street view index module and urban poverty score calculation module, among which,

The image acquisition module is used to acquire street view image data of the target area;

The picture combination module is used to combine M image data of different directions with the same vertical viewing angle of the sampling point to obtain street view image data of the target area;

The image segmentation module is used to segment the street view image data of the target area into several pieces of street view image data;

The street view index module is used to calculate the street view index of the target area;

The urban poverty score calculation module uses the multiple deprivation index IMD and the street view factor as input variables of the machine learning algorithm to obtain the urban poverty score.
The urban interior poverty space measurement system according to claim 14, wherein the street view indicator module includes an image element pixel ratio calculation module and a color complexity calculation module, wherein:

The image element pixel ratio calculation module is used to calculate the sky opening index P sky , the green viewing rate P green , the road surface ratio P road , the building ratio P building , and the interface enclosure degree P enclosure ;

The color complexity calculation module is used to calculate the visual entropy VE.